Lucidworks Senior Systems Engineer Allan Syiek discusses simple querying vs. data mining and intelligent search, and how Lucidworks Fusion can help you turn raw data into insight.
Webinar: Replace Google Search Appliance with Lucidworks Fusion (Lucidworks)
Lucidworks Senior Search Engineer, Evan Sayer, and Enterprise Content Management and Big Data Architect for the County of Sacramento, Guy Sperry, explore the benefits of replacing Google Search Appliance with Lucidworks Fusion.
The document outlines an agenda for a conference on search and recommenders hosted by Lucidworks, including presentations on use cases for ecommerce, compliance, fraud and customer support; a demo of Lucidworks Fusion which leverages signals from user engagement to power both search and recommendations; and a discussion of future directions including ensemble and click-based recommendation approaches.
Who Moved my State? A Blob Storage Solr Story - Ilan Ginzburg, Salesforce (Lucidworks)
This document summarizes a presentation about moving Solr cores to a centralized blob storage system to improve high availability and scalability for Salesforce's search implementation. The current architecture has cores distributed across Solr servers, which limits elasticity. The new design stores core data and metadata in a central blob storage, allowing cores to be loaded and indexed from any Solr server. This improves availability, as cores can quickly be loaded elsewhere if a server fails. Initial testing shows cores can be loaded within seconds on another server using this approach. The system is undergoing further testing and refinement before potential adoption for Salesforce search deployments on public clouds.
SQL Analytics for Search Engineers - Timothy Potter, Lucidworks (Lucidworks)
This document discusses how SQL can be used in Lucidworks Fusion for various purposes like aggregating signals to compute relevance scores, ingesting and transforming data from various sources using Spark SQL, enabling self-service analytics through tools like Tableau and PowerBI, and running experiments to compare variants. It provides examples of using SQL for tasks like sessionization with window functions, joining multiple data sources, hiding complex logic in user-defined functions, and powering recommendations. The document recommends SQL in Fusion for tasks like analytics, data ingestion, machine learning, and experimentation.
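As a rough illustration of the sessionization idea, here is a minimal Python sketch; the talk uses Spark SQL window functions (e.g. LAG over a session window), and the 30-minute timeout and event data below are assumed for the example, not taken from the document:

```python
from datetime import datetime, timedelta

# Assign session IDs to a user's click events: a new session starts
# whenever the gap since the previous event exceeds the timeout.
# (Illustrative stand-in for a Spark SQL window-function approach.)
def sessionize(timestamps, timeout=timedelta(minutes=30)):
    session_ids = []
    session = 0
    prev = None
    for ts in sorted(timestamps):
        if prev is not None and ts - prev > timeout:
            session += 1
        session_ids.append(session)
        prev = ts
    return session_ids

events = [datetime(2019, 5, 1, 9, 0),
          datetime(2019, 5, 1, 9, 10),
          datetime(2019, 5, 1, 11, 0)]   # 110-minute gap starts a new session
print(sessionize(events))  # [0, 0, 1]
```

In Spark SQL the same boundary test would be expressed with LAG() and a cumulative SUM() over a window partitioned by user, which is what makes it parallelizable at signal scale.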
Solr Under the Hood at S&P Global - Sumit Vadhera, S&P Global (Lucidworks)
This document summarizes S&P Global's use of Solr for search capabilities across their large datasets. It discusses how S&P Global indexes over 50 million documents into Solr monthly and handles over 5 million queries per week. It outlines challenges faced with an on-premise Solr deployment and how migrating to Solr Cloud helped address issues like performance, availability, and scalability. Next steps discussed include improving relevancy through data science, continuing to leverage new Solr features, and exploring ways to integrate machine learning into search capabilities.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P. (Lucidworks)
This document describes Bloomberg's development of a search analytics component for Solr. It was created by their search team to enable complex calculations and aggregations on numerical time-series data. Key features include statistical and mathematical expressions to facet and analyze data, supporting int, long, float, date and string fields. Examples show calculating a weighted average and variance. Future plans include multi-shard support and filtering result sets based on calculated statistics.
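A minimal Python sketch of the two example computations the summary names, weighted average and variance; the analytics component runs these inside Solr over indexed fields, and the price/volume data here is invented for illustration:

```python
# Weighted average and population variance, the two example
# calculations mentioned for Bloomberg's analytics component.
def weighted_average(values, weights):
    total_w = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_w

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

prices  = [10.0, 20.0, 30.0]   # e.g. a numeric field per document
volumes = [1.0, 1.0, 2.0]      # e.g. a weighting field per document

print(weighted_average(prices, volumes))  # 22.5
print(variance(prices))                   # population variance of prices
```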
Webinar: Solr 6 Deep Dive - SQL and Graph (Lucidworks)
This document provides an agenda and overview for a conference session on Solr 6 and its new capabilities for parallel SQL and graph queries. The session will cover motivations for adding these features to Solr, how streaming expressions enable parallel SQL, graph capabilities through the new graph query parser and streaming expressions, and comparisons to other technologies. The document includes examples of SQL queries and graph streaming expressions in Solr.
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo... (Lucidworks)
The document discusses the challenges of building a news search engine at Bloomberg L.P. It describes how Bloomberg uses Apache Solr/Lucene to index millions of news stories and handle complex search queries from customers. Some key challenges discussed include optimizing searches over huge numbers of documents and metadata fields, handling arbitrarily complex queries, and developing an alerting system to notify users of new matching results. The system has been scaled up to include thousands of Solr cores distributed across data centers to efficiently search and retrieve news content.
Uber has created a Data Science Workbench to improve the productivity of its data scientists by providing scalable tools, customization, and support. The Workbench provides Jupyter notebooks for interactive coding and visualization, RStudio for rapid prototyping, and Apache Spark for distributed processing. It aims to centralize infrastructure provisioning, leverage Uber's distributed backend, enable knowledge sharing and search, and integrate with Uber's data ecosystem tools. The Workbench manages Docker containers of tools like Jupyter and RStudio running on a Mesos cluster, with files stored in a shared file system. It addresses the problems of wasted time from separate infrastructures and lack of tool standardization across Uber's data science teams.
Informational Referential Integrity Constraints Support in Apache Spark with ... (Databricks)
An informational, or statistical, constraint is a constraint such as a unique, primary key, foreign key, or check constraint that can be used by Apache Spark to improve query performance. Informational constraints are not enforced by the Spark SQL engine; rather, they are used by Catalyst to optimize the query processing. Informational constraints will be primarily targeted to applications that load and analyze data that originated from a data warehouse. For such applications, the conditions for a given constraint are known to be true, so the constraint does not need to be enforced during data load operations.
This session will cover the support for primary and foreign key (referential integrity) constraints in Spark. You’ll learn about the constraint specification, metastore storage, constraint validation and maintenance. You’ll also see examples of query optimizations that utilize referential integrity constraints, such as Join and Distinct elimination and Star Schema detection.
This document summarizes how Solr and Lucidworks Fusion can be used for big data search and analytics. It discusses indexing strategies like using MapReduce, Spark, and Fusion connectors to index structured and unstructured data from HDFS. It also covers topics like Solr on HDFS, auto add replicas, security, cluster sizing, and using the lambda architecture with Spark streaming to enable real-time search over batch-processed historical data. The document promotes Lucidworks Fusion as a search platform that can handle massive scales of data, provide real-time search capabilities, and work with any data source securely.
Managed Search: Presented by Jacob Graves, Getty Images (Lucidworks)
Getty Images uses a managed search system to allow business users to control image search results. The system breaks search scoring into relevancy, recency, and image source components. It provides interfaces to adjust component weights and visualize the effects. Test algorithms can be run on a percentage of users before being promoted to the main search. The system is built on SOLR and uses custom plugins and functions to implement complex scoring and result shuffling while providing business users simple controls.
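The component-scoring idea can be sketched in a few lines of Python; the component names match the summary, but the weights, scores, and the linear-combination form are assumptions for illustration, not Getty's actual formula (which lives in custom Solr plugins):

```python
# Combine relevancy, recency, and image-source component scores with
# business-adjustable weights, as in Getty's managed search. The
# weights are what the business-user interface would expose.
def managed_score(components, weights):
    return sum(score * weights.get(name, 0.0)
               for name, score in components.items())

doc = {"relevancy": 0.8, "recency": 0.5, "source": 1.0}   # invented scores
weights = {"relevancy": 0.6, "recency": 0.3, "source": 0.1}

print(managed_score(doc, weights))  # approximately 0.73
```

Keeping the weights external to the scoring function is what lets a test algorithm (a different weight set) be run on a slice of users before promotion.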
The SAS Search Journey: Using AI to Move from Google to Lucidworks - Alex Fl... (Lucidworks)
1) SAS migrated their enterprise search from Google and another solution to Lucidworks Fusion to have a single search platform.
2) They encountered issues with the out-of-the-box configuration and content that impacted search relevance and ranking.
3) Through multiple iterations of configuration changes, indexing adjustments, and using AI/SAS tools to evaluate search terms, they improved relevance and the types of results returned for key search terms.
4) Future plans for the search include immediate indexing of new content, auto-suggest, spellcheck, and integrating search data into analytics dashboards. Lucidworks was chosen for its data analytics abilities, easy administration, and connectors.
AI from your data lake: Using Solr for analytics (DataWorks Summit)
Introductory technical session on Apache Solr's (HDP Search) artificial intelligence and machine learning features to discover relationships and insights across big data in the enterprise. Discussions will include how Solr performs graph traversal, anomaly detection, NLP and time-series analysis, and how you can display this data to users with easy-to-create dashboards.
This technical session will review Apache Solr’s streaming expressions, which were introduced in Solr 6.5. With over 100 expressions and evaluators, conditional logic, variables, and data structures, these functions form the basis of a new paradigm that brings many features from the relational world into search. Together they amount to a powerful functional programming language that enables many parallel computing use cases, such as anomaly detection, streaming NLP, graph traversal, and time-series analysis.
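As a rough illustration of the anomaly-detection use case (this is plain Python, not Solr's streaming-expression syntax, and the threshold and sample series are assumed), flag time-series points that sit far from the series mean:

```python
import statistics

# Flag points whose z-score exceeds a threshold -- a toy stand-in for
# the kind of anomaly detection Solr's streaming expressions and
# evaluators can run in parallel over query results.
def anomalies(series, threshold=2.0):
    mean = statistics.mean(series)
    stdev = statistics.pstdev(series)
    return [i for i, v in enumerate(series)
            if stdev and abs(v - mean) / stdev > threshold]

cpu = [50, 51, 49, 50, 52, 48, 50, 95]  # one obvious spike at the end
print(anomalies(cpu))  # [7]
```

In Solr the equivalent would stream field values out of a collection and apply statistical evaluators to them, so the computation runs where the data lives instead of in client code.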
Third-party tools for discovering and analyzing big data, such as Jupyter, Tableau, and Lucidworks Insights, will also be reviewed.
Speaker
Cassandra Targett, Lucidworks, Director of Engineering
Marcelline Saunders, Lucidworks, Director, Global Partner Enablement
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, Target (Lucidworks)
This document summarizes Target's implementation of Solr as its search platform. It discusses how Target transitioned from Oracle-Endeca to Solr to handle its large scale data and enable more flexible relevancy controls. It describes how Target tested Solr through handling live guest traffic in two sprints and moving its typeahead functionality to the public cloud. Finally, it outlines how Target leverages key Solr capabilities like collection aliases, atomic updates, and configurable facets to synchronize designer and product launches.
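A sketch of the atomic-update mechanism mentioned above: update selected fields of an indexed product without resending the whole document, which is the kind of operation that lets launches flip live at a precise time. The `{"set": ...}` modifier is standard Solr atomic-update syntax; the field names, IDs, and the helper function are invented for illustration:

```python
import json

# Build a Solr atomic-update payload that flips a product to available
# and stamps its launch time, leaving all other indexed fields intact.
# Field names follow Solr dynamic-field conventions (_b, _dt) but are
# illustrative, not Target's schema.
def launch_update(product_id, launch_time):
    return {
        "id": product_id,
        "available_b": {"set": True},
        "launch_date_dt": {"set": launch_time},
    }

# The payload would be POSTed to the collection's /update endpoint.
payload = json.dumps([launch_update("sku-12345", "2019-09-01T10:00:00Z")])
print(payload)
```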
Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks (Lucidworks)
This document discusses implementing learning to rank (LTR) using a product catalog dataset from Best Buy with click-through data. It describes:
1. Using Solr LTR and Fusion together, with Solr extracting content features and Fusion enabling complex feature engineering.
2. Training an XGBoost pairwise ranking model on features like TF-IDF and click signals, which outperforms baselines.
3. The trained model yields up to a 13 percentage point increase in NDCG over hand-tuned scoring, working best on popular queries.
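NDCG, the metric the reported gain is measured in, can be sketched in a few lines. The `2^rel - 1` gain formula below is the common graded-relevance variant and an assumption here, as are the relevance labels; the talk does not specify its exact formulation:

```python
import math

# Discounted cumulative gain over the top-k ranked results.
def dcg(relevances, k):
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

# NDCG normalizes DCG by the score of the ideal (sorted) ranking,
# so 1.0 means the ranker ordered results perfectly.
def ndcg(relevances, k):
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal else 0.0

ranked = [3, 2, 0, 1]  # graded relevance of results in ranked order
print(round(ndcg(ranked, 4), 3))  # 0.993
```

A "13 percentage point increase in NDCG" means the model's ranking sits that much closer to the ideal ordering than the hand-tuned baseline's.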
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr... (Databricks)
This talk is a case study on how Apache Spark and the Spark-Solr library are being used at Flipp to drive search relevancy. Flipp is a Toronto-based digital flyer and ecommerce company that helps shoppers save money on weekly shopping. Our customers can browse more than 5 million products from brick-and-mortar retailers across North America, which makes search a very challenging function in our app: how do we show the most relevant, personalized search results for a query?
The talk will focus on using user signals such as Click Through Rate (CTR) and Impressions to increase search relevancy. I will also talk about how PySpark is used to create the Flipp Search ETL platform for collecting user signals and reading product data from Solr. I will explain the problem scenario in which keyword search and basic relevancy algorithms become ineffective when dealing with a large product database. The solutions will cover the following implementations being used at Flipp to drive relevancy:
– Utilizing user clicks and popularity data to derive and index normalized item weights to implement the Search Crowd Curation models in Apache Solr
– How around 5+ million items are classified into Google Categories in real time using Keras and Apache Spark to power product category curation in Solr.
– How to create a crowd sourced query intent categorizer in Solr using the Spark-Solr library.
– The use of offline and online metrics at Flipp for evaluating changes in search relevancy.
– Future plans for incorporating Kafka Connect with Apache Solr and Structured Streaming to perform real-time product indexing with the Spark-Solr library.
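The first item above, deriving normalized item weights from click and popularity data, can be sketched in plain Python; the smoothing constant and the scale-to-max normalization are assumed details for illustration, not Flipp's published model:

```python
# Derive a normalized per-item weight from (clicks, impressions)
# counts -- the kind of crowd-curation signal that gets indexed back
# into Solr to boost popular items.
def item_weights(stats, smoothing=10):
    # CTR with additive smoothing so low-impression items aren't noisy
    ctrs = {item: clicks / (impressions + smoothing)
            for item, (clicks, impressions) in stats.items()}
    top = max(ctrs.values()) or 1.0
    # Scale to [0, 1] so the weight composes cleanly with text scores
    return {item: ctr / top for item, ctr in ctrs.items()}

stats = {"milk-2l": (90, 1000), "tv-55in": (30, 990)}  # invented counts
print(item_weights(stats))
```

At Flipp's scale the same computation would run as a PySpark job over the signal store, with the resulting weights written into a boost field on each Solr document.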
Search was once considered a black-box application that ingested content and delivered results to users opaquely. However, driven by the opportunities and demands of the growing universe of content and by the versatility of Solr/Lucene open source search technology, search applications are evolving from a standalone facility to an enabling framework. http://www.lucidimagination.com/developer/whitepapers/search-readiness-checklist
This presentation examines the main building blocks for constructing a big data pipeline in the enterprise, drawing inspiration from some of the top big data pipelines in the world, such as those built by Netflix, LinkedIn, Spotify, and Goldman Sachs.
JEEConf 2015 - Introduction to real-time big data with Apache Spark (Taras Matyashovsky)
This presentation will be useful to those who would like to get acquainted with Apache Spark's architecture and top features, and see some of them in action, e.g. RDD transformations and actions, Spark SQL, etc. It also covers real-life use cases from one of our commercial projects and recalls the roadmap of how we integrated Apache Spark into it.
Was presented on JEEConf 2015 in Kyiv.
Design by Yarko Filevych: http://www.filevych.com/
Real-Time Analytics with Solr: Presented by Yonik Seeley, Cloudera (Lucidworks)
The document describes how Solr can be used for real-time analytics on large datasets. It discusses how Solr's inverted index, columnar storage, and multi-segment indexing enable fast search and analytics. Faceted search is described as a way to break results into buckets to filter and explore the data. The new Solr facet module aims to improve integration, performance, and ease of use for advanced analytics through faceting.
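Faceting in miniature can be shown with a toy in-memory example: break matching documents into buckets by a field value, then count and aggregate per bucket. This is only a conceptual stand-in for what the Solr facet module computes against its inverted and columnar (docValues) storage; the documents and fields are invented:

```python
from collections import Counter

docs = [
    {"category": "news",   "words": 400},
    {"category": "sports", "words": 250},
    {"category": "news",   "words": 600},
]

# Bucket counts, like a simple field facet
buckets = Counter(d["category"] for d in docs)

# A per-bucket aggregation, like a facet with a nested avg() function
grouped = {}
for d in docs:
    grouped.setdefault(d["category"], []).append(d["words"])
avg_words = {cat: sum(ws) / len(ws) for cat, ws in grouped.items()}

print(buckets)
print(avg_words)
```

The point of doing this inside Solr rather than client-side is that the buckets are computed during the search itself, over whatever subset the query and filters select.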
This document discusses how Lucene/Solr is used for search applications across different industries. It begins by outlining key considerations for understanding search opportunities and requirements, such as the types of data being searched, the users needing search results and why, integration with IT infrastructure, and the user interface. It then provides examples of how Lucene/Solr powers search applications in industries like yellow pages and local search, media, e-commerce, jobs and career sites, libraries and museums, social media, and enterprise intranet search. The document aims to demonstrate Lucene/Solr's versatility and flexibility in meeting the diverse search needs of real-world organizations.
Apache Arrow Flight: A New Gold Standard for Data Transport (Wes McKinney)
This document discusses how structured data is often moved inefficiently between systems, causing waste. It introduces Apache Arrow, an open standard for in-memory data, and how Arrow can help make data movement more efficient. Systems like Snowflake and BigQuery are now using Arrow to help speed up query result fetching by enabling zero-copy data transfers and sharing file formats between query processing and storage.
Solr is a great tool to have in the data scientist's toolbox. In this talk, I walk through several demos of applying Solr to data science activities and explore various use cases for Solr and data science.
Anshum Gupta is an Apache Lucene/Solr committer who works at Lucidworks. He discusses the history and capabilities of Apache Lucene, an open source information retrieval library, and Apache Solr, an enterprise search platform built on Lucene. Solr has over 8 million downloads and is used by many large companies for search capabilities including indexing, faceting, and auto-complete, with the scalability to handle large datasets. Major updates in Solr 5 include improved performance, security features, and analytics capabilities.
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati... (Lucidworks)
The document discusses developing a scalable user search feature for the PlayStation 4. It describes setting up a SolrCloud cluster with 300 million user documents distributed across 4 shards. Personalized search ranks results based on friendship connections by using a Lucene index to store close connections for each user. Challenges included instability in the initial Solr 4.8 cluster which was addressed through configuration changes. An upgrade to Solr 5.4 required fully reindexing the data due to schema changes.
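The shard distribution described above rests on hash-based document routing: each user document lands on one of the 4 shards according to a hash of its ID. SolrCloud's compositeId router uses MurmurHash over the ID; the sketch below substitutes md5 purely to stay self-contained, and the user IDs are invented:

```python
import hashlib

NUM_SHARDS = 4  # matches the 4-shard cluster in the talk

# Deterministically route a user document to a shard by hashing its ID.
def shard_for(user_id):
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# The same ID always routes to the same shard, so lookups and updates
# for a given user go to one predictable place.
for uid in ("psn-user-1", "psn-user-2", "psn-user-3"):
    print(uid, "-> shard", shard_for(uid))
```

Deterministic routing is what lets a 300-million-document corpus be split evenly while queries by ID still touch only one shard.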
Webinar: Solr 6 Deep Dive - SQL and GraphLucidworks
This document provides an agenda and overview for a conference session on Solr 6 and its new capabilities for parallel SQL and graph queries. The session will cover motivations for adding these features to Solr, how streaming expressions enable parallel SQL, graph capabilities through the new graph query parser and streaming expressions, and comparisons to other technologies. The document includes examples of SQL queries and graph streaming expressions in Solr.
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...Lucidworks
The document discusses the challenges of building a news search engine at Bloomberg L.P. It describes how Bloomberg uses Apache Solr/Lucene to index millions of news stories and handle complex search queries from customers. Some key challenges discussed include optimizing searches over huge numbers of documents and metadata fields, handling arbitrarily complex queries, and developing an alerting system to notify users of new matching results. The system has been scaled up to include thousands of Solr cores distributed across data centers to efficiently search and retrieve news content.
Uber has created a Data Science Workbench to improve the productivity of its data scientists by providing scalable tools, customization, and support. The Workbench provides Jupyter notebooks for interactive coding and visualization, RStudio for rapid prototyping, and Apache Spark for distributed processing. It aims to centralize infrastructure provisioning, leverage Uber's distributed backend, enable knowledge sharing and search, and integrate with Uber's data ecosystem tools. The Workbench manages Docker containers of tools like Jupyter and RStudio running on a Mesos cluster, with files stored in a shared file system. It addresses the problems of wasted time from separate infrastructures and lack of tool standardization across Uber's data science teams.
Informational Referential Integrity Constraints Support in Apache Spark with ...Databricks
An informational, or statistical, constraint is a constraint such as a unique, primary key, foreign key, or check constraint that can be used by Apache Spark to improve query performance. Informational constraints are not enforced by the Spark SQL engine; rather, they are used by Catalyst to optimize the query processing. Informational constraints will be primarily targeted to applications that load and analyze data that originated from a data warehouse. For such applications, the conditions for a given constraint are known to be true, so the constraint does not need to be enforced during data load operations.
This session will cover the support for primary and foreign key (referential integrity) constraints in Spark. You’ll learn about the constraint specification, metastore storage, constraint validation and maintenance. You’ll also see examples of query optimizations that utilize referential integrity constraints, such as Join and Distinct elimination and Star Schema detection.
This document summarizes how Solr and Lucidworks Fusion can be used for big data search and analytics. It discusses indexing strategies like using MapReduce, Spark, and Fusion connectors to index structured and unstructured data from HDFS. It also covers topics like Solr on HDFS, auto add replicas, security, cluster sizing, and using the lambda architecture with Spark streaming to enable real-time search over batch-processed historical data. The document promotes Lucidworks Fusion as a search platform that can handle massive scales of data, provide real-time search capabilities, and work with any data source securely.
Managed Search: Presented by Jacob Graves, Getty ImagesLucidworks
Getty Images uses a managed search system to allow business users to control image search results. The system breaks search scoring into relevancy, recency, and image source components. It provides interfaces to adjust component weights and visualize the effects. Test algorithms can be run on a percentage of users before being promoted to the main search. The system is built on SOLR and uses custom plugins and functions to implement complex scoring and result shuffling while providing business users simple controls.
The SAS Search Journey: Using AI to Move from Google to Lucidworks - Alex Fl...Lucidworks
1) SAS migrated their enterprise search from Google and another solution to Lucidworks Fusion to have a single search platform.
2) They encountered issues with the out of box configuration and content that impacted search relevance and ranking.
3) Through multiple iterations of configuration changes, indexing adjustments, and using AI/SAS tools to evaluate search terms, they improved relevance and the types of results returned for key search terms.
4) Future plans for the search include immediate indexing of new content, auto-suggest, spellcheck, and integrating search data into analytics dashboards. Lucidworks was chosen for its data analytics abilities, easy administration, and connectors.
AI from your data lake: Using Solr for analyticsDataWorks Summit
Introductory technical session on Apache Solr's (HDP Search) artificial intelligence and machine learning features to discover relationships and insights across big data in the enterprise. Discussions will include how Solr performs graph traversal, anomaly detection, NLP and time-series analysis, and how you can display this data to users with easy-to-create dashboards.
This technical session will review Apache Solr’s streaming expressions, which were introduced in Solr 6.5. With over 100 expressions and evaluators, conditional logic, variables and data structures these functions form the basis of a new paradigm that brings many of the features from the relational world into search. These new capabilities form the basis of a powerful functional programming language that enables the implementation of many parallel computing use cases such as anomaly detection, streaming NLP, graph traversal and time-series analysis.
In order to discover and analyze big data, third party tools such as Jupyter, Tableau, and Lucidworks Insights will be reviewed.
Speaker
Cassandra Targett, Lucidworks, Director of Engineering
Marcelline Saunders, Lucidworks, Director, Global Partner Enablement
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, TargetLucidworks
This document summarizes Target's implementation of Solr as its search platform. It discusses how Target transitioned from Oracle-Endeca to Solr to handle its large scale data and enable more flexible relevancy controls. It describes how Target tested Solr through handling live guest traffic in two sprints and moving its typeahead functionality to the public cloud. Finally, it outlines how Target leverages key Solr capabilities like collection aliases, atomic updates, and configurable facets to synchronize designer and product launches.
Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks Lucidworks
This document discusses implementing learning to rank (LTR) using a product catalog dataset from Best Buy with click-through data. It describes:
1. Using Solr LTR and Fusion together, with Solr extracting content features and Fusion enabling complex feature engineering.
2. Training an XGBoost pairwise ranking model on features like TF-IDF and click signals, which outperforms baselines.
3. The trained model yields up to a 13 percentage point increase in NDCG over hand-tuned scoring, working best on popular queries.
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...Databricks
This talk is a case-study on how Apache Spark and the Spark-Solr library is being used at Flipp for driving search relevancy. Flipp is a Toronto based digital flyer and ecommerce company which helps shoppers save money on weekly shopping. Our customers have the option of browsing through our 5+ million products from the brick-and-mortar retailers in North America. This makes Search a very challenging function in our app. How to show the most relevant and personalized search results to users on a query?
The talk will focus on using user signals such as Click Through Rate (CTR) and Impressions to increase search relevancy. I will also talk about how PySpark is used to create the Flipp Search ETL platform for collecting user signals and reading product data from Solr. The problem scenario will be explained in which keyword search and basic relevancy algorithms become ineffective when dealing with a large product database. The solutions will cover the following implementations being used at Flipp to drive relevancy: – Utilizing user clicks and popularity data to derive and index normalized item weights to implement the Search Crowd Curation models in Apache Solr
– How around 5+ million items are classified into Google Categories in real time using Keras and Apache Spark to power product category curation in Solr.
– How to create a crowd sourced query intent categorizer in Solr using the Spark-Solr library.
– The use of offline and online metrics at Flipp for evaluating changes in search relevancy.
– Future plans for incorporating Kafka-connect in Apache Solr with structured streaming to perform real-time product indexing with Spark-Solr library.
Search was once considered a black-box application that ingested content and delivered results to users opaquely. However, driven by the opportunities and demands of the growing universe of content and by the versatility of Solr/Lucene open source search technology, search applications are evolving from a standalone facility to an enabling framework.http://www.lucidimagination.com/developer/whitepapers/search-readiness-checklist
This presentation examines the main building blocks for building a big data pipeline in the enterprise. The content uses inspiration from some of the top big data pipelines in the world like the ones built by Netflix, Linkedin, Spotify or Goldman Sachs
JEEConf 2015 - Introduction to real-time big data with Apache SparkTaras Matyashovsky
This presentation will be useful to those who would like to get acquainted with Apache Spark architecture, top features and see some of them in action, e.g. RDD transformations and actions, Spark SQL, etc. Also it covers real life use cases related to one of ours commercial projects and recall roadmap how we’ve integrated Apache Spark into it.
Was presented on JEEConf 2015 in Kyiv.
Design by Yarko Filevych: http://www.filevych.com/
Real-Time Analytics with Solr: Presented by Yonik Seeley, ClouderaLucidworks
The document describes how Solr can be used for real-time analytics on large datasets. It discusses how Solr's inverted index, columnar storage, and multi-segment indexing enable fast search and analytics. Faceted search is described as a way to break results into buckets to filter and explore the data. The new Solr facet module aims to improve integration, performance, and ease of use for advanced analytics through faceting.
This document discusses how Lucene/Solr is used for search applications across different industries. It begins by outlining key considerations for understanding search opportunities and requirements, such as the types of data being searched, the users needing search results and why, integration with IT infrastructure, and the user interface. It then provides examples of how Lucene/Solr powers search applications in industries like yellow pages and local search, media, e-commerce, jobs and career sites, libraries and museums, social media, and enterprise intranet search. The document aims to demonstrate Lucene/Solr's versatility and flexibility in meeting the diverse search needs of real-world organizations.
Apache Arrow Flight: A New Gold Standard for Data Transport - Wes McKinney
This document discusses how structured data is often moved inefficiently between systems, causing waste. It introduces Apache Arrow, an open standard for in-memory data, and how Arrow can help make data movement more efficient. Systems like Snowflake and BigQuery are now using Arrow to help speed up query result fetching by enabling zero-copy data transfers and sharing file formats between query processing and storage.
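The zero-copy idea behind Arrow's buffers has a small standard-library analogue in Python's `memoryview`: a slice of a contiguous buffer is a view, not a copy. This does not use Arrow itself; it only illustrates what "zero-copy" means in practice.

```python
import array

# A contiguous numeric column, standing in for an Arrow buffer.
col = array.array("d", [1.0, 2.0, 3.0, 4.0])
buf = memoryview(col)

half = buf[2:]           # a view over the tail of the buffer: no bytes copied
assert half[0] == 3.0

# Mutating through the view is visible in the original buffer,
# because both names refer to the same memory.
half[0] = 30.0
print(list(col))
```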
Solr is a great tool to have in the data scientist toolbox. In this talk, I walk through several demos of applying Solr to data science activities, as well as explore various use cases for Solr and data science.
Anshum Gupta is an Apache Lucene/Solr committer who works at Lucidworks. He discusses the history and capabilities of Apache Lucene, an open source information retrieval library, and Apache Solr, an enterprise search platform built on Lucene. Solr has over 8 million downloads and is used by many large companies for search capabilities including indexing, faceting, auto-complete, and scalability to handle large datasets. Major updates in Solr 5 include improved performance, security features, and analytics capabilities.
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati... - Lucidworks
The document discusses developing a scalable user search feature for the PlayStation 4. It describes setting up a SolrCloud cluster with 300 million user documents distributed across 4 shards. Personalized search ranks results based on friendship connections by using a Lucene index to store close connections for each user. Challenges included instability in the initial Solr 4.8 cluster which was addressed through configuration changes. An upgrade to Solr 5.4 required fully reindexing the data due to schema changes.
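The connection-aware ranking described above can be sketched as a re-ranking step: results whose author is in the searcher's close-connection set receive a score boost. All names, scores, and the boost value here are hypothetical; the real system stores per-user connections in a Lucene index.

```python
# Hypothetical per-user close-connection sets (the talk stores these in Lucene).
close_connections = {"user42": {"alice", "bob"}}

results = [
    {"user": "carol", "score": 2.0},
    {"user": "alice", "score": 1.5},
    {"user": "dave",  "score": 1.0},
]

def personalize(searcher, results, boost=10.0):
    """Re-rank results, boosting those authored by the searcher's friends."""
    friends = close_connections.get(searcher, set())
    return sorted(
        results,
        key=lambda r: r["score"] + (boost if r["user"] in friends else 0.0),
        reverse=True,
    )

ranked = personalize("user42", results)
```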
Anshum Gupta presented on the Apache Solr security framework. He began with an introduction of himself and overview of Apache Lucene and Solr. The presentation then covered the need for security in Solr, available security options which include SSL, ZooKeeper ACLs, and authentication and authorization frameworks. Gupta discussed the authentication and authorization plugin architectures, available plugins like BasicAuth and Kerberos, and benefits of the security frameworks like enabling multi-tenant and access controlled features. He concluded with recommendations on writing custom plugins and next steps to improve Solr security.
Anshum Gupta is an Apache Lucene/Solr committer and Lucidworks employee with over 9 years of experience in search and related technologies. He has been involved with Apache Lucene since 2006 and Apache Solr since 2010, focusing on contributions, releases, and communities around Solr. The document then provides an overview of the major new features and improvements in Apache Solr 4.10, including ease of use enhancements, distributed pivot faceting, core, SolrCloud, and development tool updates.
Building a Solr Continuous Delivery Pipeline with Jenkins: Presented by James... - Lucidworks
This document discusses using Jenkins to create a continuous delivery pipeline for Apache Solr. It describes packaging and deploying Solr configurations through the pipeline. Key steps include building Solr packages, deploying Solr to stage environments, and deploying Solr configurations from version control. The pipeline allows predictable, routine deployments and reduces work-in-progress through automation.
Solr JDBC: Presented by Kevin Risden, Avalon Consulting - Lucidworks
Solr JDBC allows users to query indexed data in Apache Solr using standard SQL. It provides a JDBC driver and integrates with existing JDBC tools, allowing SQL skills to be leveraged with Solr. The presenter demonstrated Solr JDBC with various programming languages and tools like Java, Python, R, Apache Zeppelin, RStudio, DbVisualizer and SQuirreL SQL. Future improvements may include replacing Presto with Calcite for SQL processing and enhancing compatibility. Joining data from multiple Solr collections was also discussed.
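Under the hood, the Solr JDBC driver issues statements against a collection's `/sql` handler. The shape of such a request can be sketched with the standard library; the collection name and query below are hypothetical, and nothing is actually sent.

```python
from urllib.parse import urlencode

# A SQL statement of the kind Solr JDBC forwards to the /sql handler
# (collection "books" and the query itself are made up for illustration).
stmt = "SELECT author, COUNT(*) FROM books GROUP BY author"

# Solr's SQL handler accepts the statement as a form-encoded `stmt` parameter.
body = urlencode({"stmt": stmt})
endpoint = "http://localhost:8983/solr/books/sql"
print(endpoint, body)
```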
Scaling SolrCloud to a large number of Collections - Anshum Gupta
Anshum Gupta presented on scaling SolrCloud to support thousands of collections. Some challenges included limitations on the cluster state size, overseer performance issues under high load, and difficulties moving or exporting large amounts of data. Solutions involved splitting the cluster state, improving overseer performance through optimizations and dedicated nodes, enabling finer-grained shard splitting and data migration between collections, and implementing distributed deep paging for large result sets. Testing was performed on an AWS infrastructure to validate scaling to billions of documents and thousands of queries/updates per second. Ongoing work continues to optimize and benchmark SolrCloud performance at large scales.
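The finer-grained shard splitting mentioned above can be illustrated with hash-range arithmetic: in SolrCloud each shard owns a contiguous slice of the 32-bit document-hash space, and splitting a shard divides that slice into two contiguous sub-ranges. A minimal sketch:

```python
def split_range(lo, hi):
    """Split the inclusive range [lo, hi] into two contiguous halves,
    as SolrCloud's SPLITSHARD does with a shard's hash range."""
    mid = (lo + hi) // 2
    return (lo, mid), (mid + 1, hi)

# Solr maps documents onto the signed 32-bit hash space.
left, right = split_range(-2**31, 2**31 - 1)
print(left, right)
```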
This document summarizes a talk on search given at Search Camp United Nations in NYC on July 10, 2016. The talk showcases and details examples of different types of search, including rules, typeahead/suggest, signals, and location awareness, and how they can be brought together into a cohesive search experience. It provides information on the speaker, Erik Hatcher, and covers the anatomy of search results and features like relevancy ranking, faceting, highlighting, grouping, spellchecking, autocomplete, and more.
Talk given at airbnb HQ in San Francisco on July 8th, 2015 at the Downtown SF Apache Lucene/Solr meetup.
This talk covers an overview of both the authentication and authorization frameworks in Apache Solr, and how they work together. It also provides an overview of existing plugins and how to enable them to restrict user access to resources within Solr.
Cross Data Center Replication for the Enterprise: Presented by Adam Williams, ... - Lucidworks
Iron Mountain uses cross data center replication (CDCR) to replicate its Solr indexes across two data centers for disaster recovery purposes. CDCR allows Iron Mountain to maintain a warm backup of its 5.3 billion document Solr index that can be restored within an hour in the event of an outage or corrupted index. Iron Mountain has successfully used its CDCR setup on two occasions to restore its production index when issues arose. The presentation will discuss how Iron Mountain configured and maintains its CDCR system and the advantages it provides over other disaster recovery options.
This document discusses SolrCloud cluster management APIs. It provides a brief history of SolrCloud and how cluster management has evolved since its introduction in Solr 4.0 when there were no APIs for managing distributed clusters. It outlines several key SolrCloud cluster management APIs for creating and managing collections, replica placement strategies, scaling up clusters, moving data between shards and nodes, monitoring cluster status, managing leader elections, and migrating cluster infrastructure. It envisions rule-based automation for tasks like monitoring disk usage and automatically adding/removing replicas based on cluster status.
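As a concrete illustration, a Collections API call is just an HTTP request against Solr's `/admin/collections` endpoint. The sketch below only builds the request URL for the CREATE action; the host and collection name are placeholders, and no request is sent.

```python
from urllib.parse import urlencode

# Parameters for the Collections API CREATE action; the collection name
# and shard/replica counts are illustrative.
params = {
    "action": "CREATE",
    "name": "products",
    "numShards": 2,
    "replicationFactor": 2,
}
url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
print(url)
```

Other management actions described in the talk (SPLITSHARD, ADDREPLICA, CLUSTERSTATUS, and so on) follow the same pattern: the `action` parameter selects the operation and the remaining parameters configure it.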
Using Apache Solr for Images as Big Data: Presented by Kerry Koitzsch, Wipro... - Lucidworks
1) The document describes a case study using Apache Solr for image analysis as part of a "images as big data" application prototype. Solr provides data storage and search capabilities for the Image as Big Data Toolkit.
2) Various types of data visualization are discussed, including traditional statistical charts, tabular displays, notebook-based visualization, and map-based displays. Crime data and microscope image analysis are used as examples.
3) Solr integrates well into the data pipeline due to its flexibility and ability to work with other components like Apache Tika. Deep learning and machine learning can also be incorporated to develop analytics applications with intelligent search.
Downtown SF Lucene/Solr Meetup: Developing Scalable Search for User Generated... - Lucidworks
This document summarizes Sony's development of a scalable search system using Apache Solr for user-generated content on the PlayStation platform. PlayStation users can easily share media like broadcasts, screenshots, and videos to third-party networks. Sony built a SolrCloud-based system to provide a central place to search for this content across millions of users. The system uses three Solr clusters to handle different media types and supports over 20 languages. It processes over 1 billion search requests per day with low latency.
Managing a SolrCloud cluster using APIs - Anshum Gupta
The document discusses managing large SolrCloud clusters through APIs. It begins with background on SolrCloud and its terminology. It then demonstrates various APIs for creating and modifying collections, adding/deleting replicas, splitting shards, and monitoring cluster status. It provides recipes for common management tasks like shard splitting, ensuring high availability, and migrating infrastructure. Finally, it mentions upcoming backup/restore capabilities and encourages connecting on social media.
Coffee, Danish & Search: Presented by Alan Woodward & Charlie Hull, Flax - Lucidworks
Imagine the frustration of a user who finds the perfect item while browsing, only to realize later, when they click it, that it is out of stock, the price has changed, or it cannot be delivered to their location. This happens when the search index doesn't have real-time availability, price, and seller information, so it is a core challenge that an e-commerce marketplace search engine has to solve. Regular document search index technologies (like Solr/Lucene) have trouble dealing with attributes in high, constant flux (like availability and price), which are typically seller- or listing-specific. In this talk, we present the challenges and our solutions for a customized e-commerce search index that addresses them.
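One common remedy for the problem described above is to keep fast-changing fields (availability, price) out of the document index and join them in at query time from a live store. A toy sketch, with all SKUs and attributes hypothetical:

```python
# Hits returned by the document index (stable fields only).
index_hits = [{"sku": "A1"}, {"sku": "B2"}, {"sku": "C3"}]

# A live key-value store of volatile, listing-specific attributes.
live = {
    "A1": {"in_stock": True,  "price": 9.99},
    "B2": {"in_stock": False, "price": 4.50},
    "C3": {"in_stock": True,  "price": 12.00},
}

def join_and_filter(hits, live_attrs):
    """Attach live attributes to each hit and drop out-of-stock items."""
    out = []
    for h in hits:
        attrs = live_attrs.get(h["sku"])
        if attrs and attrs["in_stock"]:
            out.append({**h, **attrs})
    return out

results = join_and_filter(index_hits, live)
```

The trade-off is an extra lookup per result at query time in exchange for never serving stale availability from the index.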
This document discusses deploying and managing Apache Solr at scale. It introduces the Solr Scale Toolkit, an open source tool for deploying and managing SolrCloud clusters in cloud environments like AWS. The toolkit uses Python tools like Fabric to provision machines, deploy ZooKeeper ensembles, configure and start SolrCloud clusters. It also supports benchmark testing and system monitoring. The document demonstrates using the toolkit and discusses lessons learned around indexing and query performance at scale.
The document discusses how traditional analytics approaches are no longer sufficient due to new data sources like machine data that are unstructured and from external sources. It introduces Splunk as a platform that can collect, index, and analyze massive amounts of machine data in real-time to provide operational intelligence and business insights. Splunk uses late binding schema to allow ad-hoc queries over heterogeneous machine data without needing to design schemas upfront. It can complement traditional BI tools by focusing on real-time analytics over machine data while traditional tools focus on structured data.
How to Empower Your Business Users with Oracle Data Visualization - Perficient, Inc.
With Oracle Data Visualization Cloud Service, your business users can perform self-service analytics, spot patterns, trends, correlations, and construct visual data stories for greater insight into how your product, service, or organization is performing.
In this webinar, we demonstrated how easily users can explore their data in new and different ways through stunning visualizations automatically, promoting self-service discovery.
Discussion included:
-In-depth review of Oracle Data Visualization Cloud Service
-Connecting different data sets like HCM, ERP, Sales Cloud and more
-Mobile and security
-Demo taking a real-world business use case from end to end
This document discusses how business analytics is shifting from relying solely on structured data to leveraging new unstructured data sources like machine data. Traditional analytics approaches involve rigid schemas and long design cycles, while Splunk allows indexing and searching of heterogeneous machine data in real-time without schemas. Splunk delivers insights across IT, security, and business by integrating machine data with structured context data to provide insights like customer analytics, product analytics, and digital intelligence.
This document summarizes Info Explorer, a business intelligence tool developed by Orchid Systems to provide analytical reporting and insights using data from Sage 300 ERP. Info Explorer allows users to slice, dice, drill down and summarize Sage 300 data in real-time dashboards and cubes. It provides a more cost-effective alternative to Crystal Reports. A free lite version, Info Explorer Lite, allows users to analyze sample cubes connected to their live Sage 300 database for 30 days before requiring an activation code.
Splunk is a time-series data platform that handles the three V's of data (volume, velocity, and variety) very well. It collects, indexes, and allows searching and analysis of data. Splunk can collect data from files, directories, network ports, programs/scripts, and databases. It breaks data down into searchable events and builds a high-performance index. This allows users to search, manipulate, and visualize data in reports, charts, and dashboards. Splunk can analyze structured, unstructured, and multistructured data from various sources like logs, networks, clicks, and more.
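The "breaks data down into searchable events and builds an index" step can be miniaturized in plain Python: split raw lines into events and record, per term, which events contain it. The log lines are invented, and this illustrates the idea rather than how Splunk is actually implemented.

```python
from collections import defaultdict

# Raw machine data (hypothetical log lines); each line becomes one event.
raw = [
    "ERROR disk full on host7",
    "INFO backup done on host7",
    "ERROR timeout on host9",
]

# Inverted index: term -> set of event ids containing that term.
index = defaultdict(set)
for event_id, line in enumerate(raw):
    for term in line.lower().split():
        index[term].add(event_id)

def search(term):
    """Return the ids of events containing `term` (case-insensitive)."""
    return sorted(index.get(term.lower(), set()))

print(search("error"))
```

Note that no schema was designed up front: the terms themselves become the searchable structure, which is the essence of late-binding schema.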
Assessing New Databases - Translytical Use Cases - DATAVERSITY
Organizations run their day-in-and-day-out businesses with transactional applications and databases. On the other hand, organizations glean insights and make critical decisions using analytical databases and business intelligence tools.
The transactional workloads are relegated to database engines designed and tuned for transactional high throughput. Meanwhile, the big data generated by all the transactions require analytics platforms to load, store, and analyze volumes of data at high speed, providing timely insights to businesses.
Thus, in conventional information architectures, this requires two different database architectures and platforms: online transactional processing (OLTP) platforms to handle transactional workloads and online analytical processing (OLAP) engines to perform analytics and reporting.
Today, a particular focus and interest of operational analytics includes streaming data ingest and analysis in real time. Some refer to operational analytics as hybrid transaction/analytical processing (HTAP), translytical, or hybrid operational analytic processing (HOAP). We’ll address if this model is a way to create efficiencies in our environments.
Discover how to boost your reporting and navigate Sage 300 faster and easier in this presentation. You can watch the full recording here: http://bit.ly/2qf1awF
Business Intelligence (BI) involves transforming raw transactional data into meaningful information for analysis using techniques like OLAP. OLAP allows for multidimensional analysis of data through features like drill-down, slicing, dicing, and pivoting. It provides a comprehensive view of the business using concepts like dimensional modeling. The core of many BI systems is an OLAP engine and multidimensional storage that enables flexible and ad-hoc querying of consolidated data for planning, problem solving and decision making.
Business Intelligence Trends for University of Western Australia - Joshua Fletcher
This document summarizes a guest lecture given by Joshua Fletcher on SAP Business Intelligence. The agenda included introductions, an overview of SAP, their analytics and information management platform, and industry trends like mobile analytics, in-memory databases, and predictive analysis. Fletcher has 12 years of experience with BI tools and is now a BI architect. He discussed SAP's analytics platform and trends in mobile BI, in-memory databases like SAP HANA, and predictive analytics using SAP Predictive Analysis. He demonstrated the capabilities of SAP's mobile BI, HANA, and predictive analysis software and provided useful links for further information.
1) In-memory computing is growing rapidly, with the total data market expected to grow from $69 billion in 2015 to $132 billion in 2020.
2) In-memory databases are gaining popularity for applications that require fast response times, like telecommunications and mobile advertising, as memory access is faster than disk access.
3) Modern applications are driving adoption of in-memory solutions as they generate more data from more users and transactions and require faster performance to handle growing traffic.
4) Two examples presented were DellEMC using MemSQL for a real-time customer 360 application and an IoT logistics application called MemEx that processes sensor data from warehouses for predictive analytics.
Moving Targets: Harnessing Real-time Value from Data in Motion - Inside Analysis
The Briefing Room with David Loshin and Datawatch
Live Webcast Feb. 17, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=4a053043c45cf0c2f6453dfb8577c72a
Patience may be a virtue, but when it comes to streaming analytics, waiting is not an option. Between Big Data and the Internet of Things, businesses are faced with more data and greater complexity than ever before. Traditional information architectures simply cannot support the kind of processing necessary to make use of this fast-moving resource. The modern context requires a shorter path to analytics, one that narrows the gap between governance and discovery.
Register for this episode of The Briefing Room to hear veteran Analyst David Loshin as he explains how the prevalence of streaming data is changing business pace and processes. He’ll be briefed by Dan Potter of Datawatch, who will tout his company’s real-time data discovery platform for data in motion. He will show how self-service data preparation can lead to faster insights, ultimately fostering the ability to make precise decisions at the right time.
Visit InsideAnalysis.com for more information.
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E... - DATAVERSITY
Many data scientists are well grounded in creating accomplishments in the enterprise, but many come from outside – from academia, from PhD programs and research. They have the necessary technical skills, but those skills don't count until their product reaches production and is in use. The speaker recently helped a struggling data scientist understand his organization and how to create success in it. That experience turned into this presentation, because many new data scientists struggle with the complexities of an enterprise.
Webinar: Increase Conversion With Better SearchLucidworks
This document discusses a partnership between IBM Commerce and Lucidworks to improve e-commerce search experiences. Key points:
1. The partnership will integrate Lucidworks' Apache Solr-based search platform Fusion with IBM Commerce to power search and recommendations on IBM Commerce sites.
2. Fusion will enrich product content, queries, and results to improve findability. It will also use signals from user interactions for more relevant results and personalized recommendations.
3. The integration aims to improve customer experiences and conversions by ensuring customers can find products through various query types and discover related items to buy.
This document discusses drivers of AI and digital transformation and the role of the Chief AI Officer (C.AI.O). It notes that by 2022, 75% of globally shipped working assets will have event-driven decision support systems. It asks what is needed to win in the next wave of value from AI and digital transformation, how to prepare data science teams to win, and how to build a value-driven AI playbook. It suggests the C.AI.O is the corporate role best able to find answers to these questions, as the C.AI.O can drive change, value, and competition through data, legal, and policy exploration to maximize appropriate AI use.
Take Action: The New Reality of Data-Driven Business - Inside Analysis
The Briefing Room with Dr. Robin Bloor and WebAction
Live Webcast on July 23, 2014
Watch the archive:
https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=360d371d3a49ad256942f55350aa0a8b
The waiting used to be the hardest part, but not anymore. Today’s cutting-edge enterprises can seize opportunities faster than ever, thanks to an array of technologies that enable real-time responsiveness across the spectrum of business processes. Early adopters are solving critical business challenges by enabling the rapid-fire design, development and production of very specific applications. Functionality can range from improved customer engagement to dynamic machine-to-machine interactions.
Register for this episode of The Briefing Room to learn from veteran Analyst Dr. Robin Bloor, who will tout a new era in data-driven organizations, and why a data flow architecture will soon be critical for industry leaders. He’ll be briefed by Sami Akbay of WebAction, who will showcase his company’s real-time data management platform, which combines all the component parts needed to access, process and leverage data big and small. He’ll explain how this new approach can provide game-changing power to organizations of all types and sizes.
Visit InsideAnalysis.com for more information.
The Great Lakes: How to Approach a Big Data Implementation - Inside Analysis
- Rick Stellwagen from Think Big, A Teradata Company, discussed best practices for implementing a data lake including establishing standards for data ingestion and metadata capture, developing a security plan, and planning for data discovery and reporting.
- Analyst Robin Bloor asked questions about metadata management, data governance, and security for data lakes. Bloor noted that while data lakes are a new concept, best practices are needed as organizations move analytics and BI capabilities to this model.
- Upcoming Briefing Room topics in 2015 will focus on big data, cloud computing, and innovators in technology.
The document discusses real-time data and the PI System for managing it. The PI System collects real-time data from various sources, historizes large volumes of data reliably over long periods, and allows users to find, analyze, deliver, and visualize the data. It provides a comprehensive view of operational information through intuitive visuals to help users make informed decisions.
Splunk in the Cisco Unified Computing System (UCS) - Splunk
Cisco has been a Splunk customer for 8 years, with a strong engineering partnership for 3+ years. Learn how several Cisco customers as well as Cisco IT have deployed, grown, and transformed our businesses using the advantages of Splunk Enterprise software together with Cisco UCS and Nexus hardware. We will also talk about scalability and performance considerations for all scales of data footprint and business growth.
Introduction to Enterprise Search. A two-hour class to introduce Enterprise Search. It covers:
The problems enterprise search can solve
History of (web) search
How we search and find
Current state of Enterprise Search + stats
Technical concept
Information quality
Feedback cycle
Five dimensions of Findability
This document discusses next generation big data business intelligence (BI). It describes traditional BI and how it is evolving to incorporate big data. Key points:
- Traditional BI includes dashboards, KPIs, OLAP, reporting, and forecasting to provide insights from structured data.
- Next generation BI leverages big data technologies like Hadoop and NoSQL databases to handle larger and more diverse unstructured data in batch and real-time.
- This enables deeper insights through analytics across all data, from basic queries to advanced predictive modeling and streaming analysis.
- The modern BI stack incorporates big data technologies alongside traditional data warehousing and OLAP for integrated insights.
Similar to Webinar: Fusion for Business Intelligence
Search is the Tip of the Spear for Your B2B eCommerce Strategy - Lucidworks
With ecommerce experiencing explosive growth, it seems intuitive that the B2B segment of that ecosystem is mirroring the same trajectory. That said, B2B has very different needs when it comes to transacting with the same style of experiences that we see in B2C. For instance, B2B ecommerce is about precision findability, whereas B2C customers can convert at higher rates when they’re just browsing online. In order for the B2B buying experience to be successful, search needs to be tuned to meet the unique needs of the segment.
In this webinar with Forrester senior analyst Joe Cicman, you’ll learn:
-Which verticals in B2B will drive the most growth, and how machine-learning powered personalization tactics can be deployed to support those specific verticals
-Why an omnichannel selling approach must be deployed in order to see success in B2B
-How deploying content search capabilities will support a longer sales cycle at scale
-What the next steps are to support a robust B2B commerce strategy supported by new technology
Speakers
Joe Cicman, Senior Analyst, Forrester
Jenny Gomez, VP of Marketing, Lucidworks
Customer loyalty starts with quickly responding to your customer’s needs. When it comes to resolving open support cases, time is of the essence. Time spent searching for answers adds up and creates inefficiencies in resolving cases at scale. Relevant answers need to be a few clicks away and easily accessible for agents directly from their service console.
We will explore how Lucidworks’ Agent Insights application automatically connects agents with the correct answers and resources. You’ll learn how to:
-Configure a proactive widget in an agent’s case view page to access resources across third-party systems (such as Sharepoint, Confluence, JIRA, Zendesk, and ServiceNow).
-Easily set up query pipelines to autonomously route assets and resources that are relevant to the case-at-hand—directly to the right agent.
-Identify subject matter experts within your support data and access tribal knowledge with lightning-fast speed.
How Crate & Barrel Connects Shoppers with Relevant Products - Lucidworks
Lunch and Learn during Retail TouchPoints #RIC21 virtual event.
***
Crate & Barrel’s previous search solution couldn’t provide its shoppers with an online search and browse experience consistent with the customer-centric Crate & Barrel brand. Meanwhile, Crate & Barrel merchandisers spent the bulk of their time manually creating and maintaining search rules. The search experience impacted customer retention, loyalty, and revenue growth.
Join this lunch & learn for an interactive chat on how Crate & Barrel partnered with Lucidworks to:
-Improve search and browse by modernizing the technology stack with ML-based personalization and merchandising solutions
-Enhance the experience for both shoppers and merchandisers
-Explore signals to transform the omnichannel shopping experience
Questions? Visit https://lucidworks.com/contact/
Learn how to guide customers to relevant products using eCommerce search, hyper-personalisation, and recommendations in our ‘Best-In-Class Retail Product Discovery’ webinar.
Nowadays, shoppers want their online experience to be engaging, inspirational and fulfilling. They want to find what they're looking for quickly and easily. If the sought-after item isn't available, they want the next best product or content surfaced to them. They want a website to understand their goals as though they were talking to a sales assistant in person, in-store.
In this webinar, we explore IMRG industry data insights and a best-in-class example of retail product discovery. You’ll learn:
- How AI can drive increased revenue through hyper-personalised experiences
- How user intent can be easily understood and results displayed immediately
- How merchandisers can be empowered to curate results and product placement – all without having to rely on IT.
Presented by:
Dave Hawkins, Principal Sales Engineer - Lucidworks
Matthew Walsh, Director of Data & Retail - IMRG
Connected Experiences Are Personalized Experiences - Lucidworks
Many companies claim personalization and omnichannel capabilities are top priorities. Few are able to deliver on those experiences.
For a recent Lucidworks-commissioned study, Forrester Consulting surveyed 350+ global business decision-makers to see what gets in the way of achieving these goals. They discovered that inefficient technology, lack of behavioral insights, and failure to tie initiatives to enterprise-wide goals are some of the most frequent blockers to personalization success.
Join guest speaker, Forrester VP and Principal Analyst, Brendan Witcher, and Lucidworks CEO, Will Hayes, to hear the results of the Forrester Consulting study, how to avoid “digital blindness,” and how to apply VoC data in real-time to delight customers with personalized experiences connected across every touchpoint.
In this webinar, you’ll learn:
- Why companies who utilize real-time customer signals report more effective personalization
- How to connect employees and customers in a shared experience through search and browse
- How Lucidworks clients Lenovo, Morgan Stanley and Red Hat fast-tracked improvements in conversion, engagement and customer satisfaction
Featuring
- Will Hayes, CEO, Lucidworks
- Brendan Witcher, VP, Principal Analyst, Forrester
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc... - Lucidworks
Intelligent Policing. Leveraging Data to more effectively Serve Communities.
Policing in the next decade is anticipated to be very different from historical methods: more data driven, more focused on the intricacies of the communities served, and more open and collaborative, to make informed recommendations a reality. Whether it's social populations, NIBRS, or organizational improvement that's the driver, the IT requirement is largely the same: provide access to large volumes of siloed data to gain a full 360-degree understanding of existing connections and patterns for improved insight and recommendation.
Join us for a round table discussion of how the Toronto Police Service is better serving their community through deploying a unified intelligent data platform.
Data innovation improves officers' engagement with existing data and streamlines investigation workflows by enhancing collaboration. This improved visibility into existing police data allows for a more intelligent and responsive police force.
In this webinar, we'll cover:
-The technology needs of an intelligent police force.
-How a Global Search improves an officer's interaction with existing data.
Featuring:
-Simon Taylor, VP, Worldwide Channels & Alliances, Lucidworks
-Michael Cizmar, Managing Director, MC+A
-Ian Williams, Manager of Analytics & Innovation, Toronto Police Service
Preparing for Peak in Ecommerce | eTail Asia 2020Lucidworks
This document provides a framework for prioritizing onsite search problems and key performance indicators (KPIs) to measure for e-commerce search optimization. It recommends prioritizing fixes for searches that yield no results, improving the relevance of results, and reducing false positives. The most essential KPIs to measure include query latency, throughput, and result relevance, gauged through click-through rates and NDCG scores. The document also provides tips for self-benchmarking search performance and examples of search performance benchmarks across nine e-commerce sites from various industries.
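For concreteness, NDCG (normalized discounted cumulative gain) is straightforward to compute from graded relevance judgments; a minimal sketch (the relevance labels below are illustrative, not from the benchmark survey):

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: each result's relevance is discounted
    # by log2 of its rank position (rank 1 -> log2(2), rank 2 -> log2(3), ...).
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))

def ndcg(relevances):
    # Normalize against the ideal (descending) ordering; 1.0 means the
    # engine ranked results perfectly by relevance.
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0

# Graded relevance (0-3) of the top 5 results as ranked by the engine:
print(round(ndcg([3, 2, 3, 0, 1]), 3))  # -> 0.972
```

Tracking this score alongside click-through rate over time shows whether relevance tuning is actually moving the needle.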
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Lucidworks
Wish your conversion rates were higher? Can’t figure out how to efficiently and effectively serve all the visitors on your site? Embarrassed by the quality of your product discovery experience? The bar is high and the influx of online shopping over recent months has reminded us that the opportunities are real. We’re all deep in holiday prep, but let’s take a few minutes to think about January 2021 and beyond. How can we position ourselves for success with our customers and against our competition?
Grab your lunch and let’s dive into three strategies that need to be part of your 2021 roadmap. You don’t need an army to get there. But you do need to take action and capitalize on the shoppers abandoning the product discovery journey on your site.
In this session, attendees will find out how to:
-Take control of merchandising at scale;
-Implement hands-free search relevancy; and
-Address personalization challenges.
AI-Powered Linguistics and Search with Fusion and RosetteLucidworks
For a personalized search experience, search curation requires robust text interpretation, data enrichment, relevancy tuning and recommendations. In order to achieve this, language and entity identification are crucial.
For teams working on search applications, advanced language packages allow them to achieve greater recall without sacrificing precision.
Join us for a guided tour of our new Advanced Linguistics packages, available in Fusion, thanks to the technology partnership between Lucidworks and Basistech.
We’ll explore the application of language identification and entity extraction in the context of search, along with practical examples of personalizing search and enhancing entity extraction.
In this webinar, we’ll cover:
-How Fusion uses the Rosette Basic Linguistics and Entity Extraction packages
-Tips for improving language identification and treatment as well as data enrichment for personalization
-Speech2 demo modeling Active Recommendation
-How to use Rosette’s packages with Fusion Pipelines to build custom entities for domain-specific use cases
Featuring:
-Radu Miclaus, Director of Product, AI and Cloud, Lucidworks
-Robert Lucarini, Senior Software Engineer, Lucidworks
-Nick Belanger, Solutions Engineer, Basis Technology
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentLucidworks
Before COVID-19, almost 80% of the US workforce worked in service jobs that involve in-person interaction with strangers. Now, leaders of service organizations must reshape their offerings during the pandemic and prepare for whatever the new normal turns out to be. Our three panelists will share ideas for adapting their service businesses, now that closer-than-six-feet isn’t an option.
Join Lucidworks as we talk shop with 3 service business leaders, covering:
-Common impacts of the pandemic on service businesses (and what to do about them),
-How service teams can maintain a human touch across virtual channels, and
-Plans for the future, before and after the pandemic subsides.
Featuring:
-Sara Nathan, President & CEO, AMIGOS
-Anthony Carruesco, Founder, AC Fly Fishing
-sara bradley, chef and proprietor, freight house
-Justin Sears, VP Product Marketing, Lucidworks
Webinar: Smart answers for employee and customer support after covid 19 - EuropeLucidworks
The COVID-19 pandemic has forced companies to support far more customers and employees through digital channels than ever before. Many are turning to chatbots to help meet increasing demand, but traditional rules-based approaches can’t keep up. Our new Smart Answers add-on to Lucidworks Fusion makes existing chatbots and virtual assistants more intelligent and more valuable to the people you serve.
Smart Answers for Employee and Customer Support After COVID-19Lucidworks
Watch our on-demand webinar showcasing Smart Answers on Lucidworks Fusion. This technology makes existing chatbots and virtual assistants more intelligent and more valuable to the people you serve.
In this webinar, we’ll cover:
-How search and deep learning extend conversational frameworks for improved experiences
-How Smart Answers improves customer care, call deflection, and employee self-service
-A live demo of Smart Answers for multi-channel self-service support
Applying AI & Search in Europe - featuring 451 ResearchLucidworks
In the current climate, it’s now more important than ever to digitally enable your workforce and customers.
Hear from Simon Taylor, VP Global Partners & Alliances, Lucidworks and Matt Aslett, Research Vice President, 451 Research to get the inside scoop on how industry leaders in Europe are developing and executing their digital transformation strategies.
In this webinar, we’ll discuss:
-The top challenges and aspirations European business and technology leaders are solving using AI and search technology
-Which search and AI use cases are making the biggest impact in industries such as finance, healthcare, retail and energy in Europe
-What technology buyers should look for when evaluating AI and search solutions
Webinar: Accelerate Data Science with Fusion 5.1Lucidworks
This document introduces Fusion 5.1 and its new capabilities for integrating with data science tools like Tensorflow, Scikit-Learn, and Spacy.
It provides an overview of Fusion's capabilities for understanding content, users, and delivering insights at scale. The document then demonstrates Fusion's Jupyter Notebook integration for reading and writing data and running SQL queries.
Finally, it shows how Fusion integrates with Seldon Core to easily deploy machine learning models with tools like Tensorflow and Scikit-Learn. A live demo is provided of deploying a custom model and using it in Fusion's query and indexing pipelines.
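The SQL side of that workflow is essentially aggregation over raw signal events. A minimal sketch of the kind of query involved, using Python's built-in sqlite3 so it runs standalone (Fusion itself executes such queries via Spark SQL against its signals collections; the table name, column names, and event weights here are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signals (query TEXT, doc_id TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO signals VALUES (?, ?, ?)",
    [("laptop", "d1", "click"), ("laptop", "d1", "click"),
     ("laptop", "d2", "click"), ("laptop", "d2", "purchase")],
)

# Aggregate raw events into a per-(query, doc) boost weight,
# counting purchases more heavily than clicks.
rows = conn.execute("""
    SELECT query, doc_id,
           SUM(CASE type WHEN 'purchase' THEN 5 ELSE 1 END) AS weight
    FROM signals
    GROUP BY query, doc_id
    ORDER BY weight DESC
""").fetchall()
print(rows)  # [('laptop', 'd2', 6), ('laptop', 'd1', 2)]
```

The aggregated weights are what a query pipeline can then use to boost documents for matching queries.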
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyLucidworks
In this webinar with 451 Research, you'll understand how retailers are using AI to predict customer intent and learn which key performance metrics are used by more than 120 online retailers in Lucidworks’ 2019 Retail Benchmark Survey.
In this webinar, you’ll learn:
● What trends and opportunities are facing the ecommerce industry in 2020
● Why search is the universal path to understanding customer intent
● How large online retailers apply AI to maximize the effectiveness of their personalization efforts
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Lucidworks
Nordstrom Rack | Hautelook curates and serves customers a wide selection of on-trend apparel, accessories, and shoes at an everyday savings of up to 75 percent off regular prices. With over a million visitors shopping across different platforms every day, and a realization that customers have become accustomed to robust and personalized search interactions, Nordstrom Rack | Hautelook launched an initiative over a year ago to provide data science-driven digital experiences to their customers.
In this session, we’ll discuss Nordstrom Rack | Hautelook’s journey of operationalizing a hefty strategy, optimizing a fickle infrastructure, and rallying troops around a single vision of building an expansible machine-learning driven product discovery engine.
The audience will learn about:
-The key technical challenges and outcomes that come with onboarding a solution
-The lessons learned of creating and executing operational design
-The use of Lucidworks Fusion to plug custom data science models into search and browse applications to understand user intent and deliver personalized experiences
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceLucidworks
Knowledge graphs and machine learning are on the rise as enterprises hunt for more effective ways to connect the dots between the data and the business world. With newer technologies, the digital workplace can dramatically improve employee engagement, data-driven decisions, and actions that serve tangible business objectives.
In this webinar, you will learn:
-- Introduction to knowledge graphs and where they fit in the ML landscape
-- How breakthroughs in search affect your business
-- The key features to consider when choosing a data discovery platform
-- Best practices for adopting AI-powered search, with real-world examples
Webinar: Building a Business Case for Enterprise SearchLucidworks
The document discusses building a business case for enterprise search. It notes that 85% of information is unstructured data locked in various locations and applications. Many knowledge workers spend a significant portion of their day searching across multiple systems for information. The rise of unstructured data and AI capabilities can help organizations unlock value from their information assets. Effective enterprise search powered by AI can provide real-time intelligence, personalized information, and more efficient research to help knowledge workers.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfTechgropse Pvt.Ltd.
In this blog post, we'll delve into the intersection of AI and app development in Saudi Arabia, focusing on the food delivery sector. We'll explore how AI is revolutionizing the way Saudi consumers order food, how restaurants manage their operations, and how delivery partners navigate the bustling streets of cities like Riyadh, Jeddah, and Dammam. Through real-world case studies, we'll showcase how leading Saudi food delivery apps are leveraging AI to redefine convenience, personalization, and efficiency.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
What Do a Lego Brick and the XZ Backdoor Have in Common?Speck&Tech
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might seem to have in common only the fact that both are building blocks, or dependencies, of creative and software projects. In reality, a Lego brick and the case of the XZ backdoor share much more than that.
Join the presentation to immerse yourself in a story of interoperability, standards, and open formats, and then discuss the important role that contributors play in a sustainable open source community.
BIO: Advocate for free software and for standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several LibreOffice-related events, migrations, and training courses. She previously worked on LibreOffice migrations and training for various public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when not pursuing her passion for computers and for Geeko, she cultivates her curiosity about astronomy (the origin of her nickname, deneb_alpha).
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides from the talk given at the IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), 2022.
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Webinar: Fusion for Business Intelligence
1.
2. Fusion for Business Intelligence
Allan Syiek, Senior Sales Engineer
September 14, 2016
3. Session Objectives
By the end of this session, you will:
– Have a high-level awareness of the variety of search and discovery functionality available
– Select the right product for a particular use case
– Know why this baby is so happy
4. Agenda
Ø The Beer and Diaper Legend
Ø DIKW Pyramid
Ø What is Enterprise Search
Ø Indexing 101
Ø Statistics vs. Data Mining vs. Machine Learning
Ø What is Business Intelligence
Ø Where does Fusion Fit?
5. Parable of the Beer and the Diapers
Illustrates the difference between querying and data mining, already firmly enshrined in BI mythology
7. What is Enterprise Search
Q. What do you do with a mountain of data located everywhere?
A. Depends… What do you need it for?
8. Aspects of Enterprise Search
• Crawling, Parsing, Indexing, Searching
• Advanced Searches
• Searching Structured Data
• Searching Unstructured Data
• Metadata
• Ranking
• Results
• Access Control
• UI
• Tuning
• Reporting
• Scale and Performance
9.
10. Documents flow through an Index Pipeline into the Search Collection, and queries flow through a Query Pipeline to the Search UI.
Index Pipeline stages: Tika Parser, Exclusion Filter, Field Mapper, HTML Transform Stage, XML Transform Stage, OpenNLP Entity Extraction, Gazetteer Extraction, Regular Expression, Aggregating, Javascript (custom scripts), …and others…
Query Pipeline stages: Search Fields/Parameters, Facets, Landing Pages, Boost Documents, Block Documents, Security Trimming, Recommendation Boosting, Rollup Aggregator, Sub Query, Javascript (custom scripts), …and others…
11. Indexing 101
A system used to make finding information easier.
Every word is converted into a wordID by using an in-memory hash table, the lexicon. Occurrences in the current document are translated into hit lists and are written into the forward "barrels". Inverted barrels have been sorted.
12. Indexing 101: Ranking
• Score Results for Presentation
– Weighted by Term Frequency-Inverse Document Frequency (TF-IDF)
– Clustering
– Complex proprietary algorithms
14. Statistics vs. Data Mining vs. Machine Learning
– Statistics quantifies numbers
– Data Mining explains patterns
– Machine Learning predicts with models
– Artificial Intelligence behaves and reasons
15. What is Business Intelligence
• BI technologies provide historical, current and predictive views of business operations
• Business intelligence is made up of an increasing number of components, including:
– Multidimensional aggregation and allocation (OLAP: Online Analytical Processing)
– Denormalization, tagging and standardization (relational database)
– Real-time reporting with analytical alerts
– A method of interfacing with unstructured data sources (data mining)
– Group consolidation, budgeting and rolling forecasts
– Statistical inference and probabilistic simulation
– Key performance indicator optimization
– Version control and process management
– Open item management
16. Why Fusion for Log Analytics?
• Secure access to dashboards
• ETL of logs using index pipelines
• Spark runs analysis models for logs, leveraged through the ML index pipeline
• Time-series index management
17. Massive-scale log analytics
• Index billions of log events per day, in real time
• Recent-event and historical analysis: analyze logs over time (today, recent, past week, past 30 days, …)
• Easy-to-use dashboards to visualize common questions and allow for ad hoc analysis
• Ability to scale linearly as the business grows … with sub-linear growth in costs!
• Easy to set up, easy to manage, easy to use
18. Signals & Recommendations
Fusion can capture, store, and aggregate signals from a variety of sources to drive predictive search capabilities and continuous relevancy tuning.
Signals can include:
• Clicks and queries
• Add-to-cart and purchase behavior
• Geo-location
• User behavior and preferences
• User history and past orders
• Device
19. Visualization & Insight with SILK
SILK dashboards provide a rich visual interface for users to search, inspect and visualize event/log data.
They give users the power to perform ad hoc search and analysis on massive amounts of multi-structured and time-series data.
Real-time insights and trends enable on-the-fly decision making using the most accurate and up-to-date data.
Users can share visualizations and dashboards.
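The Indexing 101 slides describe the classic inverted-index design: a lexicon mapping terms to postings of hit lists, then TF-IDF weighting at ranking time. A toy sketch of both steps, with the forward/inverted "barrels" simplified to a single in-memory postings dict (documents and terms are illustrative):

```python
import math
from collections import Counter, defaultdict

docs = {
    1: "beer diapers beer",
    2: "beer pretzels",
    3: "diapers wipes",
}

# Indexing: map each term to postings of (doc_id, term_frequency).
# The dict plays the role of the in-memory lexicon; the postings lists
# stand in for the sorted inverted barrels of hit lists.
index = defaultdict(list)
for doc_id, text in docs.items():
    for term, tf in Counter(text.split()).items():
        index[term].append((doc_id, tf))

def search(term):
    # Ranking: weight each hit by TF-IDF and return documents best-first.
    n, postings = len(docs), index.get(term, [])
    idf = math.log(n / len(postings)) if postings else 0.0
    return sorted(((doc, tf * idf) for doc, tf in postings),
                  key=lambda pair: -pair[1])

print(search("beer"))  # doc 1 (two occurrences) ranks above doc 2
```

Production engines layer clustering and proprietary signals on top of this baseline, as slide 12 notes, but TF-IDF remains the starting point.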