This talk provides a tour of how Apache Solr is used to power search for America's largest flash sale site, www.gilt.com. We show how to address the challenges of providing listings for fast-moving inventory in a search space personalized for each of our members. The solution, built on the Play Framework, comprises less than 4,000 lines of code and provides response times of 40ms on average.
• What is a web log?
• Where do they come from?
• Why are they relevant?
• How can we analyze them?
• What about Clickstream?
Faced with these questions, I did some personal research and put together a synthesis, which helped me clarify some ideas. This presentation does not intend to be exhaustive on the subject, but may offer some useful insights.
Word2Vec model to generate synonyms on the fly in Apache Lucene - Sease
If you want to expand your queries/documents with synonyms in Apache Lucene, you need a predefined file containing the list of terms that share the same semantics. It’s not always easy to find a list of basic synonyms for a language and, even if you find one, it doesn’t necessarily match your contextual domain.
The term “daemon” in the domain of operating system articles is not a synonym of “devil” but it’s closer to the term “process”.
Word2Vec is a two-layer neural network that takes as input a text and outputs a vector representation for each word in the dictionary. Two words with similar meanings are identified with two vectors close to each other.
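The "close vectors mean similar words" idea can be sketched in a few lines of plain Python. The vectors below are made-up toy values, not real Word2Vec output (a real model would learn hundreds of dimensions from a corpus), but the nearest-neighbour lookup is the same mechanism used to generate synonyms on the fly:

```python
import math

# Toy 3-dimensional word vectors (hypothetical values; a real Word2Vec
# model would learn these from a training corpus).
vectors = {
    "daemon":  [0.9, 0.1, 0.0],
    "process": [0.8, 0.2, 0.1],
    "devil":   [0.1, 0.9, 0.3],
}

def cosine(a, b):
    # Cosine similarity: 1.0 for identical directions, ~0 for unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def nearest(word, k=1):
    """Return the k words whose vectors lie closest to `word`'s vector."""
    scores = [(cosine(vectors[word], v), w)
              for w, v in vectors.items() if w != word]
    return [w for _, w in sorted(scores, reverse=True)[:k]]

print(nearest("daemon"))  # ['process'] — not 'devil', matching the example above
```

In this toy space "daemon" sits next to "process", mirroring the operating-systems example: the synonym relation comes from vector proximity in the training domain, not from a dictionary.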
This was the opening presentation of the Zenoh Summit in June 2022. The presentation goes through the motivations that led to the design of the zenoh protocol and provides an introduction to its core concepts. This is the place to start to understand why you should care about zenoh and the ways in which it disrupts existing technologies.
The recording for this presentation is available at https://bit.ly/3QOuC6i
Kafka Intro With Simple Java Producer Consumers - Jean-Paul Azar
Introduction to the Kafka streaming platform. Covers Kafka architecture with some small examples from the command line, then expands on this with a multi-server example. Lastly, it adds some simple Java client examples for a Kafka Producer and a Kafka Consumer.
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ... - Databricks
Of all the developers’ delights, none is more attractive than a set of APIs that make developers productive, that are easy to use, and that are intuitive and expressive. Apache Spark offers these APIs across components such as Spark SQL, Streaming, Machine Learning, and Graph Processing to operate on large data sets in languages such as Scala, Java, Python, and R for doing distributed big data processing at scale. In this talk, I will explore the evolution of three sets of APIs (RDDs, DataFrames, and Datasets) available in Apache Spark 2.x. In particular, I will emphasize three takeaways: 1) why and when you should use each set as a best practice; 2) their performance and optimization benefits; and 3) scenarios in which to use DataFrames and Datasets instead of RDDs for your big data distributed processing. Through simple notebook demonstrations with API code examples, you’ll learn how to process big data using RDDs, DataFrames, and Datasets and interoperate among them. (This will be a vocalization of the blog, along with the latest developments in Apache Spark 2.x DataFrame/Dataset and Spark SQL APIs: https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html)
Watch this talk here: https://www.confluent.io/online-talks/apache-kafka-architecture-and-fundamentals-explained-on-demand
This session explains Apache Kafka’s internal design and architecture. Companies like LinkedIn are now sending more than 1 trillion messages per day to Apache Kafka. Learn about the underlying design in Kafka that leads to such high throughput.
This talk provides a comprehensive overview of Kafka architecture and internal functions, including:
-Topics, partitions and segments
-The commit log and streams
-Brokers and broker replication
-Producer basics
-Consumers, consumer groups and offsets
This session is part 2 of 4 in our Fundamentals for Apache Kafka series.
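The topic/partition/offset concepts listed above can be modelled in a few lines of plain Python. This is a toy illustration, not the real client API (real Kafka hashes keys with murmur2 and stores logs on brokers), but it shows why same-key records stay ordered and how a consumer group tracks one committed offset per partition:

```python
# Toy model of Kafka partitioning and consumer offsets (illustration only).
NUM_PARTITIONS = 3

def partition_for(key: str) -> int:
    # Records with the same key land on the same partition, which is
    # what preserves per-key ordering. (Real Kafka uses murmur2 hashing.)
    return sum(key.encode()) % NUM_PARTITIONS

topic = [[] for _ in range(NUM_PARTITIONS)]  # one append-only log per partition

def produce(key, value):
    p = partition_for(key)
    topic[p].append(value)
    return p, len(topic[p]) - 1              # (partition, offset)

# A consumer group tracks one committed offset per partition.
offsets = {p: 0 for p in range(NUM_PARTITIONS)}

def poll(p):
    if offsets[p] < len(topic[p]):
        msg = topic[p][offsets[p]]
        offsets[p] += 1                      # commit after processing
        return msg
    return None

produce("user-42", "clicked")
produce("user-42", "purchased")
p = partition_for("user-42")
print(poll(p), poll(p))  # clicked purchased — one key's events arrive in order
```

Because the broker never mutates the log and consumers only advance an offset, reads are cheap sequential scans, which is a large part of why Kafka sustains the throughput described above.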
This is the presentation I made at JavaDay Kiev 2015 regarding the architecture of Apache Spark. It covers the memory model, the shuffle implementations, data frames and some other high-level stuff, and can be used as an introduction to Apache Spark.
Hyperspace is a recently open-sourced (https://github.com/microsoft/hyperspace) indexing sub-system from Microsoft. The key idea behind Hyperspace is simple: Users specify the indexes they want to build. Hyperspace builds these indexes using Apache Spark, and maintains metadata in its write-ahead log that is stored in the data lake. At runtime, Hyperspace automatically selects the best index to use for a given query without requiring users to rewrite their queries. Since Hyperspace was introduced, one of the most popular asks from the Spark community was indexing support for Delta Lake. In this talk, we present our experiences in designing and implementing Hyperspace support for Delta Lake and how it can be used for accelerating queries over Delta tables. We will cover the necessary foundations behind Delta Lake’s transaction log design and how Hyperspace enables indexing support that seamlessly works with the former’s time travel queries.
Natural Language Processing with Graph Databases and Neo4j - William Lyon
Originally presented at DataDay Texas in Austin, this presentation shows how a graph database such as Neo4j can be used for common natural language processing tasks, such as building a word adjacency graph, mining word associations, summarization and keyword extraction and content recommendation.
Building a real time, solr-powered recommendation engine - Trey Grainger
Searching text is what Solr is known for, but did you know that many companies receive an equal or greater business impact through implementing a recommendation engine in addition to their text search capabilities? With a few tweaks, Solr (or Lucene) can also serve as a full featured recommendation engine. Machine learning libraries like Apache Mahout provide excellent behavior-based, off-line recommendation algorithms, but what if you want more control? This talk will demonstrate how to effectively utilize Solr to perform collaborative filtering (users who liked this also liked…), categorical classification and subsequent hierarchical-based recommendations, as well as related-concept extraction and concept based recommendations. Sound difficult? It’s not. Come learn step-by-step how to create a powerful real-time recommendation engine using Apache Solr and see some real-world examples of some of these strategies in action.
Using Delta Lake to Transform a Legacy Apache Spark to Support Complex Update... - Databricks
The convergence of big data technology toward the traditional database domain has become an industry trend. At present, open source big data processing engines such as Apache Spark, Apache Hadoop, and Apache Flink already support SQL interfaces, and SQL usage basically occupies a dominant position. Companies use the above open source software to build their own ETL frameworks and OLAP technology. However, OLTP remains a strong point of traditional databases, and one of the main reasons is their support for ACID.
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals - Skillspeed
This Hadoop MapReduce tutorial will unravel MapReduce Programming, MapReduce Commands, MapReduce Fundamentals, Driver Class, Mapper Class, Reducer Class, Job Tracker & Task Tracker.
At the end, you'll have a strong knowledge regarding Hadoop MapReduce Basics.
PPT Agenda:
✓ Introduction to BIG Data & Hadoop
✓ What is MapReduce?
✓ MapReduce Data Flows
✓ MapReduce Programming
----------
What is MapReduce?
MapReduce is a programming framework for distributed processing of large data sets via commodity computing clusters. It is based on the principle of parallel data processing, wherein data is broken into smaller blocks rather than processed as a single block. This ensures a faster, secure & scalable solution. MapReduce is implemented in Java.
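The map/shuffle/reduce flow described above can be sketched with the canonical word-count example. This is a single-process Python illustration of the programming model, not Hadoop itself; on a real cluster the framework runs mappers and reducers on different machines and performs the shuffle over the network:

```python
from collections import defaultdict

def mapper(line):
    # Map phase: each mapper processes one block of input
    # and emits (word, 1) pairs.
    return [(w.lower(), 1) for w in line.split()]

def shuffle(pairs):
    # Shuffle phase: the framework groups all values by key
    # so that each reducer sees one key's complete value list.
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reducer(key, values):
    # Reduce phase: collapse each key's values to a single result.
    return key, sum(values)

lines = ["big data big clusters", "big data"]
mapped = [kv for line in lines for kv in mapper(line)]
result = dict(reducer(k, vs) for k, vs in shuffle(mapped).items())
print(result)  # {'big': 3, 'data': 2, 'clusters': 1}
```

Because each mapper sees only its own block and each reducer only its own keys, both phases parallelize across machines with no shared state, which is the source of MapReduce's scalability.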
----------
What are MapReduce Components?
It has the following components:
1. Combiner: The combiner collates all the data from the sample set based on your desired filters. For example, you can collate data based on day, week, month and year. After this, the data is prepared and sent for parallel processing.
2. Job Tracker: This allocates the data across multiple servers.
3. Task Tracker: This executes the program across various servers.
4. Reducer: It will isolate the desired output from across the multiple servers.
----------
Applications of MapReduce
1. Data Mining
2. Document Indexing
3. Business Intelligence
4. Predictive Modelling
5. Hypothesis Testing
----------
Skillspeed is a live e-learning company focusing on high-technology courses. We provide live instructor led training in BIG Data & Hadoop featuring Realtime Projects, 24/7 Lifetime Support & 100% Placement Assistance.
Email: sales@skillspeed.com
Website: https://www.skillspeed.com
This Edureka Recurrent Neural Networks tutorial will help you in understanding why we need Recurrent Neural Networks (RNN) and what exactly it is. It also explains a few issues with training a Recurrent Neural Network and how to overcome those challenges using LSTMs. The last section includes a use-case of LSTM to predict the next word using a sample short story.
Below are the topics covered in this tutorial:
1. Why Not Feedforward Networks?
2. What Are Recurrent Neural Networks?
3. Training A Recurrent Neural Network
4. Issues With Recurrent Neural Networks - Vanishing And Exploding Gradient
5. Long Short-Term Memory Networks (LSTMs)
6. LSTM Use-Case
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013 - mumrah
Apache Kafka is a new breed of messaging system built for the "big data" world. Coming out of LinkedIn (and donated to Apache), it is a distributed pub/sub system built in Scala. It has been an Apache TLP now for several months with the first Apache release imminent. Built for speed, scalability, and robustness, Kafka should definitely be one of the data tools you consider when designing distributed data-oriented applications.
The talk will cover a general overview of the project and technology, with some use cases, and a demo.
This presentation on Recurrent Neural Networks will help you understand what a neural network is, what the popular neural networks are, why we need recurrent neural networks, what a recurrent neural network is, how an RNN works, what the vanishing and exploding gradient problems are, and what LSTM is; you will also see a use case implementation of LSTM (Long Short-Term Memory). Neural networks used in Deep Learning consist of different layers connected to each other and work on the structure and functions of the human brain. A network learns from huge volumes of data and uses complex algorithms to train. The recurrent neural network works on the principle of saving the output of a layer and feeding this back to the input in order to predict the output of the layer. Now let's deep dive into this presentation and understand what an RNN is and how it actually works.
Below topics are explained in this recurrent neural networks tutorial:
1. What is a neural network?
2. Popular neural networks?
3. Why recurrent neural network?
4. What is a recurrent neural network?
5. How does an RNN work?
6. Vanishing and exploding gradient problem
7. Long short term memory (LSTM)
8. Use case implementation of LSTM
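The "feed the output back into the input" principle from the description above can be shown with a minimal one-unit recurrent step. The weights here are made-up scalars for illustration (a trained RNN would have learned weight matrices), but the recurrence is the real mechanism:

```python
import math

# Scalar weights for a single-unit RNN (hypothetical, untrained values).
W_x, W_h, b = 0.5, 0.8, 0.0

def rnn_step(x, h_prev):
    # New hidden state depends on the current input AND the previous
    # hidden state — this feedback is what makes the network recurrent.
    return math.tanh(W_x * x + W_h * h_prev + b)

def run(sequence):
    h = 0.0                 # initial hidden state
    history = []
    for x in sequence:
        h = rnn_step(x, h)  # the same weights are reused at every step
        history.append(h)
    return history

states = run([1.0, 0.0, 0.0])
print(states)  # the first input keeps influencing later states via h
```

Note how each application of `rnn_step` multiplies the carried state by `W_h`: repeated multiplication by a factor below or above 1 is exactly what shrinks or blows up gradients over long sequences — the vanishing/exploding gradient problem that LSTMs were designed to address.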
Simplilearn’s Deep Learning course will transform you into an expert in deep learning techniques using TensorFlow, the open-source software library designed to conduct machine learning & deep neural network research. With our deep learning course, you'll master deep learning and TensorFlow concepts, learn to implement algorithms, build artificial neural networks and traverse layers of data abstraction to understand the power of data and prepare you for your new role as deep learning scientist.
Why Deep Learning?
It is one of the most popular software platforms used for deep learning and contains powerful tools to help you build and implement artificial neural networks.
Advancements in deep learning are being seen in smartphone applications, creating efficiencies in the power grid, driving advancements in healthcare, improving agricultural yields, and helping us find solutions to climate change. With this Tensorflow course, you’ll build expertise in deep learning models, learn to operate TensorFlow to manage neural networks and interpret the results.
And according to payscale.com, the median salary for engineers with deep learning skills tops $120,000 per year.
You can gain in-depth knowledge of Deep Learning by taking our Deep Learning certification training course. With Simplilearn’s Deep Learning course, you will prepare for a career as a Deep Learning engineer as you master concepts and techniques including supervised and unsupervised learning, mathematical and heuristic aspects, and hands-on modeling to develop algorithms. Those who complete the course will be able to:
Learn more at: https://www.simplilearn.com/
Zenoh is a rapidly growing Eclipse project that unifies data in motion, data at rest and computations. It elegantly blends traditional pub/sub with geo-distributed storage, queries and computations, while retaining a level of time and space efficiency that is well beyond any of the mainstream stacks. This presentation will provide an introduction to Eclipse Zenoh along with a crisp explanation of the challenges that motivated the creation of this project. We will go through a series of real-world use cases that demonstrate the advantages brought by Zenoh in enabling and optimising typical edge scenarios and in simplifying the development of any-scale distributed applications.
Comparing three data ingestion approaches where Apache Kafka integrates with ... - HostedbyConfluent
Using Kafka to stream data into TigerGraph, a distributed graph database, is a common pattern in our customers’ data architecture. We have seen the integration in three different layers around TigerGraph’s data flow architecture, and many key use case areas such as customer 360, entity resolution, fraud detection, machine learning, and recommendation engines. Firstly, TigerGraph’s internal data ingestion architecture relies on Kafka as an internal component. Secondly, TigerGraph has a built-in Kafka Loader, which can connect directly with an external Kafka cluster for data streaming. Thirdly, users can use an external Kafka cluster to connect other cloud data sources to TigerGraph cloud database solutions through the built-in Kafka Loader feature. In this session, we will present the high-level architecture of the three approaches and demo the data streaming process.
Solr nyc meetup - May 14 2015 - Adrian Trenaman
Learn how we adopted Solr at Gilt to provide personalised search results, from early days using Solr 3 on just three nodes, to a more modern clustered deployment today running Solr 4 that powers all listing pages on Gilt through our daily pulse load at noon. We'll discuss what worked, what didn't, and where we're going next for search on gilt.com.
Learn how Gilt uses Solr to incorporate realtime inventory changes and targeting
Note that this is a slight reworking of an original presentation from ApacheCon 2012 for Lucene Revolution 2013
Cross-mobile test automation with Xamarin & SpecFlow - Christian Hassa
Test automation can be implemented most efficiently as a by-product of Specification-By-Example (SbE). It combines acceptance criteria specification and acceptance test driven development (ATDD, BDD) to build automatically validated specifications of the system. The practice is well established in many teams for “traditional” enterprise application development (web clients, rich clients, services), and supported with a broad range of tools.
In mobile development, however, we seem to start over again with bare-bones test automation tool support that provokes post implementation test automation, which is costly and hard to maintain. Teams that had already successfully applied ATDD/BDD fall back into old habits when moving to mobile development. This is due to the lack of tool support and a lack of confidence that the principles that worked before can also be applied in mobile development.
Join Gaspar, Christian and Andreas for a brief introduction to BDD and Specification-By-Example. They’ll then show how it can be put into practice with SpecFlow and Calabash for a mobile app that is developed using Xamarin.
During this session we will cover the best practices for implementing a product catalog with MongoDB. We will cover how to model an item properly when it can have thousands of variations and thousands of properties of interest. You'll learn how to index properly and allow for faceted search with milliseconds response latency and how to implement per-store, per-sku pricing while still keeping a sane number of documents. We will also cover operational considerations, like how to bring the data closer to users to cut down the network latency.
A story about developing an application for an online store, persisting all the data as JSON.
Gives an overview of JSON functionality in Oracle Database 19c.
Using SQL to query relational databases is easy. As a declarative language, it’s straightforward to write queries and build powerful applications. However, relational databases struggle when working with complex data. When querying such data in SQL, challenges especially arise in the modelling and querying of the data.
For example, due to the large number of necessary JOINs, it forces us to write long and verbose queries. Such queries are difficult to write and prone to mistakes.
TypeQL is the query language used in TypeDB. Just as SQL is the standard query language in relational databases, TypeQL is TypeDB's query language. It’s a declarative language, and allows us to model, query and reason over our data.
In this talk, we will look at how TypeQL compares to SQL. Why and when should you use TypeQL over SQL? How do we do outer/inner joins in TypeQL? We'll look at the common concepts, but mostly talk about the differences between the two.
Speaker: Tomás Sabat
Tomás is the Chief Operating Officer at Vaticle. He works closely with TypeDB's open source and enterprise users who use TypeDB to build applications in a wide number of industries including financial services, life sciences, cybersecurity and supply chain management. A graduate of the University of Cambridge, Tomás has spent the last seven years founding and building businesses in the technology industry.
How to integrate Splunk with any data solutionJulian Hyde
A presentation Julian Hyde gave to the Splunk 2012 User conference in Las Vegas, Tue 2012/9/11. Julian demonstrated a new technology called Optiq, described how it could be used to integrate data in Splunk with other systems, and demonstrated several queries accessing data in Splunk via SQL and JDBC.
Automated UI test on mobile - with Cucumber/CalabashNiels Frydenholm
Automated UI tests with Cucumber/Calabash - experiences from the trenches, and the lessons we have learned along the way at eBay Classifieds in Denmark.
How to structure your test code, run it in a CI environment, and get fast feedback to make it a success.
Joining the Club: Using Spark to Accelerate Big Data at Dollar Shave ClubData Con LA
Abstract:-
Data engineering at Dollar Shave Club has grown significantly over the last year. In that time, it has expanded in scope from conventional web-analytics and business intelligence to include real-time, big data and machine learning applications. We have bootstrapped a dedicated data engineering team in parallel with developing a new category of capabilities. And the business value that we delivered early on has allowed us to forge new roles for our data products and services in developing and carrying out business strategy. This progress was made possible, in large part, by adopting Apache Spark as an application framework. This talk describes what we have been able to accomplish using Spark at Dollar Shave Club.
Bio:-
Brett Bevers, Ph.D. Brett is a backend engineer and leads the data engineering team at Dollar Shave Club. More importantly, he is an ex-academic who is driven to understand and tackle hard problems. His latest challenge has been to develop tools powerful enough to support data-driven decision making in high value projects.
As part of Adobe Experience Manager, CQ 5.6 provides a new Commerce Framework to build Experience Driven Commerce websites on top of a 3rd party Commerce Platform. This session provides an overview of the framework from an architectural perspective and presents some details of the reference implementation, based on the JCR repository.
Got data? Let's make it searchable! This interactive presentation will demonstrate getting documents into Solr quickly, will provide some tips in adjusting Solr's schema to match your needs better, and finally will discuss how to showcase your data in a flexible search user interface. We'll see how to rapidly leverage faceting, highlighting, spell checking, and debugging. Even after all that, there will be enough time left to outline the next steps in developing your search application and taking it to production.
The way we build screens on Android changed significantly over the last years. We went from huge classes, strongly coupled with Android and containing a lot of responsibilities to a reactive-based data flow. It helped us to address ever-changing business requirements, extend functionality without adding unmanageable complexity and improve the app’s reliability. In this presentation I talked about lessons learned from this journey and the pros and cons of each approach.
Trent McConaghy, CTO of BigchainDB, talks about the journey from blockchain databases, to DAOs to AI DAOs, covering everything from the architecture to knowledge extraction and machine creativity.
A talk I gave to Form 4 at my son's school, St. Mary's College Rathmines, Dublin, Ireland, for Science Week, on why I love computer science and the art of programming. We had some fun going through the Josephus problem, and talking about how the 1's and 0's really work, and mentioning some of the heroes and heroines of the discipline.
Slides from my presentation on Micro-Services at Gilt at JavaOne 2015, San Francisco. I've given this preso a number of times, and this iteration includes some notes on how we classify and track our services, how we've inferred an emergent architecture, tracked ownership, and identified complexity.
GeeCON Microservices 2015 scaling micro services at giltAdrian Trenaman
An evolution of the talk I gave at CraftConf earlier this year, talking about software architecture and micro-services at Gilt. Some new additions include ownership, service discovery and service anatomy.
My talk at the inaugural micro services meet-up at the Engine Yard in Dublin! An honest look at how we've landed on our micro-services architecture at Gilt, and the challenges we're facing.
Instructional webinar on how to create and consume web services with Apache ServiceMix using Apache CXF. We cover code generation, JAX-WS implementation, Spring configuration and both WAR and OSGi bundle-based deployment models.
OSGi for real in the enterprise: Apache Karaf - NLJUG J-FALL 2010Adrian Trenaman
Want to know how to design, implement and deploy modular enterprise integration solutions using OSGi? The Apache Karaf OSGi shell, used by Apache Felix and Apache ServiceMix, enhances core OSGi implementations like Felix or Equinox with an easy-to-use, extendible command shell, providing logging, hot deployment, configuration, container administration, clustering, high availability and easy 'feature-based' dependency management. In this session, you'll learn how Karaf works, and how you can leverage Karaf either on its own or embedded within ServiceMix to deploy business logic, RESTful services, EIP-based integration flows and web services. You'll learn how to extend the command shell with your own commands, and use Spring-DM *or* OSGi Blueprint Services to make using OSGi a walk in the park.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes real work: it takes vision, leadership and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Personalized Search on the Largest Flash Sale Site in America
1. How Solr powers search on America's largest flash sale site, or "Personalized search of a fast-moving, time-sensitive data-set"
Ade Trenaman - Team Galactus
atrenaman@gilt.com | @adrian_trenaman | http://tech.gilt.com | http://slideshare.net/trenaman
11. The Gilt noon 'attack of self denial'
Email & mobile notification of today's sales goes to members at 11:45 EST... Sales go live at noon EST. Stampede.
15. Gilt Data Model 101
A Haiku: "Simple relations, / Solr makes hard; weeping, I / denormalize now." (Ade, Sinsheim, 2012)
[Diagram: data-model sketch: a Product (e.g. 'Silk Charmeuse Wrap Dress') has many Looks (e.g. '... in white'), a Look has many SKUs (e.g. '... in size 8'), and SKUs are sold in Sales.]
16. QED Architecture Pattern: Query, Enrich, Deliver
[Diagram: Browser (client-side controllers & rendering components, HTML/JSON) ↔ <<play>> web-search Search API (enrich plugins) ↔ Solr (plugins); search response: (lookId, skuId*)* as JSON. Note: search API for isolation; plugins perform personalization & inventory sorting and hydration of results against the Product, Inventory and Targeting Services.]
Motivating principle: "Do One Thing Well"
17. Search MVP
• Provide search listings of 'product looks'
• Move sold-out inventory to bottom in realtime
• Facet by Size / Color / Brand / Category
• Search by store (Home, Women's, ...)
• Provide targeted search results
• Respect sale start / end time in listings
• Auto-suggest
• Auto-complete
19. Search space: what to index?
Taxonomy, Brand, Product Name, Price, Color, Inventory Status, Description, SKU Attributes / Material / Origin.
20. Data, data, data gonna hate ya, hate ya, hate ya
Not all product data is as clean as you'd like.
• Description contains distracting tokens ('also available in blue, black (patent), green and leather')
• Colors are often poorly named: 'blk', 'black', 'priest', 'black / white', 'nite', 'night', '100', 'multi', 'patent'.
• Brand names can mislead: 'Thomas Pink Shirts', 'White + Warren Black Dress'
Trust nothing. Try everything. Be ready for surprises...
21. Example: synonym leakage
Naive and excited, we configured the Princeton Synonym database with Solr :)
Search for "black dress" yields "Jet Set Tote" :( Huh? Doh! (black → 'jet black' → jet) (dress → set)
Learning: don't blindly rely on synonyms.
22. Index by looks or by SKU?
Turmoil: we list looks, but filter by SKU attributes (e.g. size).
Bad idea: index looks, with size as a multi-value field:
product_look_sizes = "S", "M", "L", "XL"
... filtering with product_look_sizes:M might return a product that has inventory for S, L, XL but no inventory for "M". Really Bad Experience.
Better idea: denormalize. Index by SKU with a single-valued size field:
sku_size = "M"
... now, when someone filters by "M", they see only SKUs in medium. Bazinga!
23. Index by looks or by SKU? (cont')
But we list looks, not SKUs!! The last thing a customer wants is to see 8 pictures of the same dress in 8 different sizes.
Group by 'product look ID' using Solr grouping: works a treat!
<str name="group">true</str>
<str name="group.field">sku_look_id_s</str>
<str name="group.limit">-1</str>
<str name="group.ngroups">true</str>
Initial worries about performance are (so far) unfounded. "Premature optimization is the root of all evil." We -may- change to block-join queries when we move to Solr 4.
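The effect of grouping on this SKU-per-document index can be sketched as follows. This is an illustrative Python sketch of what Solr's result grouping delivers to the Search API, not Gilt's actual code; the field names mirror the config above.

```python
# Collapse a flat, relevance-ordered list of SKU documents into one entry
# per product look, preserving the order in which each look first appears,
# like group.field=sku_look_id_s with group.limit=-1.
from collections import OrderedDict

def group_by_look(sku_docs):
    groups = OrderedDict()
    for doc in sku_docs:
        groups.setdefault(doc["sku_look_id_s"], []).append(doc["sku_id"])
    return list(groups.items())

results = [
    {"sku_id": "s1", "sku_look_id_s": "look-A"},
    {"sku_id": "s2", "sku_look_id_s": "look-B"},
    {"sku_id": "s3", "sku_look_id_s": "look-A"},  # same dress, another size
]
print(group_by_look(results))
# → [('look-A', ['s1', 's3']), ('look-B', ['s2'])]
```

The customer sees one tile per look, with the matching SKUs (sizes) carried along, which matches the (lookId, skuId*)* response shape.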
24. Solr Internals & Extensions I
[Diagram: Solr request → group → INDEX (index contains SKUs) → search response: (lookId, skuId*)* as JSON]
26. Real-time inventory ordering
Crucial: sold-out looks should move to bottom of list; i.e. sort by 'inventory status'.
Given: an asynchronously updated caching 'inventory status' API:
InventoryStatus status = getInventory(sku)
... we created a custom inventory() function in Solr that assigns a numeric value (0/1) to each SKU in a result [O(n)].
Queries can now be sorted using: order by inventory(sku), score
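A minimal sketch of the ordering this custom function enables (the 0/1 convention, with 1 meaning in stock, is an assumption; the stock data and field names are invented for illustration):

```python
# Each SKU gets 1 if in stock, 0 if sold out; results are ordered by
# inventory value first and relevance score second, so sold-out looks
# sink to the bottom regardless of how well they match.
def inventory(sku, stock_levels):
    return 1 if stock_levels.get(sku, 0) > 0 else 0

def order_results(docs, stock_levels):
    return sorted(
        docs,
        key=lambda d: (inventory(d["sku"], stock_levels), d["score"]),
        reverse=True,  # highest inventory value, then highest score, first
    )

stock = {"sku-1": 0, "sku-2": 12, "sku-3": 3}
docs = [
    {"sku": "sku-1", "score": 9.0},  # best match, but sold out
    {"sku": "sku-2", "score": 5.0},
    {"sku": "sku-3", "score": 7.0},
]
print([d["sku"] for d in order_results(docs, stock)])
# → ['sku-3', 'sku-2', 'sku-1']
```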
30. Targeting
Sale-targeting is an important tool for Gilt merch: groups (Noir members, employees, ...), geo-IP, weather, inclusion/exclusion lists. Search listings and auto-suggest must respect sale targeting.
Solution:
• Store the saleId that the SKU is available in, in the index
• Pass the user's GUID to Solr
• Filter results using a custom filter:
if (!allowed(saleId, userGuid)) { // remove result from listing }
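The filter's effect can be sketched like this. The allowed() rules and data shapes here are invented stand-ins; the real targeting decision (groups, geo-IP, weather, lists) lives behind the Targeting Service.

```python
# Toy sketch of the custom targeting filter: drop any result whose sale
# is not visible to this user. A sale absent from the targeting map is
# untargeted and visible to everyone.
def allowed(sale_id, user_guid, targeting):
    audience = targeting.get(sale_id)  # None means untargeted
    return audience is None or user_guid in audience

def filter_listing(results, user_guid, targeting):
    return [r for r in results
            if allowed(r["sale_id"], user_guid, targeting)]

targeting = {"sale-noir": {"guid-123"}}  # restricted to one Noir member
results = [{"sku": "a", "sale_id": "sale-noir"},
           {"sku": "b", "sale_id": "sale-open"}]
print([r["sku"] for r in filter_listing(results, "guid-999", targeting)])
# → ['b']
```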
31. Solr Internals & Extensions III
[Diagram: Solr request → inventory function → INDEX → targeting filter → group → sort → search response: (lookId, skuId*)* as JSON; Inventory Service, Targeting Service]
33. Our index is time-sensitive
Sales go live at noon EST:
• Do not want products to be searchable if the sale is not yet active.
• Do want products to be searchable at exactly noon EST.
We index Solr every n minutes (n == 15). Need to encode a sense of time in the index.
34. Time-sensitive search
1st idea: index a SKU's start/end availability and then filter on that. Use Solr's 'dynamic fields':
for each store (men / women / home / kids / ...) {
  // get the nearest upcoming availability window & index it.
  sku_availability_start_<store> = startTime
  sku_availability_end_<store> = endTime
  sku_availability_sale_<store> = saleId
}
Then, naively filter using (see next slide for improvement):
fq=+sku_availability_start_<store>:[* TO NOW] +sku_availability_end_<store>:[NOW+1HOUR TO *]
35. All praise and thanks to the chump.
Filtering by time for every request is expensive :( Posed question to The Chump at Lucene Revolution 2012.
Solution: rounding time bounds to nearest hour / minute means we get a cached query filter.
[* TO NOW] → [* TO NOW/HOUR]
Super Chump. Chump is Champ. Chump-tastic. Chump-nominal. Awe-chump.
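Why rounding helps can be sketched numerically. This is an illustrative model of the caching behaviour, not Solr internals: Solr's filter cache is keyed on the filter query, so a filter containing raw NOW differs on every request, while one rounded to NOW/HOUR stays identical for a whole hour.

```python
# Build the availability filter with the time bound rounded down to the
# hour: two requests minutes apart produce the same string, so the cached
# filter is reused instead of recomputed.
def availability_fq(now_epoch, round_to=3600):
    rounded = (now_epoch // round_to) * round_to  # the NOW/HOUR idea
    return "+start:[* TO %d] +end:[%d TO *]" % (rounded, rounded + round_to)

# Two requests 10 minutes apart within the same hour...
fq1 = availability_fq(1_700_000_000)
fq2 = availability_fq(1_700_000_600)
print(fq1 == fq2)  # → True
```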
36. Time-sensitive search (cont')
e.g. Consider a SKU available in women & home:
sku_availability_start_home = 32141232 // Time
sku_availability_end_home = 32161232 // Time
sku_availability_sale_home = 12121221 // SaleID
sku_availability_start_women = 32141232 // Time
sku_availability_end_women = 32161232 // Time
sku_availability_sale_women = 223423424 // SaleID
Now, can search in Home store using:
fq=+sku_availability_start_home:[* TO NOW/HOUR] +sku_availability_end_home:[NOW/HOUR+1HOUR TO *]
37. Problems in the space-time continuum (cont')
At index time, we used a heuristic to determine the 'best availability window' when a SKU is in 2 sales in the same store: "earliest active window, or soonest starting window". Some windows ignored.
Targeting problems: might remove a SKU in a restricted sale, even if the SKU is visible in another sale :(
[Diagram: 'Best Availability Window' timeline over Sale A (targeted to 'Noir'), Sale B (untargeted) and Sale C (weekend specials), with a 'Danger Zone' highlighted.]
38. Problems in the space-time continuum (cont')
Also, some sales are restricted to the channel the product is in! 'Channels': e.g. mobile, iPad, ...
Our 'best availability window' code was already creaking; introducing more dynamic fields (channel x store) stank.
sku_availability_start_web_home = 32141232 // Time
sku_availability_end_web_home = 32161232 // Time
sku_availability_sale_web_home = 12121221 // SaleID
sku_availability_start_ipad_home = 32141232 // Time
sku_availability_end_ipad_home = 32161232 // Time
sku_availability_sale_ipad_home = 12121221 // SaleID
sku_availability_start_mobile_home = 32141232 // Time
: : : :
39. Eureka!
Stop indexing SKUs; instead: index 'the availability of a SKU at a moment in time':
sku_name = ...
sku_color = ...
sku_availability_start =
sku_availability_end =
sku_store = home
sku_channels = mobile, ipad
sku_sale_id =
The filter becomes much simpler:
fq = +sku_availability_start:[* TO NOW/HOUR] +sku_availability_end:[NOW/HOUR+1HOUR TO *]
Code fell away. Just had to post-filter out multiple SKU availabilities in results [O(n)].
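The O(n) post-filter can be sketched like this (the document shape is an invented stand-in for the availability records described above):

```python
# When a SKU matches the query through several availability records,
# keep only the first occurrence per SKU; one pass, O(n).
def dedupe_availabilities(docs):
    seen, out = set(), []
    for doc in docs:
        if doc["sku_id"] not in seen:
            seen.add(doc["sku_id"])
            out.append(doc)
    return out

docs = [{"sku_id": "s1", "sku_sale_id": "A"},
        {"sku_id": "s1", "sku_sale_id": "B"},  # same SKU, second availability
        {"sku_id": "s2", "sku_sale_id": "A"}]
print([d["sku_id"] for d in dedupe_availabilities(docs)])
# → ['s1', 's2']
```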
40. Solr Internals & Extensions IV
[Diagram: Solr request → availability-component → inventory function → INDEX → targeting filter → de-dupe filter → group → sort → search response: (lookId, skuId*)* as JSON; Inventory Service, Targeting Service]
42. Mixed-logic multi-select faceting
Facet filters typically use 'AND' logic:
size = 36 AND brand = 'Timber Island' AND color = 'RED'
Prefer mixed logic, where 'OR' is used for filters on the same facet:
(size = 36 OR size = 38) AND brand = 'Timber Island' AND (color = 'RED' OR color = 'BLUE')
Use the 'multi-select' faceting technique to ensure that facet values don't disappear.
Tag the facet filter-query:
fq={!tag=brand_fq}brand_facet:"Timber Island"
Ignore the effects of the facet filter-query when faceting:
<str name="facet.field">{!ex=brand_fq}brand_facet</str>
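Building the mixed-logic filters can be sketched as follows. The facet field and tag naming conventions here are our own illustration, not Gilt's: values within one facet are OR'd inside a single fq, separate fqs are implicitly AND'd by Solr, and each fq carries a tag so faceting can exclude it with {!ex=...}.

```python
# Turn the user's facet selections into one tagged filter query per facet,
# OR-ing the selected values within each facet.
def build_filter_queries(selections):
    """selections: {'size': ['36', '38'], ...} -> list of tagged fq strings."""
    fqs = []
    for facet, values in sorted(selections.items()):
        ors = " OR ".join('%s_facet:"%s"' % (facet, v) for v in values)
        fqs.append("{!tag=%s_fq}(%s)" % (facet, ors))
    return fqs

print(build_filter_queries({"size": ["36", "38"], "brand": ["Timber Island"]}))
# → ['{!tag=brand_fq}(brand_facet:"Timber Island")',
#    '{!tag=size_fq}(size_facet:"36" OR size_facet:"38")']
```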
43. ‘Disappearing facet problem’
To get facets relevant to the query, we set facet.mincount to 1:
<str name="facet.mincount">1</str>
However: if subsequent facet filters reduce the facet count to zero, then the facets 'disappear' :(
Consider a search for 'shoes', returning the following facets:
brand: Ade -> 10, Eric -> 5; color: black -> 8, red -> 7
Then filter on red (and assume that only Ade shoes are red). We want to have:
brands: Ade -> 7, Eric -> 0; colors: black -> 0, red -> 7
However, if facet.mincount == 1, we get:
brands: Ade -> 7; colors: red -> 7
44. ‘Disappearing facet problem’ (cont')
We want to say 'Eric and black are relevant to the query, but have zero results due to filtering'.
Solution: for each facet, overlay the original counts with the values of the new count if present, or zero otherwise.
<str name="facet.field">{!ex=brand_fq,size_fq,taxonomy_fq,color_fq key=all_brand_facets}brand_facet</str>
<str name="facet.field">{!ex=brand_fq,size_fq,taxonomy_fq,color_fq key=all_color_facets}color_facet</str>
This means we get:
all_brand_facets: Ade -> 10, Eric -> 5; brand_facet: Ade -> 7
all_color_facets: black -> 8, red -> 7; color_facet: red -> 7
Merge to get:
brand_facet: Ade -> 7, Eric -> 0; color_facet: red -> 7, black -> 0
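The overlay/merge step is small enough to sketch directly, using the slide's own shoes example:

```python
# Keep every facet value that was relevant to the unfiltered query (the
# all_* counts), showing the filtered count where one exists and zero
# where filtering removed it.
def merge_facets(all_counts, filtered_counts):
    return {value: filtered_counts.get(value, 0) for value in all_counts}

all_brand = {"Ade": 10, "Eric": 5}
brand = {"Ade": 7}
print(merge_facets(all_brand, brand))  # → {'Ade': 7, 'Eric': 0}

all_color = {"black": 8, "red": 7}
color = {"red": 7}
print(merge_facets(all_color, color))  # → {'black': 0, 'red': 7}
```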
45. Color Me Happy
Need to map a SKU's color to a simple set of colors for faceting:
• Used a synonym file to map 'urobilin' to 'yellow'
• Use a stopwords file to remove unknown colors
It works, but it's brittle; want to move to a solution based on color analysis of swatches.
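The synonym + stopword approach can be sketched as a pair of lookups. The mappings below are invented examples, not Gilt's actual synonym or stopword files:

```python
# Map a raw vendor color to a facetable color via synonyms, and drop
# anything that still isn't a recognised color (the stopword case).
COLOR_SYNONYMS = {
    "blk": "black", "nite": "black", "night": "black",
    "priest": "black", "urobilin": "yellow",
}
KNOWN_COLORS = {"black", "white", "red", "green", "blue", "yellow"}

def normalize_color(raw):
    color = COLOR_SYNONYMS.get(raw.lower().strip(), raw.lower().strip())
    return color if color in KNOWN_COLORS else None

print(normalize_color("urobilin"))  # → yellow
print(normalize_color("100"))       # → None (dropped, like a stopword)
```

The brittleness the slide mentions is visible here: every new vendor spelling needs a new synonym entry, which is why analysing swatch images is attractive.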
46. Hierarchical taxonomy faceting
Our taxonomy is hierarchical. Solr faceting is not :(
Encoded taxonomy hierarchy using 'paths':
"Gilt::Men::Clothing::Pants"
Our search API converts the paths into a tree for rendering purposes. Works, but (a) feels jenky (b) ordering is alphabetic.
Prefer in future to use a unique taxonomy key, and use that to filter through an ordered tree in our API.
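The path-to-tree conversion the search API performs can be sketched in a few lines (a minimal illustration, not the actual API code):

```python
# Turn '::'-delimited taxonomy paths into a nested dict tree for rendering.
def paths_to_tree(paths):
    tree = {}
    for path in paths:
        node = tree
        for part in path.split("::"):
            node = node.setdefault(part, {})
    return tree

paths = [
    "Gilt::Men::Clothing::Pants",
    "Gilt::Men::Clothing::Shirts",
    "Gilt::Women::Clothing::Dresses",
]
print(paths_to_tree(paths))
# → {'Gilt': {'Men': {'Clothing': {'Pants': {}, 'Shirts': {}}},
#             'Women': {'Clothing': {'Dresses': {}}}}}
```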
47. Hierarchical size faceting
Size ordering is non-trivial: there is a two-level hierarchy & ordering is difficult:
00, 0, 2, 4, 6, 8, 10, 12 rather than 0, 00, 10, 12, 2, 4, 6, 8
XS, S, M, L, XL, XXL rather than M, L, S, XL, XS, XXL
End up encoding a size ordinal into the size facet label:
"Women's Apparel::[[0000000003]]00"
"Women's Apparel::[[0000000004]]0"
"Women's Apparel::[[0000000005]]2"
OK, but again, feels hacky. "If only there was a way to annotate facets with meaningful data"
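The ordinal-in-the-label trick works because Solr sorts facet values lexically: a zero-padded ordinal prefix forces the desired order, and the API strips it before display. A sketch (the exact format and stripping regex are our illustration):

```python
import re

def encode(category, ordinal, label):
    # Zero-pad the ordinal so lexical order equals numeric order.
    return "%s::[[%010d]]%s" % (category, ordinal, label)

def display_label(encoded):
    # Strip the '[[0000000003]]' prefix to recover the user-facing label.
    return re.sub(r"\[\[\d{10}\]\]", "", encoded.split("::", 1)[1])

labels = [encode("Women's Apparel", i, s)
          for i, s in [(4, "0"), (3, "00"), (5, "2")]]
for enc in sorted(labels):  # lexical sort now matches size order
    print(display_label(enc))
# → 00, 0, 2
```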
48. Solr Internals & Extensions (V)
[Diagram: Solr request → availability-component → inventory function → INDEX → targeting filter → de-dupe filter → group → facet → sort → search response: (lookId, skuId*)* as JSON; Inventory Service, Targeting Service]
50. Lesson II: Solr makes the business Love You
• Gilt Search released in A/B test to 30%.
• Initial deployment can handle > 180,000 RPM (5:1 ratio of auto-complete to search)
• Members whose visit included search are 4x more likely to buy.
• Search generates 2-4% incremental revenue
• Business pleads with us to end A/B test and release to 100%
• We've iterated to drive more traffic (& succeeded) with no loss of conversion. From subtle to suBtle.
51. Lesson III: Excelsior
• Power all listings (keyword, category, sale, brand) via Solr
• Faceting on SKU attributes (e.g. 'age', 'gender')
• Solr 4
• Hack Debridement: improve hierarchical faceting & facet ordering; 'double facet' feels wrong