This document provides an overview of Solr, an open source enterprise search platform. It describes Solr's core functions, such as indexing, searching, and analyzing documents, and explains how to configure Solr for indexing, querying, highlighting search results, and more. Solr query syntax and relevancy tuning options are demonstrated through examples.
Add Powerful Full Text Search to Your Web App with Solr
1. Powerful Full-Text Search with Solr
Yonik Seeley
yonik@apache.org
Web 2.0 Expo, Berlin
8 November 2007
download at http://www.apache.org/~yonik
2. What is Lucene
• High performance, scalable, full-text search library
• Focus: Indexing + Searching Documents
– "Document" is just a list of name+value pairs
• No crawlers or document parsing
• Flexible Text Analysis (tokenizers + token filters)
• 100% Java, no dependencies, no config files
3. What is Solr
• A full text search server based on Lucene
• XML/HTTP, JSON Interfaces
• Faceted Search (category counting)
• Flexible data schema to define types and fields
• Hit Highlighting
• Configurable Advanced Caching
• Index Replication
• Extensible Open Architecture, Plugins
• Web Administration Interface
• Written in Java 5, deployable as a WAR
4. Basic App
[Diagram: an Indexer sends a Document (super_name: Mr. Fantastic, name: Reed Richards, category: superhero, powers: elasticity) to http://solr/update. A Webapp sends a Query (powers:agility) to http://solr/select, receives a Query Response (matching docs), and renders HTML. Solr runs inside a Servlet Container on top of Lucene and exposes admin, update, and select endpoints. Update handlers: XML Update Handler, CSV Update Handler. Request handlers: Standard request handler, Custom request handler. Response writers: XML response writer, JSON response writer.]
5. Indexing Data
HTTP POST to http://localhost:8983/solr/update
<add><doc>
<field name="id">05991</field>
<field name="name">Peter Parker</field>
<field name="supername">Spider-Man</field>
<field name="category">superhero</field>
<field name="powers">agility</field>
<field name="powers">spider-sense</field>
</doc></add>
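The update message above can also be generated rather than written by hand. The following is a minimal sketch, assuming a plain Python dict as the document and using only the standard library; the helper name `build_add_xml` is hypothetical and not part of Solr. It shows how a multi-valued field such as `powers` becomes one `<field>` element per value, and how special characters get escaped.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Build a Solr <add><doc>...</doc></add> message from a dict.

    A list value produces one <field> element per item (multi-valued field).
    build_add_xml is a hypothetical helper for illustration only.
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(v)  # ElementTree escapes &, <, > on output
    return ET.tostring(add, encoding="unicode")


xml_body = build_add_xml({
    "id": "05991",
    "name": "Peter Parker",
    "supername": "Spider-Man",
    "category": "superhero",
    "powers": ["agility", "spider-sense"],
})
print(xml_body)
```

The resulting string is what would go in the body of the HTTP POST to the update URL.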
6. Indexing CSV data
Iron Man, Tony Stark, superhero, powered armor | flight
Sandman, William Baker|Flint Marko, supervillain, sand transform
Wolverine,James Howlett|Logan, superhero, healing|adamantium
Magneto, Erik Lehnsherr, supervillain, magnetism|electricity
http://localhost:8983/solr/update/csv?
fieldnames=supername,name,category,powers
&separator=,
&f.name.split=true&f.name.separator=|
&f.powers.split=true&f.powers.separator=|
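As a rough illustration of what these parameters ask for, the sketch below builds the same query string with the standard library and then mimics the per-field splitting locally. The local parser is only an approximation for intuition: it is not Solr's CSV loader, and trimming whitespace around values is an assumption made here, not something the slide states.

```python
import csv
import io
from urllib.parse import urlencode

fieldnames = ["supername", "name", "category", "powers"]
split_fields = {"name": "|", "powers": "|"}  # field -> sub-separator

# Build the update/csv URL from the parameters shown on the slide.
params = [("fieldnames", ",".join(fieldnames)), ("separator", ",")]
for field, sep in split_fields.items():
    params.append((f"f.{field}.split", "true"))
    params.append((f"f.{field}.separator", sep))
url = "http://localhost:8983/solr/update/csv?" + urlencode(params)


def parse_rows(text):
    """Local approximation of the split behaviour (assumption: values are trimmed)."""
    docs = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        doc = {}
        for field, raw in zip(fieldnames, row):
            raw = raw.strip()
            if field in split_fields:
                doc[field] = [p.strip() for p in raw.split(split_fields[field])]
            else:
                doc[field] = raw
        docs.append(doc)
    return docs


docs = parse_rows("Sandman, William Baker|Flint Marko, supervillain, sand transform\n")
print(url)
print(docs[0])
```

Note that `urlencode` percent-encodes the `,` and `|` characters, which is the safe form to send over HTTP.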
7. Data upload methods
URL=http://localhost:8983/solr/update/csv
• HTTP POST body (curl, HttpClient, etc)
curl $URL -H 'Content-type:text/plain; charset=utf-8' --data-binary @info.csv
• Multi-part file upload (browsers)
• Request parameter
?stream.body='Cyclops, Scott Summers,…'
• Streaming from URL (must enable)
?stream.url=file://data/info.csv
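The first method, an HTTP POST body, could look roughly like this from Python. This is a sketch using only the standard library; the helper name `build_csv_upload` is hypothetical, and the request is constructed but not sent, since sending it assumes a Solr instance listening on localhost:8983.

```python
from urllib.request import Request


def build_csv_upload(url, csv_text):
    """Prepare a POST request carrying CSV text in the body (hypothetical helper)."""
    return Request(
        url,
        data=csv_text.encode("utf-8"),
        headers={"Content-type": "text/plain; charset=utf-8"},
        method="POST",
    )


req = build_csv_upload(
    "http://localhost:8983/solr/update/csv",
    "Iron Man, Tony Stark, superhero, powered armor | flight\n",
)
print(req.get_method(), req.full_url)

# To actually send it (requires a running Solr server):
# from urllib.request import urlopen
# with urlopen(req) as resp:
#     print(resp.status)
```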
8. Indexing with SolrJ
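// Solr's Java Client API… remote or embedded/local!
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Connect to a remote Solr instance over HTTP
SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
// Build one document field by field
SolrInputDocument doc = new SolrInputDocument();
doc.addField("supername", "Daredevil");
doc.addField("name", "Matt Murdock");
doc.addField("category", "superhero");
server.add(doc);    // send the document to Solr
server.commit();    // make it visible to searches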
9. Deleting Documents
• Delete by Id, most efficient
<delete>
<id>05591</id>
<id>32552</id>
</delete>
• Delete by Query
<delete>
<query>category:supervillain</query>
</delete>
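Both delete messages are small enough to build by hand. The sketch below shows one way to do it with escaping applied to IDs and queries; `DeleteXml` is a hypothetical helper, not a Solr class.

```java
/** Illustrative helper (hypothetical name): renders <delete> messages for /update. */
final class DeleteXml {
    // Escape characters that are special in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Delete by unique key: one <id> element per document.
    static String byIds(String... ids) {
        StringBuilder sb = new StringBuilder("<delete>");
        for (String id : ids) {
            sb.append("<id>").append(escape(id)).append("</id>");
        }
        return sb.append("</delete>").toString();
    }

    // Delete every document matching a query.
    static String byQuery(String query) {
        return "<delete><query>" + escape(query) + "</query></delete>";
    }

    public static void main(String[] args) {
        System.out.println(byIds("05591", "32552"));
        System.out.println(byQuery("category:supervillain"));
    }
}
```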
10. Commit
• <commit/> makes changes visible
– Triggers static cache warming in
solrconfig.xml
– Triggers autowarming from existing caches
• <optimize/> same as commit, merges all
index segments for faster searching
[Figure: Lucene Index Segments, showing the files of two segments: _0.fnm, _0.fdt, _0.fdx, _0.frq, _0.tis, _0.tii, _0.prx, _0.nrm, _0_1.del, and _1.fnm, _1.fdt, _1.fdx, […]]
12. Response Format
• Add &wt=json for JSON formatted response
{"result": {"numFound":427, "start":0,
"docs": [
{"supername":"Spider-Man", "category":"superhero"},
{"supername":"Mystique", "category":"supervillain"}
]
}}
• Also Python, Ruby, PHP, SerializedPHP, XSLT
13. Scoring
• Query results are sorted by score descending
• VSM – Vector Space Model
• tf – term frequency: number of matching terms in field
• lengthNorm – number of tokens in field
• idf – inverse document frequency
• coord – coordination factor, number of matching
terms
• document boost
• query clause boost
http://lucene.apache.org/java/docs/scoring.html
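To make the factors above concrete, here is a small sketch of how they can be computed. It assumes the formulas of Lucene's classic default similarity (square-root tf, logarithmic idf, inverse-square-root length norm); the class and method names are illustrative, and query normalization is left out.

```java
/** Illustrative sketch of classic Lucene-style scoring factors (assumed formulas, hypothetical names). */
final class ClassicScoring {
    // tf: grows with the number of times the term occurs in the field.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // idf: rarer terms (low docFreq relative to numDocs) weigh more.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // lengthNorm: matches in shorter fields weigh more.
    static double lengthNorm(int numTokens) {
        return 1.0 / Math.sqrt(numTokens);
    }

    // coord: fraction of the query terms that the document matched.
    static double coord(int matchedTerms, int queryTerms) {
        return (double) matchedTerms / queryTerms;
    }

    // Contribution of one matching term, before coord and query normalization.
    static double termScore(int freq, int docFreq, int numDocs, int numTokens, double boost) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * boost * lengthNorm(numTokens);
    }

    public static void main(String[] args) {
        // A term occurring 4 times in a 4-token field, found in 9 of 10 documents.
        System.out.println(termScore(4, 9, 10, 4, 1.0));
    }
}
```

Document boost and query clause boost enter as plain multipliers, which is why a single `boost` argument is enough for this sketch.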
17. DisMax Query Syntax
• Good for handling raw user queries
– Balanced quotes for phrase query
– ‘+’ for required, ‘-’ for prohibited
– Separates query terms from query structure
http://solr/select?qt=dismax
&q=super man // the user query
&qf=title^3 subject^2 body // fields to query
&pf=title^2,body // fields to do phrase queries
&ps=100 // slop for those phrase q’s
&tie=.1 // multi-field match reward
&mm=2 // # of terms that should match
&bf=popularity // boost function
18. DisMax Query Form
• The expanded Lucene Query:
+( DisjunctionMaxQuery( title:super^3 |
subject:super^2 | body:super)
DisjunctionMaxQuery( title:man^3 |
subject:man^2 | body:man)
)
DisjunctionMaxQuery(title:"super man"~100^2 |
body:"super man"~100)
FunctionQuery(popularity)
• Tip: set up your own request handler with default parameters
to avoid clients having to specify them
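The role of the `tie` parameter is easiest to see in how a DisjunctionMaxQuery combines per-field scores: the best field wins, and the other matching fields contribute only a fraction. The sketch below assumes the usual rule, max plus tie times the sum of the remaining scores; the class name is hypothetical.

```java
/** Illustrative sketch (hypothetical name): combines per-field scores the way a disjunction-max query does. */
final class DisMaxScore {
    // Best field score, plus tie times the sum of the other field scores.
    static double score(double tie, double... fieldScores) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : fieldScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tie * (sum - max);
    }

    public static void main(String[] args) {
        // "super" matched in title, subject, and body with different scores.
        System.out.println(score(0.1, 0.8, 0.5, 0.2));
    }
}
```

With `tie=0` only the best field counts; with `tie=1` the scores are simply summed, so small values such as `.1` act as a mild reward for matching in several fields.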
19. Function Query
• Allows adding function of field value to score
– Boost recently added or popular documents
• Current parser only supports function notation
• Example: log(sum(popularity,1))
• sum, product, div, log, sqrt, abs, pow
• scale(x, target_min, target_max)
– calculates min & max of x across all docs
• map(x, min, max, target)
– useful for dealing with defaults
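The semantics of `map` and `scale` can be sketched in plain Java as below. This is an approximation for illustration, not Solr's implementation: it assumes `log` is base 10, and the handling of a zero-width range in `scale` is a choice made here, not documented behaviour.

```java
/** Illustrative sketch (hypothetical name) of the map and scale function semantics. */
final class FunctionSketch {
    // map(x, min, max, target): values inside [min, max] become target, others pass through.
    static double map(double x, double min, double max, double target) {
        return (x >= min && x <= max) ? target : x;
    }

    // scale(x, targetMin, targetMax): rescale all values of x linearly into the target range.
    static double[] scale(double[] xs, double targetMin, double targetMax) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double x : xs) {
            min = Math.min(min, x);
            max = Math.max(max, x);
        }
        double range = max - min;
        double[] out = new double[xs.length];
        for (int i = 0; i < xs.length; i++) {
            // Assumption: when every value is equal, fall back to targetMin.
            out[i] = (range == 0) ? targetMin
                    : targetMin + (xs[i] - min) / range * (targetMax - targetMin);
        }
        return out;
    }

    // log(sum(popularity,1)), assuming a base-10 logarithm.
    static double logSumPopularity(double popularity) {
        return Math.log10(popularity + 1);
    }

    public static void main(String[] args) {
        // Treat a missing popularity (indexed as 0) as 1 before further math.
        System.out.println(map(0, 0, 0, 1));
    }
}
```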
20. Boosted Query
• Score is multiplied instead of added
– New local params {!...} syntax added
&q={!boost b=sqrt(popularity)}super man
• Parameter dereferencing in local params
&q={!boost b=$boost v=$userq}
&boost=sqrt(popularity)
&userq=super man
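Two ideas from this slide can be sketched in a few lines: a local param value that starts with `$` is looked up among the other request parameters, and the boost function's value multiplies the query score. The class and method names below are hypothetical and only model the behaviour described; they are not Solr's parser.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch (hypothetical names) of parameter dereferencing and multiplicative boosting. */
final class BoostedQuerySketch {
    // A value of the form $name is replaced by the request parameter called name.
    static String deref(String value, Map<String, String> requestParams) {
        return value.startsWith("$") ? requestParams.get(value.substring(1)) : value;
    }

    // The boost b=sqrt(popularity) multiplies the score of the user query.
    static double boostedScore(double queryScore, double popularity) {
        return queryScore * Math.sqrt(popularity);
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("boost", "sqrt(popularity)");
        params.put("userq", "super man");
        System.out.println(deref("$boost", params) + " applied to: " + deref("$userq", params));
        System.out.println(boostedScore(0.5, 16));
    }
}
```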
24. copyField
• Copies one field to another at index time
• Use case #1: Analyze same field different ways
– copy into a field with a different analyzer
– boost exact-case, exact-punctuation matches
– language translations, thesaurus, soundex
<field name="title" type="text"/>
<field name="title_exact" type="text_exact"
stored="false"/>
<copyField source="title" dest="title_exact"/>
• Use case #2: Index multiple fields into single
searchable field
29. Filters
• Filters are restrictions in addition to the query
• Use in faceting to narrow the results
• Filters are cached separately for speed
1. User queries for memory, query sent to Solr is
&q=memory&fq=inStock:true&facet=true&…
2. User selects 1GB memory size
&q=memory&fq=inStock:true&fq=size:1GB&…
3. User selects DDR2 memory type
&q=memory&fq=inStock:true&fq=size:1GB
&fq=type:DDR2&…
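The drill-down above amounts to keeping the main query fixed and appending one `fq` parameter per selection. A minimal sketch of that pattern follows; `FilterQueries` is a hypothetical helper, and the colons come out percent-encoded (`%3A`), which is equivalent to the unencoded form shown on the slide.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Illustrative helper (hypothetical name): builds q plus one fq per selected filter. */
final class FilterQueries {
    // Percent-encode one query-string component as UTF-8.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // The main query stays the same; each user selection adds another fq.
    static String queryString(String q, String... filters) {
        StringBuilder sb = new StringBuilder("q=").append(enc(q));
        for (String f : filters) {
            sb.append("&fq=").append(enc(f));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(queryString("memory", "inStock:true"));
        System.out.println(queryString("memory", "inStock:true", "size:1GB"));
        System.out.println(queryString("memory", "inStock:true", "size:1GB", "type:DDR2"));
    }
}
```

Keeping each restriction in its own `fq` lets each filter be cached and reused independently across queries.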
31. MoreLikeThis
• Selects documents that are “similar” to the
documents matching the main query.
&q=id:6H500F0
&mlt=true&mlt.fl=name,cat,features
"moreLikeThis":{
"6H500F0":{"numFound":5,"start":0,
"docs": [
{"name":"Apple 60 GB iPod with Video Playback Black", "price":399.0,
"inStock":true, "popularity":10, […]
}, […]
]
[…]
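32. High Availability
[Diagram: appservers handle dynamic HTML generation and send HTTP search requests through a load balancer to a set of Solr searchers. An updater reads from the DB and sends updates to the Solr master, and index replication copies the master's index to the searchers. An admin terminal issues admin queries and updates.]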