This is an intro to Sphinx and PHP. It takes you through the very basics of how Sphinx works, how to set up an index, and how to search that index with the mysql client. It then culminates in a quick little PHP script that builds a small search interface around your index. I will be posting the example code to my GitHub account soon.
This presentation was given to the LV PHP meetup on August 5th.
2. What is Sphinx?
• A full-text search engine
• Quickly get high quality (relevant) results
• Designed to integrate well with SQL RDBMS
• Can work with any data source
• Can be queried using either an API or SQL
3. How do I know anything about Sphinx?
• Manager of Software Architecture for Slickdeals.net
• Alexa top 150 site (in the US)
• Have been working on improving our Sphinx search engine for the last 2 months or so.
• Over 7 million searches a month directly through the interface; lots more happen indirectly.
4. When should I use Sphinx?
• Site / Product / Document searches
• Auto-suggest / Auto-Correct functionality
• Finding relevant and related items
5. Simple Architecture
• Often, search is offloaded straight to the database
• Search goes to the backend, which performs queries on the database
• Obviously very easy to implement
6. Simple Architecture
• Simple “starts with” searches on indexed fields can sometimes work: `city` LIKE 'Las%'
• Anything else cannot use the index, and on MyISAM the resulting table scan blocks writes while it runs.
• MySQL is not a great or flexible full-text engine
• It can sometimes be adequate
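As a rough illustration of the “starts with” case above, the sketch below builds a prefix-only LIKE pattern and runs it through a PDO prepared statement. The `cities` table, its indexed `city` column, and both function names are assumptions made for this example; they are not taken from the talk.

```php
<?php
/**
 * Turn user input into a LIKE pattern that only matches prefixes.
 * Backslash, % and _ are escaped so they are matched literally
 * (backslash is MySQL's default LIKE escape character).
 */
function likePrefixPattern(string $term): string
{
    $escaped = strtr($term, ['\\' => '\\\\', '%' => '\\%', '_' => '\\_']);
    return $escaped . '%';
}

/**
 * Hypothetical lookup against an assumed `cities` table.
 * A trailing wildcard can use an index on `city`; a leading
 * wildcard such as '%Las' could not.
 */
function findCitiesStartingWith(PDO $db, string $prefix): array
{
    $stmt = $db->prepare('SELECT city FROM cities WHERE city LIKE :pattern');
    $stmt->execute([':pattern' => likePrefixPattern($prefix)]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}
```

Calling `likePrefixPattern('Las')` yields `Las%`, the same pattern as on the slide.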
7. Sphinx Architecture
• Searchd is responsible for receiving requests from clients and executing the searches against the Sphinx index.
• Indexer is responsible for getting data into the Sphinx index.
• This separation allows indexing and searching to be scaled separately.
8. Sphinx Architecture
• Searchd has a binary protocol for which there are several clients available in multiple languages.
• Searchd also speaks MySQL’s wire protocol (as of MySQL 4.1), so standard MySQL clients can connect to it.
• Searchd is a daemon that runs on your search servers
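Because searchd speaks the MySQL protocol, PHP's ordinary PDO MySQL driver can talk to it. The following is a minimal sketch, assuming searchd has a MySQL-protocol listener on port 9306 (the port commonly used in Sphinx examples; the real value depends on the `listen` setting in your sphinx.conf). The function names are made up for illustration.

```php
<?php
/**
 * Build a PDO DSN for searchd's MySQL-protocol listener.
 * Host and port defaults are assumptions; check your own sphinx.conf.
 */
function sphinxDsn(string $host = '127.0.0.1', int $port = 9306): string
{
    // An IP address (rather than "localhost") keeps the driver on TCP
    // instead of trying a Unix socket.
    return sprintf('mysql:host=%s;port=%d', $host, $port);
}

/** Open a connection to searchd; errors are raised as exceptions. */
function connectToSearchd(string $host = '127.0.0.1', int $port = 9306): PDO
{
    return new PDO(sphinxDsn($host, $port), '', '', [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}
```

Only `connectToSearchd()` needs a running daemon; `sphinxDsn()` just formats the connection string.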
9. Sphinx Architecture
• Indexer is a command-line program that you can execute to build any number of indexes.
• Can handle index rotation for live indexing
10. Not So Quick Side Note
MySQL IS SLOWWWWWWWWWWWWW
(at text matches)
13. Sphinx Concepts
• Sphinx indexes “documents”
• Each document has a unique, unsigned, non-zero integer ID (either 32 bit or 64 bit space)
• Each document has one or more fields
• Each document has zero or more attributes
14. Indexes / Sources
• Sphinx indexes are created from one or more sources.
• The source can be a database, XML, or TSV stream.
• You can use multiple sources
• This is useful for maintaining updated indexes
• Also used to implement a Sphinx cluster
15. Sphinx Fields
• Fields are what the full-text index is built from.
• When searching, you can search against any number of fields.
• You can assign different relevancy weights to different fields.
• The original value of a field is never stored by Sphinx.
• You should always have at least one.
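To see why the original field text does not need to be stored, it can help to picture a toy inverted index: a map from each term to the IDs of the documents that contain it. The sketch below is only an illustration of that idea under simplifying assumptions (ASCII-only tokenizer, no positions, no stemming); it does not reflect Sphinx's actual on-disk format.

```php
<?php
/**
 * Build a toy inverted index: term => list of document IDs.
 * $documents maps a document ID to the text of one field.
 */
function buildInvertedIndex(array $documents): array
{
    $index = [];
    foreach ($documents as $id => $text) {
        // Naive tokenizer: lowercase, then split on anything that is
        // not an ASCII letter or digit.
        $terms = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        foreach (array_unique($terms) as $term) {
            $index[$term][] = $id;
        }
    }
    return $index;
}

/** Return the IDs of documents that contain every term of the query. */
function searchIndex(array $index, string $query): array
{
    $terms = preg_split('/[^a-z0-9]+/', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);
    $result = null;
    foreach ($terms as $term) {
        $ids = $index[$term] ?? [];
        $result = $result === null ? $ids : array_values(array_intersect($result, $ids));
    }
    return $result ?? [];
}

$index = buildInvertedIndex([
    1 => 'Unit testing',
    2 => 'Testing in PHP',
]);
```

The index only knows that the term `testing` occurs in documents 1 and 2; the original titles cannot be read back out of it, which is why a search returns IDs and attributes rather than the field text.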
16. Sphinx Attributes
• Data that helps further describe the item being indexed
• Can be returned as a part of the search
• Useful for filtering and sorting results
• These are not a part of the full-text index.
17. MySQL Full Text Search
• You can get away with MyISAM tables or, as of version 5.6, InnoDB.
• You don’t care about morphology (think plurals)
• You don’t need anything but the most basic of search operators
18. Creating An Index
• We are going to add an index that sources a MySQL database.
• The data being sourced is a list of the titles of Wikipedia posts.
20. Indexer Configuration
• We are going to be peeking into a Sphinx configuration file now.
• You can rebuild the config file by concatenating each section into a single file.
• On my VM this file is located in /usr/local/etc/sphinx.conf
23. Connection information
• Ideally, you should create a separate account for Sphinx
• You can also connect via Unix socket
• I didn’t specify it here, but you can also add a port.
25. Source Index
• The index query MUST return the id field as the first column
• Remember, the id needs to be a unique, unsigned number of 64 bits or less
• The query must be on a single line, unless you escape newlines with backslashes.
• Notice that we converted the timestamp into a Unix timestamp. That is important.
27. Source Fields
• The first column in the query is always the ID.
• You specify any columns that are attributes.
• Remember, attributes are stored in the index as values that you can filter and sort by.
• Any column besides the id that is not specified as an attribute is assumed to be a text field (title)
29. Index Definition
• An index includes one or more sources.
• Each source gets its own “source” line
• Multiple sources must all define the same fields and attributes.
• The ids need to be unique across sources
30. Index Definition
• path is not actually a path; it’s a filename with no extension.
• docinfo dictates whether attributes are stored in the index or outside of the index.
• dict is not really important now. It used to be either crc or keywords; crc is now deprecated.
• min_word_len is the minimum length of words to index
40. Querying Indexes
• Default limit of 20 rows
• Notice the text fields are not returned…
• They would be if we made them attributes (sql_field_string)
41. Querying Indexes
• The magic function in SphinxQL is match()
• match() performs a full-text search against the entire index…usually
• The ‘@field’ operator can isolate which field is searched on.
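The match() call and the @field operator can be sketched from PHP as follows. This is an illustration under assumptions: the index name `wikipedia_titles` and all three function names are invented here, and the list of escaped characters is a best-effort guess at the extended query syntax's special characters rather than an authoritative list for every Sphinx version.

```php
<?php
/**
 * Escape characters that have special meaning in Sphinx's extended
 * query syntax (assumed character list, may be incomplete).
 */
function escapeMatchText(string $text): string
{
    return preg_replace('/([\\\\()|\-!@~"&\/^$=<])/', '\\\\$1', $text);
}

/** Build a match() expression limited to one field, e.g. "@title testing". */
function fieldQuery(string $field, string $text): string
{
    if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $field)) {
        throw new InvalidArgumentException('Unexpected field name: ' . $field);
    }
    return '@' . $field . ' ' . escapeMatchText($text);
}

/** Return matching document IDs from the assumed `wikipedia_titles` index. */
function searchField(PDO $sphinx, string $field, string $text): array
{
    // Index names cannot be bound as parameters, so it is written inline.
    $stmt = $sphinx->prepare('SELECT id FROM wikipedia_titles WHERE MATCH(:query)');
    $stmt->execute([':query' => fieldQuery($field, $text)]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}
```

Only `searchField()` needs a live searchd connection; the two helpers are plain string functions, which keeps the escaping easy to check in isolation.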
42. Querying Indexes
• You can query against attributes
• You can sort results
• You can use the weight() function to determine relevancy.
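Those three points can be combined in a single SphinxQL statement. The sketch below assumes a hypothetical integer timestamp attribute named `created` and a caller-supplied index name; neither name comes from the talk's configuration, and the function names are invented for the example.

```php
<?php
/**
 * Build a SphinxQL statement that filters on an attribute and sorts
 * by relevance. `created` is a hypothetical timestamp attribute.
 */
function buildRankedSearchSql(string $index, int $limit = 20): string
{
    if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $index)) {
        throw new InvalidArgumentException('Unexpected index name: ' . $index);
    }
    return 'SELECT id, created, WEIGHT() AS relevance'
        . ' FROM ' . $index
        . ' WHERE MATCH(:query) AND created >= :since'
        . ' ORDER BY relevance DESC, created DESC'
        . ' LIMIT ' . max(1, $limit);
}

/** Run the statement; $since is a Unix timestamp. */
function rankedSearch(PDO $sphinx, string $index, string $query, int $since, int $limit = 20): array
{
    $stmt = $sphinx->prepare(buildRankedSearchSql($index, $limit));
    $stmt->bindValue(':query', $query, PDO::PARAM_STR);
    // Bind as an integer so the value is sent unquoted.
    $stmt->bindValue(':since', $since, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

Comparing against a plain integer is also where storing the timestamp as a Unix timestamp pays off.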
43. Querying Indexes
• The 25387283 title was more relevant because it matched on the term “testing”
44. Getting PHP into the mix
• All we need? PDO.
• We will build a basic search page
• It accepts a query and displays up to 100 matching results by relevancy, with the matching keywords highlighted.
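The talk's own script is not reproduced in this transcript, so the following is only a sketch of the highlighting step, written as a plain PHP stand-in. It wraps whole-word, case-insensitive matches in `<b>` tags and HTML-escapes everything else. The function name is made up, and unlike Sphinx it knows nothing about stemming, so it will miss word forms that Sphinx itself would match.

```php
<?php
/**
 * HTML-escape $text and wrap whole-word occurrences of the given
 * terms in <b> tags. Matching is case-insensitive.
 */
function highlightTerms(string $text, array $terms): string
{
    $terms = array_filter(
        array_map('trim', $terms),
        static fn (string $t): bool => $t !== ''
    );
    if (!$terms) {
        return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    }

    $alternatives = implode('|', array_map(
        static fn (string $t): string => preg_quote($t, '/'),
        $terms
    ));

    // With one capture group, odd-numbered parts are the matched terms.
    $parts = preg_split('/\b(' . $alternatives . ')\b/iu', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    }

    $html = '';
    foreach ($parts as $i => $part) {
        $escaped = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        $html .= ($i % 2 === 1) ? '<b>' . $escaped . '</b>' : $escaped;
    }
    return $html;
}
```

Escaping each piece separately, rather than escaping first and matching afterwards, avoids accidentally highlighting text inside an HTML entity such as `&amp;`. The 100-result cap from the slide would be applied in the search query's LIMIT clause, not here.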
50. Cool things we would talk about if I had like…3 more hours
• Auto-suggest, Auto-correct
• More on lemmatization and stemming
• Distributed Sphinx Clustering
• Delta indexes
• Real Time Indexes
• The plethora of operators you can use
• Ranged Queries
• ………
51. Additional Information
• The Sphinx documentation is actually pretty great
• http://sphinxsearch.com/docs/
• Slides are already on Slideshare
• Will link them to the meetup shortly