Presentation held at the Oslo Enterprise Search MeetUp in May 2010, pitched at an audience that comes from the FAST ESP side and has some existing FAST knowledge. Check out one of my other presentations if you are more familiar with Lucene/Solr.
Oslo Enterprise MeetUp May 12th 2010 - Jan Høydahl
1. cominvent as
Enterprise Search Specialists
Migrating FAST to Solr
By Jan Høydahl
Oslo Enterprise Search MeetUp May 2010
2. Jan Høydahl
● IT architect - search,
telecom, mobile
● Helped build FAST's Global
Services as first engineer
● Founder of Cominvent AS
● Search consultant 10 years
4. Consulting
– Cominvent delivers independent search consulting
– Focus on Apache Lucene/Solr & Microsoft FAST ESP
Idea –> architecture –> implementation
5. Commercial Support (Solr/Lucene)
– When community & mailing list support is not enough..
– Paid support agreement for Apache Solr/Lucene
– In cooperation with Lucid Imagination
– Read more: http://www.cominvent.com/support/
6. Training
– Cominvent AS delivers public and on-site training
– Certified Solr Training Partner for Lucid Imagination
– Certified FAST ESP Training Partner
– Read more: http://www.cominvent.com/training/
Photo: fluidpowerzone.com
14. Apache Solr - characteristics
Search server
(Commercially friendly)
15. Apache Solr - characteristics
Modular Community
Contributions & patches
Light weight
16. Solr-user community growth
[Chart: Solr-user mailing list growth; y-axis: messages per month (0 to 1600); x-axis: month, Jan 2006 to Feb 2010]
17. Lucene/Solr deployments
– More: http://wiki.apache.org/solr/PublicServers
Thanks to Lucid Imagination for logo collection
23. FAST ESP – characteristics & key strengths
Security
Connectors
24. FAST ESP – characteristics & key strengths
25. FAST ESP – characteristics & key strengths
– Very strong document processing framework
[Diagram: document processing stages: Format Conversion, Language Detection, Linguistic Normalization, Entities, Taxonomy, Sentiment, Ontology, Custom Plug-in, Search Alert; applied to the sample news text below]
PARIS (Reuters) - Venus Williams raced into the second round of the $11.25 million French Open Monday, brushing aside Bianka Lamade, 6-3, 6-3, in 65 minutes. The Wimbledon and U.S. Open champion, seeded second, breezed past the German on a blustery center court to become the first seed to advance at Roland Garros. "I love being here, I love the French Open and more than anything I'd love to do well here," the American said. A first round loser last year, Williams is hoping to progress beyond the quarter-finals for the first time in her career.
28. Migration objectives
– Possible objectives include:
• Lower maintenance cost
• Deeper in-house competency
• Less dependent on external consultants
• Ownership and visibility of source code
• Shorter time to market for new features
• Bugs fixed faster – or even fixed by ourselves
• Larger community, mailing lists that work!
• More choice in external consultants
• Contribute back to Open Source
• Lower HW footprint
29. Migration steps
– Knowledge gathering & Training
– Review current features & arch
• Want to keep all features? Add new?
– Migration areas:
• Index profile
• Content
• Feeding
• Document Processing
• Querying
• Search middleware?
• Admin & Operational
– What to do in Application space vs Search space?
30. Feature comparison ESP – Solr (similarities)
Feature | ESP | Solr
Full-text, boolean, range search, sorting, sub-second, facets, did-you-mean, synonyms, faceting | Yes | Yes
Scaling for QPS | Add rows | Add rows
Scaling for document volume | Add columns | Add shards
Synonyms | Index/query side | Index/query side
GEO search | Yes | Yes (1.5)
Boolean query language | Yes (FQL) | Yes (Lucene or (e)DisMax)
APIs | HTTP, Java, .NET, C++, PHP | HTTP, Java, .NET, Ruby, Python, PHP, Perl, JS
31. Feature comparison ESP – Solr (differences)
Feature | ESP | Solr
Admin server | Yes | No (coming 1.5)
Processes | Many (C++, Java, Python) | One WAR in Java app-server, 100% Java
Navigators / Facets | Index-time | Query-time
Did-you-mean | Dictionary based | Dictionary or index based
Feeding | API only | HTTP POST or API
Document processing | Pipeline (py) | Simple pipeline (Java, JS, Groovy, Jython, JRuby..)
Multi field querying | Composite fields | DisMax handler
32. Feature comparison ESP – Solr (differences)
Feature | ESP | Solr
Relevancy tuning | Rank profiles, term boosting | Dynamic function queries and boost functions
XRANK | XRANK operator | Function Queries
Freshness boost | Freshness in rank profile | Function Queries
Boost GEO distance | Rank profile and special | Function Queries
Major schema or software updates | Cold update, use stage environment | Stage new content into new Solr core
Pluggability | Docprocs, QT/RP (limited), clients | Everything :) Request Handlers, Query Parsers, Docprocs, Rank, Spell, tokenizer++
33. Feature comparison ESP – Solr (differences)
Feature | ESP | Solr
Lemmatization | Can be licensed for many languages | Can be licensed from 3rd party
Query syntax | and(a:foo, b:bar) | a:foo AND b:bar
Query syntax | i:range(0, 100) | i:[0 TO 100]
Query syntax | d:range(2000-01-01T00:00:00, 2010-03-03T12:00:00) | d:[2000-01-01T00:00:00Z TO NOW]
Query params | query= | q=
Query params | offset= | start=
Query params | hits= | rows=
Query params | spell=1 | spellcheck=true
What fields to return | view=viewname | fl=title,price,body...
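The parameter rows above can be applied mechanically in a front end. The following is a minimal Ruby sketch, assuming the ESP parameters arrive as a hash; `esp_to_solr_params` and `ESP_TO_SOLR_PARAMS` are hypothetical names, not part of any FAST or Solr client. It only renames parameters. The query string itself (FQL) still has to be rewritten into Lucene or DisMax syntax separately.

```ruby
require 'uri'

# Mapping taken from the table above: ESP parameter name => Solr parameter name.
ESP_TO_SOLR_PARAMS = {
  'query'  => 'q',
  'offset' => 'start',
  'hits'   => 'rows'
}.freeze

# Translate a hash of ESP query parameters into Solr parameters.
# Hypothetical helper for illustration; parameters without a known
# mapping are dropped.
def esp_to_solr_params(esp_params)
  solr = {}
  esp_params.each do |name, value|
    key = name.to_s
    if ESP_TO_SOLR_PARAMS.key?(key)
      solr[ESP_TO_SOLR_PARAMS[key]] = value.to_s
    elsif key == 'spell'
      # ESP uses spell=1, Solr uses spellcheck=true
      solr['spellcheck'] = (value.to_s == '1').to_s
    end
  end
  solr
end

esp = { 'query' => 'car', 'offset' => 10, 'hits' => 20, 'spell' => 1 }
solr_params = esp_to_solr_params(esp)
query_string = URI.encode_www_form(solr_params)
puts query_string
```

ESP's view=viewname has no one-to-one parameter here; it becomes either an explicit fl= list or a named request handler configuration.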
34. Feature comparison ESP – Solr (differences)
Feature | ESP | Solr
Search XML hierarchy | Yes, scope search | No
Reports | Built in analytics | Use 3rd party log analysis such as Splunk.com
35. Your existing FAST system - overview
Your web-app
Search middleware?
Graphics diagram: www.microsoft.com
36. Migrating index profile
– ESP index profile -> Solr schema.xml
– Setup field types, use defaults or create your own
– Setup the static fields. ESP:
– Solr equivalent:
– No need for generic*, use dynamic fields:
37. Migrating index profile
– Composite fields?
• Solr can use <copyField> to copy multiple fields into
one, e.g. as we did to map many attributes into one
field
• However, to rank with a different boost for
each field, Solr does not need a composite field. Use the
DisMax query handler instead. Very powerful!
– No need to edit the schema to add new fields. Using
dynamic fields, it is easy to e.g. introduce a color facet
for cars or an Mpixels facet for digital cameras
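As an illustration of the dynamic-field point, here is a minimal Ruby sketch that builds a Solr XML add message where new attributes are introduced purely by field name. It assumes the schema declares dynamic field patterns such as `*_s` (string) and `*_f` (float); the field names `color_s` and `mpixels_f`, the document values and the helper names are all hypothetical.

```ruby
# Escape the characters that are special in XML text and attribute values.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Build a Solr XML <add> message from an array of hashes (field name => value).
def solr_add_xml(docs)
  body = docs.map do |doc|
    fields = doc.map do |name, value|
      %(<field name="#{xml_escape(name)}">#{xml_escape(value)}</field>)
    end
    "<doc>#{fields.join}</doc>"
  end
  "<add>#{body.join}</add>"
end

# color_s and mpixels_f are not declared one by one in the schema; they are
# assumed to match the *_s and *_f dynamic field patterns.
docs = [
  { 'id' => 'car-1',    'title' => 'Family car',     'color_s'   => 'red' },
  { 'id' => 'camera-1', 'title' => 'Compact camera', 'mpixels_f' => 12.1 }
]
xml = solr_add_xml(docs)
puts xml
```

The resulting XML can be posted to the update handler like any other Solr XML document; no schema change is needed as long as the suffix matches a declared pattern.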
38. DisMax query example
– This Solr query can replace use of composite-field
• qt=dismax
• q=oslo
• qf=title^0.7 highpriorityfields^1.5
mediumpriorityfields^0.6 lowpriorityfields^0.2
recallfields^0.0 body^0.0
• bf=recip(rord(creationDate),1,1000,1000)
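The parameters above can be assembled into a request URL as follows. This is a minimal Ruby sketch: the base URL is an assumption (same host and port as the other examples in this deck) and `dismax_url` is a hypothetical helper. The small `recip` function mirrors Solr's recip(x,m,a,b) = a / (m*x + b), to show how the bf boost decays as rord(creationDate), the reverse ordinal of the date, grows.

```ruby
require 'uri'

# Field boosts copied from the slide above.
QF = 'title^0.7 highpriorityfields^1.5 mediumpriorityfields^0.6 ' \
     'lowpriorityfields^0.2 recallfields^0.0 body^0.0'

# Build the select URL for a DisMax query. The base URL is an assumption.
def dismax_url(user_query, base = 'http://localhost:8080/solr/select')
  params = {
    'qt' => 'dismax',
    'q'  => user_query,
    'qf' => QF,
    'bf' => 'recip(rord(creationDate),1,1000,1000)'
  }
  "#{base}?#{URI.encode_www_form(params)}"
end

# Solr's recip(x,m,a,b) function computes a / (m*x + b).
def recip(x, m, a, b)
  a.to_f / (m * x + b)
end

url = dismax_url('oslo')
puts url
puts recip(1, 1, 1000, 1000)     # newest document (rord = 1)
puts recip(1000, 1, 1000, 1000)  # 1000th newest document
```

With these constants the newest document gets a boost close to 1, and the boost has halved by the 1000th newest document.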
39. Migrating content
– If using FAST ContentAPI to push programmatically
• Use Solr's clients (Java, .NET, Ruby, Python, PHP...)
– If feeding FastXML using FileTraverser
• Feed as Solr XML using HTTP POST or a POST client
– If you feed custom XML with XMLMapper
• Have a look at DIH's import and mapping features
40. Push Feeding example
– Feed XML using HTTP POST:
• curl 'http://localhost:8080/solr/update?commit=true' \
-H "Content-Type: text/xml" \
--data-binary @mydoc.xml
– Ruby example:
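• > gem sources -a http://gemcutter.org
> sudo gem install rsolr
require 'rsolr'
# Base URL including the /solr path used in the curl example
solr = RSolr.connect :url=>'http://localhost:8080/solr'
# Each document is a hash of field name => value
documents = [{:id=>1, :price=>1.00},
{:id=>2, :price=>10.50}]
solr.add documents
# Commit to make the added documents searchable
solr.commit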
42. Querying examples
– http://localhost:8080/solr/select?q=car&fl=id,title
– Ruby
• res=solr.select :q=>'roses', :fq=>['red','white']
res['response']['docs'].each do |doc|
puts doc['title']
end
43. Migrating document processing
– Solr lacks a sophisticated pipeline with entity
extraction etc. Alternatives:
• Do extraction in Application space (Ruby)
• Write own stage in Solr pipeline for simple cases
• Integrate to do more advanced stuff
– Matchers/extractors
• LingPipe NamedEntityExtractor inside of OpenPipeline
– Synonyms:
• Use Solr's synonym handling index/query side
– Custom stages:
• Write a Solr UpdateProcessor (in Java, Jython etc)
– Got a LOT of custom FAST docproc stages?
• Have a look at SESAT's PY ProcServer for Solr (GPL)
44. Migrating linguistics (lemmatization)
– Solr ships with Stemming instead of Lemmatization
– Stemming has limitations
• Biler, bilen, bilene -> bil
BUT
• Bøker, bøkene -> bøk; boka, bok -> bok
– KStem is better. Free with LucidWorks for Solr
– If you need singular/plural handling only
• Free dictionaries? Check lucene-hunspell
– Lemmatization can be licensed from 3rd party
such as Basistech, who also has language
identification & entity extraction
– Language identification also from Sematext
45. Basistech Rosette for Lucene
– High-end linguistics capabilities for
19 languages
– Language Identification
– Segmentation and tokenization
– Lemmatization
– Noun decompounding
– Part-of-speech tagging
– Entity extraction
– Easily integrated with Lucene/Solr
– More: http://www.basistech.com/lucene/
46. Migrating search middleware
– Using FAST Unity?
• Consider migrating middleware logic such as external
source querying and federation to SESAT (AGPL)
– Using Comperio Front?
• Ask Comperio for Solr engine support
• Or migrate custom Q&R formats
– Or is plain Solr enough?
• Solr has built-in support for shards
• A shard query will query multiple shards
and merge the results into one
• Add custom processing as Query
Components in Solr
• Check contrib & patches!
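A shard query is an ordinary select request with a `shards` parameter listing the shards to query. Here is a minimal Ruby sketch, assuming two shards on hypothetical hosts `search1` and `search2`; `sharded_query_url` is an illustrative helper and the base URL is an assumption.

```ruby
require 'uri'

# Build a distributed-search URL: the node that receives the request queries
# every shard listed in the "shards" parameter and merges the results.
# Shards are given as host:port/path, without the http:// prefix.
def sharded_query_url(query, shards, base: 'http://localhost:8080/solr/select')
  params = { 'q' => query, 'shards' => shards.join(',') }
  "#{base}?#{URI.encode_www_form(params)}"
end

# Hypothetical host names for illustration.
shards = ['search1:8080/solr', 'search2:8080/solr']
url = sharded_query_url('car', shards)
puts url
```

The client sees one merged result list, so a front end that only needs this kind of fan-out may not need separate middleware for it.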
47. Migrating Front ends
– Using a middleware with Solr support? Lucky you!
– If not, consider introducing one now. Look at (Java):
– If you decide to migrate from FAST Java/.NET APIs
• Choose SolrJ or SolrNET
• Query language differences. &fq= instead of filter()
• Solr facets do not require sessions/state as FAST's do
– Migrate FAST's «views» into named ReqHandler configs
– Multi lingual: Need to handle title_no, title_en etc... :(
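To illustrate the &fq= and stateless-facet points, here is a minimal Ruby sketch that builds one self-contained request carrying the query, the active filters and the facet fields. The field names `color_s` and `brand_s`, the base URL and the helper `facet_query_url` are assumptions for illustration.

```ruby
require 'uri'

# Build a stateless Solr query URL with filter queries and facet fields.
# Everything the server needs is in the URL, so no session is kept between
# requests. The base URL and field names are assumptions.
def facet_query_url(query, filters: [], facet_fields: [],
                    base: 'http://localhost:8080/solr/select')
  params = [['q', query]]
  # Each filter the user has selected becomes one fq parameter.
  filters.each { |fq| params << ['fq', fq] }
  unless facet_fields.empty?
    params << ['facet', 'true']
    facet_fields.each { |field| params << ['facet.field', field] }
  end
  "#{base}?#{URI.encode_www_form(params)}"
end

url = facet_query_url('car',
                      filters: ['color_s:red'],
                      facet_fields: ['color_s', 'brand_s'])
puts url
```

Drilling down into a facet value means adding one more fq to the next request; removing a filter means leaving it out.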
48. Migrating Web Crawler
– Solr has no built-in web crawler
• Instead you can choose from several integrations
– The Apache Nutch crawler
• Proven with hundreds of millions of pages
• http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
– Apache Droids
• Still in the incubator, but aims at becoming a full crawler
• http://incubator.apache.org/droids/
– Heritrix + Solr (example in Solr 1.4 book)
– OpenPipeline has a (very) simple crawler
– Lucene Connectors Framework
• Preparing crawler support
49. Migrating Connectors
– Solr handles these sources internally through DIH:
• Database, RSS, Web-services, Local filesystem
– Additionally through Lucene Connectors Framework:
• EMC Documentum, FileNet, JDBC, LiveLink, Patriarch
(Memex), Meridio, SharePoint, RSS
• New connectors should be written for LCF
– Another option:
• Sharepoint, IMAP, Documentum, Vignette, Filesystem
50. Operations
– Solr has no admin-server (coming in 1.5)
– Possible to run multiple Tomcat on same server
– Multiple cores in same Tomcat – easier migration
– No built-in query reports, use 3rd party tools
– No built-in monitoring, have a look at
– Log analysis? Check out
52. Thank You
www.cominvent.com
jh@cominvent.com
www.twitter.com/cominvent
linkedin.com/in/janhoy
This presentation licensed under CC-by-sa license
You must attribute Cominvent with name and link