A presentation given at the Lucene/Solr Revolution 2014 conference to show Solr and Elasticsearch features side by side. The presentation time was only 30 minutes, so only the core usability features were compared. The full video is embedded on the last slide.
Deep Dive on ElasticSearch: a Meetup event held on 23 May 2015 at www.meetup.com/abctalks
Agenda:
1) Introduction to NOSQL
2) What is ElasticSearch and why is it required
3) ElasticSearch architecture
4) Installation of ElasticSearch
5) Hands-on session on ElasticSearch
An introduction to Elasticsearch with a short demonstration in Kibana to present the search API. The slides cover:
- A quick overview of the Elastic Stack
- Indexing
- Analyzers
- Relevance scoring
- One Elasticsearch use case
The query used for the Kibana demonstration can be found here:
https://github.com/melvynator/elasticsearch_presentation
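The search API shown in such Kibana demos boils down to JSON request bodies. As a rough sketch (the index name and `title` field are invented for illustration, not taken from the linked demo queries), a relevance-scored match query could look like:

```python
import json

# A hypothetical full-text query body, as it would be sent to
# POST /articles/_search ("articles" and "title" are illustrative names).
query_body = {
    "query": {
        "match": {              # analyzed full-text match, scored by relevance
            "title": "elastic stack overview"
        }
    },
    "size": 10,                 # return the 10 best-scoring hits
}

print(json.dumps(query_body, indent=2))
```

In Kibana's Dev Tools console the same body is pasted directly under a `GET articles/_search` line.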
This slide deck covers Elasticsearch and its features. The term "ELK Stack" refers to Elasticsearch, Logstash, and Kibana, while "Elastic Stack" also includes other components such as Beats and X-Pack.
What is the ELK Stack?
ELK vs Elastic stack
What is Elasticsearch used for?
How does Elasticsearch work?
What is an Elasticsearch index?
Shards
Replicas
Nodes
Clusters
What programming languages does Elasticsearch support?
Amazon Elasticsearch, its use cases and benefits
A brief presentation outlining the basics of Elasticsearch for beginners; it can be used to deliver a seminar on Elasticsearch (P.S. I used it for one). The presenter is advised to experiment with Elasticsearch beforehand.
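The shards, replicas, nodes, and clusters topics listed above fit together through index settings. A minimal sketch (the index name is hypothetical; `number_of_shards` and `number_of_replicas` are real Elasticsearch settings):

```python
# Hypothetical index-creation settings: primaries partition the data,
# replicas are extra copies for redundancy and read throughput.
index_settings = {
    "settings": {
        "number_of_shards": 3,    # 3 primary shards
        "number_of_replicas": 1,  # 1 replica per primary
    }
}

def total_shards(settings):
    """Total shard copies the cluster must allocate for this index."""
    s = settings["settings"]
    return s["number_of_shards"] * (1 + s["number_of_replicas"])

print(total_shards(index_settings))  # 6
```

The cluster spreads those six shard copies across its nodes, which is what lets a single index outgrow one machine.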
Building Lakehouses on Delta Lake with SQL Analytics Primer (Databricks)
You’ve heard the marketing buzz, and maybe you have been to a workshop and worked with some Spark, Delta, SQL, Python, or R, but you still need help putting all the pieces together. Join us as we review some common techniques to build a lakehouse using Delta Lake, use SQL Analytics to perform exploratory analysis, and build connectivity for BI applications.
Visualize some of Austin's open source data using Elasticsearch with Kibana. ObjectRocket's Steve Croce presented this talk on 10/13/17 at the DBaaS event in Austin, TX.
In this presentation, we discuss how Elasticsearch handles operations such as insert, update, and delete. We also cover what an inverted index is and how segment merging works.
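As a toy illustration of the inverted index and segment merging covered here (a minimal sketch, nothing like Lucene's actual on-disk structures): each segment maps terms to the IDs of documents containing them, and merging segments unions their postings lists.

```python
from collections import defaultdict

def build_segment(docs):
    """Build a tiny inverted index: term -> sorted list of doc IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def merge_segments(a, b):
    """Merge two segments by unioning their postings lists."""
    merged = {}
    for term in set(a) | set(b):
        merged[term] = sorted(set(a.get(term, [])) | set(b.get(term, [])))
    return merged

seg1 = build_segment({1: "quick brown fox", 2: "brown dog"})
seg2 = build_segment({3: "lazy brown dog"})
merged = merge_segments(seg1, seg2)
print(merged["brown"])  # [1, 2, 3]
```

Real segment merging also drops documents marked as deleted, which is why deletes in Elasticsearch only reclaim space after a merge.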
How to Build a Data Lake with the AWS Glue Data Catalog (ABD213-R), re:Invent 2017 (Amazon Web Services)
As data volumes grow and customers store more data on AWS, they often have valuable data that is not easily discoverable and available for analytics. The AWS Glue Data Catalog provides a central view of your data lake, making data readily available for analytics. We introduce key features of the AWS Glue Data Catalog and its use cases. Learn how crawlers can automatically discover your data, extract relevant metadata, and add it as table definitions to the AWS Glue Data Catalog. We will also explore the integration between AWS Glue Data Catalog and Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
Want to see a high-level overview of the products in the Microsoft data platform portfolio in Azure? I’ll cover products in the categories of OLTP, OLAP, data warehouse, storage, data transport, data prep, data lake, IaaS, PaaS, SMP/MPP, NoSQL, Hadoop, open source, reporting, machine learning, and AI. It’s a lot to digest but I’ll categorize the products and discuss their use cases to help you narrow down the best products for the solution you want to build.
The talk covers how Elasticsearch, Lucene, and to some extent search engines in general actually work under the hood. We'll start at the "bottom" (or close enough!) of the many abstraction levels and gradually move upwards towards the user-visible layers, studying the various internal data structures and behaviors as we ascend. Elasticsearch provides APIs that are very easy to use, and they will get you started and take you far without much effort. However, to get the most out of it, it helps to have some knowledge about the underlying algorithms and data structures. This understanding enables you to make full use of its substantial set of features so that you can improve your users' search experiences while at the same time keeping your systems performant, reliable, and updated in (near) real time.
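To make the "underlying algorithms" point concrete: classic Lucene scoring is built on term frequency and inverse document frequency. A simplified TF-IDF sketch (not Lucene's exact formula, which also factors in field-length norms and, in recent versions, uses BM25):

```python
import math

def tf_idf(term_freq, doc_count, docs_with_term):
    """Simplified TF-IDF: rare terms and repeated terms score higher."""
    tf = math.sqrt(term_freq)                         # diminishing returns
    idf = 1.0 + math.log(doc_count / (docs_with_term + 1))
    return tf * idf

# A term appearing twice in a doc but in only 10 of 1000 docs outranks
# a common term appearing once and found in 500 of 1000 docs.
rare = tf_idf(term_freq=2, doc_count=1000, docs_with_term=10)
common = tf_idf(term_freq=1, doc_count=1000, docs_with_term=500)
print(rare > common)  # True
```

This is why a query term like "the" barely moves the score while a distinctive term dominates it.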
Migrating Your Traditional Data Warehouse to a Modern Data Lake (Amazon Web Services)
In this session, we discuss the latest features of Amazon Redshift and Redshift Spectrum, and take a deep dive into its architecture and inner workings. We share many of the recent availability, performance, and management enhancements and how they improve your end user experience. You also hear from 21st Century Fox, who presents a case study of their fast migration from an on-premises data warehouse to Amazon Redshift. Learn how they are expanding their data warehouse to a data lake that encompasses multiple data sources and data formats. This architecture helps them tie together siloed business units and get actionable 360-degree insights across their consumer base.
Talk given for the #phpbenelux user group, March 27th in Gent (BE), with the goal of convincing developers who are used to building PHP/MySQL apps to broaden their horizons when adding search to their site. Be sure to also have a look at the notes for the slides; they explain some of the screenshots.
An accompanying blog post about this subject can be found at http://www.jurriaanpersyn.com/archives/2013/11/18/introduction-to-elasticsearch/
Data Analytics Meetup: Introduction to Azure Data Lake Storage (CCG)
Microsoft Azure Data Lake Storage is designed to enable operational and exploratory analytics through a hyper-scale repository. Journey through Azure Data Lake Storage Gen1 with Microsoft Data Platform Specialist, Audrey Hammonds. In this video she explains the fundamentals to Gen 1 and Gen 2, walks us through how to provision a Data Lake, and gives tips to avoid turning your Data Lake into a swamp.
Learn more about Data Lakes with our blog - Data Lakes: Data Agility is Here Now https://bit.ly/2NUX1H6
This presentation covers the differences between Elasticsearch and relational databases. It also includes a glossary of Elasticsearch terms and covers its basic operations.
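The usual glossary mapping from relational databases to Elasticsearch (the classic analogy; note that "type" was deprecated in Elasticsearch 7+) can be summarized as:

```python
# Classic relational -> Elasticsearch terminology analogy.
# ("type" has since been deprecated in modern Elasticsearch versions.)
glossary = {
    "database": "index",
    "table": "type",
    "row": "document",
    "column": "field",
    "schema": "mapping",
    "SQL": "Query DSL",
}

for rdbms_term, es_term in glossary.items():
    print(f"{rdbms_term:>8} -> {es_term}")
```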
In this technical overview of Azure Cosmos DB, you will learn how easy it is to get started building planet-scale applications with Azure Cosmos DB. We'll then take a closer look at important design aspects around global distribution, consistency, and server-side partitioning, and at how to model your data to fit your app's needs using tools and APIs you love.
Architect’s Open-Source Guide for a Data Mesh Architecture (Databricks)
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted for architects, decision-makers, data-engineers, and system designers.
Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017 (Codemotion)
Today’s applications are expected to provide powerful full-text search. But how does that work in general and how do I implement it on my site or in my application? Actually, this is not as hard as it sounds at first. This talk covers: * How full-text search works in general and what the differences to databases are. * How the score or quality of a search result is calculated. * How to implement this with Elasticsearch. Attendees will learn how to add common search patterns to their applications without breaking a sweat.
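The "differences to databases" this talk highlights can be shown in miniature: full-text search analyzes text into terms first, so word order, casing, and punctuation stop mattering. A toy sketch (nothing like Elasticsearch's real analyzer chain, just the idea):

```python
import re

def analyze(text):
    """Toy analyzer: lowercase and split on non-alphanumeric characters."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def full_text_match(query, document):
    """True if every analyzed query term occurs in the document's terms."""
    return analyze(query) <= analyze(document)

doc = "Powerful Full-Text Search, explained!"
print("search full-text" in doc)                  # substring, LIKE-style: False
print(full_text_match("search full-text", doc))   # analyzed match: True
```

A SQL `LIKE '%search full-text%'` behaves like the substring check on the first line, which is why it misses matches an analyzed index finds.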
"ElasticSearch in action" by Thijs Feryn.
ElasticSearch is a really powerful search engine, NoSQL database, and analytics engine. It is fast, it scales, and it's a child of the Cloud/Big Data generation. This talk will show you how to get things done using ElasticSearch. The focus is on doing actual work, creating actual queries, and achieving actual results. Topics that will be covered:
- Filters and queries
- Cluster, shard and index management
- Data mapping
- Analyzers and tokenizers
- Aggregations
- ElasticSearch as part of the ELK stack
- Integration in your code
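Among the topics this talk covers, aggregations are the analytics side of Elasticsearch. A hedged sketch of a terms-aggregation request body (the index, field, and aggregation names are invented for illustration):

```python
# Hypothetical request body for POST /products/_search: bucket documents
# by category and return only the counts, no hits ("size": 0).
agg_body = {
    "size": 0,
    "aggs": {
        "by_category": {
            "terms": {
                "field": "category",  # keyword field to bucket on
                "size": 5,            # top 5 buckets by doc count
            }
        }
    },
}
```

The response would contain one bucket per distinct `category` value with its document count, which is how faceted navigation is typically built.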
code4lib 2011 preconference: What's New in Solr (since 1.4.1) (Erik Hatcher)
code4lib 2011 preconference, presented by Erik Hatcher of Lucid Imagination.
Abstract: The library world is fired up about Solr. Practically every next-gen catalog is using it (via Blacklight, VuFind, or other technologies). Solr has continued improving in some dramatic ways, including geospatial support, field collapsing/grouping, extended dismax query parsing, pivot/grid/matrix/tree faceting, autosuggest, and more. This session will cover all of these new features, showcasing live examples of them all, including anything new that is implemented prior to the conference.
Searching for AI - Leveraging Solr for Classic Artificial Intelligence Tasks (Alexandre Rafalovitch)
Apache Solr has always been built on a strong Information Retrieval/Natural Language Processing foundation, and recent versions have added even more Artificial Intelligence features, techniques, and integrations to Solr.
This presentation covers some classic (and hidden-gem) AI elements that Solr has supported for a long time, as well as the most recent features that are not yet fully documented.
The presentation was made with references to Solr 7.4.
When an application starts to grow large and complex, searching your models becomes a complicated task. Running searches directly against the database is slow, inefficient, and allows little or no flexibility in how the search is performed. Enter ElasticSearch, a search engine used by companies such as GitHub, Twitter, and Foursquare to index and search literally millions of documents in real time. In this talk, I will explain when, how, and why to use ElasticSearch to easily index and run complex searches on your models.
Similar to Solr vs. Elasticsearch - Case by Case
From Content to Search: Speed-Dating Apache Solr (ApacheCON 2018) - Alexandre Rafalovitch
While a fully nuanced search implementation takes time, getting basic data ingestion, schema design, and critical-path insights does not have to be a painful experience.
This talk uses several real-life datasets (from Data is Plural mailing list) and shows "Rapid Application Development"-style workflow to get the data into Solr and shape it ready for initial searchability and relevancy analysis.
Different stages of content ingestion, pre-processing, analysis, and querying are explained, together with trade-offs of different built-in approaches. Relevant document links are also included for more in-depth research.
This talk is for everybody who wants Search in their development stack but is not sure exactly where to start.
Apache Solr is a search engine that can scale from a personal project to a multi-terabyte cloud-hosted cluster. At the same time, this ability to scale, tune, and adjust to clients' needs can make it hard to understand the right aspects of Solr to bring to the problem.
In this session, Alexandre Rafalovitch (an Apache Solr committer) will do a speed run demonstrating how to create and tune a Solr 7.3 instance for a hypothetical Corporate Phone Directory application. It will cover:
*) The smallest learning schema/configuration required
*) Rapid schema evolution workflow
*) Dealing with multiple languages
*) Dealing with misspellings in search
*) Searching phone numbers
Presented at Solr meetup in Montreal, in May 2018.
Backing GitHub repository is: https://github.com/arafalov/solr-presentation-2018-may
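The "rapid schema evolution workflow" mentioned above typically goes through Solr's Schema API, which accepts JSON commands over HTTP. A sketch of such a payload (the core name and field details are hypothetical, not taken from the linked repository):

```python
import json

# Hypothetical Schema API command, as it would be POSTed to
# /solr/phonedir/schema ("phonedir" core and field details are invented).
add_field = {
    "add-field": {
        "name": "phone",
        "type": "string",   # exact-match field type for phone numbers
        "stored": True,     # return the value in search results
        "indexed": True,    # make the field searchable
    }
}

payload = json.dumps(add_field)
print(payload)
```

Because the Schema API mutates the live schema, fields can be added and refined iteratively between indexing runs instead of hand-editing `schema.xml` and restarting.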
Overview of the Solr 6.2 examples, including the features they have and the challenges they present. A contrasting demonstration of a minimal viable example. A step-by-step deconstruction of the "films" example to show which parts of the shipped examples are not actually needed.
Introduction to Solr, presented at Bangkok meetup in April 2014:
http://www.meetup.com/bkk-web/events/172090992/
Covers high-level use cases for Solr. Demos include support for the Thai language (with a GitHub link for the source).
Has slides showcasing the Solr ecosystem as well as a couple of ideas for possible Solr-specific learning projects.
Solr vs. Elasticsearch - Case by Case
1. Solr vs. Elasticsearch
Case by Case
Alexandre Rafalovitch @arafalov
@SolrStart
www.solr-start.com
2. Meet the FRENEMIES
Friends (common)
• Based on Lucene
• Full-text search
• Structured search
• Queries, filters, caches
• Facets/stats/enumerations
• Cloud-ready
Elasticsearch*
* Elasticsearch is a trademark of Elasticsearch BV,
registered in the U.S. and in other countries.
Enemies (differences)
• Download size
• AdminUI vs. Marvel
• Configuration vs. Magic
• Nested documents
• Chains vs. Plugins
• Types and Rivers
• OpenSource vs. Commercial
• Etc.
3. This used to be Solr (now in Lucene/ES)
• Field types
• Dismax/eDismax
• Many analysis filters (WordDelimiterFilter, Soundex, Regex, HTML, kstem, Trim…)
• Multi-valued field cache
• …. (source: http://heliosearch.org/lucene-solr-history/ )
• Disclaimer: Nowadays, Elasticsearch hires awesome Lucene hackers
4. Basically - sisters
Source: https://www.flickr.com/photos/franzfume/11530902934/
[Bar chart: on-disk size in Mb. Download: Solr 150, Elasticsearch 27; Expanded: Solr 225, Elasticsearch 34; First run: Solr 258, Elasticsearch 34 (no change)]
9. Basic search in Elasticsearch
GET /test1/hello/_search
…..
{
"_index": "test1",
"_type": "hello",
"_id": "AUmIk4LDF4XvfpxnVJ2g",
"_score": 1,
"_source": {
"msg": "Happy birthday",
"names": [
"Alex",
"Mark"
],
"when": "2014-11-01T10:09:08"
}
….
• GET /test1/hello/_search?q=foobar – no results
• GET /test1/hello/_search?q=Alex – YES on names?
• GET /test1/hello/_search?q=alex – YES lower case
• GET /test1/hello/_search?q=happy – YES on msg?
• GET /test1/hello/_search?q=2014 – YES???
• GET /test1/hello/_search?q="birthday alex" – YES
• GET /test1/hello/_search?q="birthday mark" – NO
Issues:
1. Where are we actually searching?
2. Why do lower-case searches work?
3. What's so special about Alex?
10. All about _all and why strings are tricky
• By default, we search in the field _all
• What's an _all field in Solr terms?
<field name="_all" type="es_string" multiValued="true" indexed="true" stored="false"/>
<copyField source="*" dest="_all"/>
• And the default mapping for Elasticsearch "string" type is like:
<fieldType name="es_string" class="solr.TextField" multiValued="true" positionIncrementGap="0" >
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
• Elasticsearch equivalent to Solr's solr.StrField is:
{"type" : "string", "index" : "not_analyzed"}
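To make the equivalence concrete, here is a minimal sketch (against the Elasticsearch 1.x API shown in the talk) of how one might build that explicit mapping body before PUTting it. The index, type, and field names ("test1", "hello", "tag") are placeholders, not from the slides:

```python
import json

# Explicit mapping keeping a field as a single un-analyzed token --
# the rough equivalent of Solr's solr.StrField.
mapping = {
    "hello": {
        "properties": {
            "tag": {"type": "string", "index": "not_analyzed"}
        }
    }
}

# This body would be sent as: PUT /test1/_mapping/hello
body = json.dumps(mapping)
```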
12. Nearly the same magic
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
<!-- UUIDUpdateProcessorFactory will generate an id if none is
present in the incoming document -->
<processor class="solr.UUIDUpdateProcessorFactory" />
<processor class="solr.LogUpdateProcessorFactory"/>
<processor class="solr.DistributedUpdateProcessorFactory"/>
<processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
<processor class="solr.ParseBooleanFieldUpdateProcessorFactory"/>
<processor class="solr.ParseLongFieldUpdateProcessorFactory"/>
<processor class="solr.ParseDoubleFieldUpdateProcessorFactory"/>
<processor class="solr.ParseDateFieldUpdateProcessorFactory">
<arr name="format">
<str>yyyy-MM-dd'T'HH:mm:ss</str>
<str>yyyyMMdd'T'HH:mm:ss</str>
</arr>
</processor>
<processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
<str name="defaultFieldType">es_string</str>
<lst name="typeMapping">
<str name="valueClass">java.lang.Boolean</str>
<str name="fieldType">booleans</str>
</lst>
<lst name="typeMapping">
<str name="valueClass">java.util.Date</str>
<str name="fieldType">tdates</str>
</lst>
</processor>
<processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
Not quite the same magic:
• URP chain happens before copyField
• Date/Ints are converted first
• copyField converts content back to string
• _all field also gets copy of _id and _version
• All auto-mapped fields HAVE to be multivalued
• No (ES-Style) types, just collections
• Unable to reproduce cross-field search
• Still rough around the edges
• Requires dynamic schema, so adding new types
becomes a challenge
• Auto-mapping is NOT recommended for production
• Dynamic fields solution is still more mature
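The type-detection order in the update-processor chain above can be sketched in plain Python; this is a rough analogue for illustration, not part of the talk, and the date formats mirror the chain's `<arr name="format">` list:

```python
from datetime import datetime

def detect_type(value: str):
    """Try boolean, long, double, then date, and fall back to string,
    in the same order as the update request processor chain."""
    if value == "":
        return None  # RemoveBlankFieldUpdateProcessorFactory drops blanks
    if value.lower() in ("true", "false"):
        return "boolean"
    try:
        int(value)
        return "long"
    except ValueError:
        pass
    try:
        float(value)
        return "double"
    except ValueError:
        pass
    for fmt in ("%Y-%m-%dT%H:%M:%S", "%Y%m%dT%H:%M:%S"):
        try:
            datetime.strptime(value, fmt)
            return "date"
        except ValueError:
            pass
    return "string"
```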
13. Explicit mapping - Solr
• In schema.xml (or dynamic equivalent)
• Uses Java Factories
• Related content (e.g. stopwords) are usually in separate files (recently added REST-managed)
• French example:
<fieldType name="text_fr" class="solr.TextField"
positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.ElisionFilterFactory" ignoreCase="true"
articles="lang/contractions_fr.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="lang/stopwords_fr.txt" format="snowball" />
<filter class="solr.FrenchLightStemFilterFactory"/>
</analyzer>
</fieldType>
14. Explicit mapping - Elasticsearch
• Created through PUT command
• Also can be stored in config/default-mapping.json or
config/mappings/[index_name]
• Mappings for all types in one index should be compatible to avoid problems
• Usually uses predefined analyzer names; Elasticsearch ships many of them, including language-specific ones
• Explicit mapping is defined through named cross-references, rather than a duplicated in-place stack (as in Solr)
• Related content is usually also in the definition. Sometimes in file (e.g.
stopwords_path – needs to be on all nodes)
• French example (next slide):
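The French-example slide itself is missing from this extraction; a settings body along these lines is likely what it showed, reconstructed here from the standard Elasticsearch analysis filters (the component names "french_elision", "french_stop", "french_stemmer" are conventional, not from the talk). Note the cross-reference style: filters are declared first, then referenced by name in the analyzer stack:

```python
import json

settings = {
    "analysis": {
        "filter": {
            # Resources such as the elision articles are inline,
            # unlike Solr's lang/contractions_fr.txt file.
            "french_elision": {
                "type": "elision",
                "articles": ["l", "m", "t", "qu", "n", "s", "j"]
            },
            "french_stop": {"type": "stop", "stopwords": "_french_"},
            "french_stemmer": {"type": "stemmer", "language": "light_french"}
        },
        "analyzer": {
            "text_fr": {
                "tokenizer": "standard",
                "filter": ["french_elision", "lowercase",
                           "french_stop", "french_stemmer"]
            }
        }
    }
}

# Supplied at index-creation time, or via PUT /test1/_settings on a closed index.
body = json.dumps(settings)
```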
16. Default analyzer - Elasticsearch
Indexing
1. the analyzer defined in the field
mapping, else
2. the analyzer defined in the _analyzer
field of the document, else
3. the default analyzer for the type,
which defaults to
4. the analyzer named default in the
index settings, which defaults to
5. the analyzer named default at node
level, which defaults to
6. the standard analyzer
Query
1. the analyzer defined in the query
itself, else
2. the analyzer defined in the field
mapping, else
3. the default analyzer for the type,
which defaults to
4. the analyzer named default in the
index settings, which defaults to
5. the analyzer named default at node
level, which defaults to
6. the standard analyzer
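The index-time fallback chain above amounts to a first-non-empty lookup; here is a minimal sketch of that resolution logic (an illustration of the ES 1.x semantics, not real Elasticsearch code):

```python
def resolve_index_analyzer(field_mapping, doc, type_default,
                           index_settings, node_settings):
    """Walk the index-time fallback chain: field mapping, the document's
    _analyzer field, the type default, the index-level 'default', the
    node-level 'default', and finally the standard analyzer."""
    return (field_mapping.get("analyzer")
            or doc.get("_analyzer")
            or type_default
            or index_settings.get("default")
            or node_settings.get("default")
            or "standard")
```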
17. Index many documents – Elasticsearch
POST /test3/entries/_bulk
{ "index": {"_id": "1" } }
{"msg": "Hello", "names": ["Jack", "Jill"]}
{ "index": {"_id": "2" } }
{"msg": "Goodbye", "names": "Jason"}
{ "delete" : {"_id" : "3" } }
NOTE: Rivers (similar to DIH) MAY be deprecated.
Use Logstash instead (180Mb on disk, including 2 jRuby runtimes !!!)
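Since the bulk endpoint's newline-delimited format is fairly unforgiving, here is a small helper sketch (hypothetical, not from the talk) that assembles the exact body shown above, including the required trailing newline:

```python
import json

def bulk_body(actions):
    """Build the newline-delimited body for POST /test3/entries/_bulk.
    `actions` is a list of (action_dict, source_or_None) pairs."""
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))
        if source is not None:  # delete actions carry no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the trailing newline is mandatory

body = bulk_body([
    ({"index": {"_id": "1"}}, {"msg": "Hello", "names": ["Jack", "Jill"]}),
    ({"index": {"_id": "2"}}, {"msg": "Goodbye", "names": "Jason"}),
    ({"delete": {"_id": "3"}}, None),
])
```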
19. Comparing search - Search
• Same but different
• Same: vast majority of the features
come from Lucene
• Different: representation of search
parameters
• Solr: URL query with many – cryptic –
parameters
• Elasticsearch:
• Search lite: URL query with a
limited set of parameters (basic
Lucene query)
• Query DSL: JSON with multi-leveled
structure
Lucene
Impl ES
only
Solr
only
22. Search Compared – Query DSL - combo
Search future entries about Jack. Return only the best one.
Elasticsearch
GET /test1/hello/_search
{
"size" : 1,
"query": {
"filtered": {
"query": {
"query_string": {
"query": "jack"
}},
"filter": {
"range": {
"when": {
"gte": "now"
}}}}}}
Solr
…/collection1/select
?q=jack
&fq=when:[NOW TO *]
&rows=1
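The contrast above can be made programmatic; this sketch builds both request forms, using the ES 1.x `filtered` query exactly as on the slide (collection and field names are from the talk's examples):

```python
import json
from urllib.parse import urlencode

# Elasticsearch: a multi-level Query DSL body for POST/GET _search.
es_body = json.dumps({
    "size": 1,
    "query": {
        "filtered": {
            "query": {"query_string": {"query": "jack"}},
            "filter": {"range": {"when": {"gte": "now"}}},
        }
    }
})

# Solr: the same request as flat, URL-escaped parameters.
solr_params = urlencode({"q": "jack", "fq": "when:[NOW TO *]", "rows": 1})
```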
23. Parent/Child structures
Inner objects
• Mapping: Object
• Dynamic mapping (default)
• NOT separate Lucene docs
• Map to flattened
multivalued fields
• Search matches against
value from ANY of inner
objects
{
"followers.age": [19, 26],
"followers.name":
[alex, lisa]
}
Elasticsearch
Nested objects
• Mapping: nested
• Explicit mapping
• Lucene block storage
• Inner documents are hidden
• Cannot return inner docs only
• Can do nested & inner
Parent and Child
• Mapping: _parent
• Explicit references
• Separate documents
• In-memory join
• SLOW
Solr
Nested objects
• Lucene block storage
• All documents are visible
• Child JSON is less natural
24. Cloud deployment – quick take
1. General concepts are similar:
• Node discovery
• Sharding
• Replication
• Routing
2. Implementations are very, very different (layer above Lucene)
3. Solr uses Apache Zookeeper
4. Elasticsearch has its own algorithms
5. No time to discuss
6. Let's focus on the critical path: node discovery/cloud-state management
7. Use a 3rd-party analysis: Kyle Kingsbury's Jepsen tests
25. Jepsen test of Zookeeper
Use Zookeeper. It’s mature, well-designed, and battle-tested.
26. Jepsen test of Elasticsearch
If you are an Elasticsearch user (as I am): good luck.
27. Innovator’s dilemma
• Solr's usual attitude
• An amazingly useful product for many different uses
• And wants everybody to know it
• …Right in the collection1 example
• “You will need all this eventually, might as well learn it first”
• Elasticsearch is small and shiny (“trust us, the magic exists”)
• Elasticsearch + Logstash + Kibana => power-punch triple combo
• Especially when comparing to Solr (and not another commercial solution)
• Feature release process
• Elasticsearch: kimchy: “LGTM” (Looks good to me)
• Solr: full Apache process around it
• Solr – needs to buckle down and focus on onboarding experience
• Solr is getting better (e.g. listen to SolrCluster podcast of October 24, 2014)
28. Solr vs. Elasticsearch
Case by Case
Alexandre Rafalovitch
www.solr-start.com
@arafalov
@SolrStart
Editor's Notes
Ladies and Gentlemen, Mesdames et Messieurs, Damy i gospoda, remote viewers.
My name is Alexandre Rafalovitch. I work for the United Nations, but I am not representing it here or at the conference.
I am here instead as a SolrStart popularizer.
In this session we will compare Solr and ElasticSearch a little deeper than usual. But even then, to give you a real sense would take a two hour workshop.
All I got is 30 minutes. So, we are going to go FAST.
T: So, what's the best way to compare Solr and Elasticsearch? Think of them as frenemies.
Lots of stuff in common since both are based on Lucene, especially the deep features around the actual search
But lots of things are different, some due to Elasticsearch's ease-of-use approach and some due to its commercial nature.
Speaking of commercial nature, I have to tell you that Elasticsearch is a trademark…..
T: Before we start digging in, it is important to remember a bit of history….
…. A number of Lucene features used by Elasticsearch actually came from Solr originally.
Now, of course, Elasticsearch are contributing back to Lucene too.
But with that history in mind, what kind of Frenemies I think these two search engines represent?
I think they are sisters. Solr is an older – ahm, more visible – sister, while Elasticsearch is a younger, slimmer one.
This size comparison is one of the sticky points about Solr distribution. And it really is an issue.
Elasticsearch downloads as 27Mb archive and expands to about 35Mb.
Solr downloads – eventually – as a 150Mb archive and expands by two Elasticsearch'es on disk. And then by another Elasticsearch after the first run, when the web archive expands
T: So is that a healthy bulk or ????
Downloaded: Solr: 150Mb, Elasticsearch: 27Mb
Expanded: Solr: 225Mb, Elasticsearch: 34Mb
On first run: Solr 258Mb, Elasticsearch: 34Mb (no change)
… Is Solr Chubby or just Rubenesque with all the great features? Let's have a look
We can see that Solr (at the bottom) of course has core search and libraries, but also examples, documentations, various contributions and such as Tika Rich Content extraction support, UIMA, Map-Reduce and even test framework. It also has an extensive administration UI built in, as well as things like DataImportHandler.
Elasticsearch does not come with any of these. It has plugins instead. So, if you start adding that functionality as plugins, it will also grow in size, though still a far cry from Solr's bulk.
T: Given that Elasticsearch does require plugins to effectively function….
Solr: still chubby – but not all fat is excess. Still, some exercise could really be beneficial.
And if Elasticsearch does deprecate its rivers and replaces them with Logstash – that's another whopping 85 Mbytes, nearly 3 times the Elasticsearch install itself.
… I think it is fair to say that real Elasticsearch installation "comes with baggage".
And, even though Elasticsearch has plugin mechanism, actually using it shows that it is NOT as comprehensive as Ruby's or Node's package management.
And, of course, without good dependency management, we get jars breeding like rabbits.
To be balanced, Solr ALSO supports plugins/loaders but – at the moment - it does not even have a way to find any of those custom components in the wild.
T: Ok. Enough high level view, let's dig into the action. Let's INDEX something.
In Elasticsearch, there are two ways to index a single document. Both use HTTP verbs: POST will auto generate IDs, PUT expects you to provide them.
Neither requires you to set up a collection (index in ES terms), a type (ES-only concept), or field mappings first.
Elasticsearch will automagically create whatever is needed behind the scenes. Of course, magic has its price and you get what you get and you don't complain.
If you do want to complain or setup your own mappings, Elasticsearch does let you do it. Solr, of course, is the other way around.
T: So, let's see what we actually did get…
_source – that stores original JSON content. Not searchable. If you want to not store something, you have to explicitly exclude it. Elasticsearch does also support stored fields, so you do have to think of your data on 3 levels of source, stored and indexed, as compared to Solr's 2 levels
_id field – actually it is a composite ID, as all the different types are in the same index/collection
Both msg and names are of type string. Elasticsearch automatically treats all fields as multivalued, and will return either a value or an array, which may cause problems for clients that expect a specific form.
Notice magic date parsing
T: So, given all that magic mapping, what happens with search?....
copyField means we get all the fields as text regardless of magic mapping
StandardTokenizer means the date formats break on colons
positionIncrementGap=0 means the search phrase will match across the boundaries of the text. Not something we recommend, actually.
T: So, can Solr do a similar kind of magic?...
… Sort of: I can configure Solr to do similar autodetect.
(pause)
… But
But it takes a lot of deliberate planning. And the dynamic schema implementation is still rough around the edges.
However, auto-mapping is NOT recommended for production – either for Solr or Elasticsearch. Because "magic has its price", and the price of an incorrectly mapped definition is actually quite high – you have to reindex everything.
So, for production, dynamic fields are still a better choice.
T: Ok, so let's look at the non-magical parts of defining a custom type…
You all know how it is done in Solr. A type definition is a self-contained XML section in schema.xml file. It defines the full stack and refers to the class names and file names of relevant factories in resources.
So, here, we have a text_fr (french) type that uses Standard tokenizer and a bunch of filters with relevant word lists.
T: In Elasticsearch it is quite a bit different…
All the examples show how to create mapping using REST interface, though – if you search documentation hard enough – it can also be defined in several places on disk.
The mappings are defined per type, though all types within an index better have compatible definitions.
That's because "types" within an index are a transparent Elasticsearch aliasing layer on top of Lucene's hard reality of a single collection/index implementation.
So, there are consequences.
Regarding the mapping itself, one thing Elasticsearch did is pre-define a lot – A LOT – of the standard analyzer stacks.
So, if you trust Elasticsearch's choices, you don't even need to understand how they are set up, and they are never visible. You just map "french" as a type without any extra configuration.
T: But if you do need it defined explicitly…
… It is – of course – in JSON format and uses sort-of cross-reference style
where you declare inner components first (on the left here) and then
proceed to define the specific stack.
This is useful if you leverage existing component-definitions or just reuse them in multiple analyzers.
But because Elasticsearch uses files as the exception rather than the rule, most of the time the additional resources, such as the elision keywords above, may need to be specified inline.
Then you can use these types by names either explicitly or by defining it as a default analyzer.
T: Trying to give a lot of flexibility, Elasticsearch actually allows you to define that analyzer in a lot of places
… I will let you decide for yourself whether having that much choice is a good thing or a bad thing.
I, personally, suspect that this might be great during development but a real pain in the tuches to troubleshoot when it's gone wrong
T: Ok, enough with small measures, let's look at indexing content in bulk…
In Elasticsearch, there is not that much choice on indexing in bulk.
Different end point
A completely different JSON-line indexing format
Careful of new lines (fairly unforgiving)
You also have Rivers, but ….
T: On the other hand, Solr….
T: Ok, now that we know how to index a bunch of documents, let's dig a bit deeper into the search
Three records:
2 – share the word happy in msg
2 – overlapping but not same – share name 'Jack' in Names
1 record is in the past, two are in the future as of today
So, basic searches using URL are virtually identical between Elasticsearch and Solr.
Just a note, in real URLs, the parameters would be URL-Escaped : spaces, percent symbols
Of course, these results are with Solr configuration trying to match Elasticsearch magic, so by default we are searching against an _all field that aggregates all content
That's why we get two records for "happy birthday Alex", because it is looking for any term to match.
T: So, let's try to change the search to look in the specific fields and require all terms to match
… Notice how we already run out of steam for Elasticsearch's URL query approach and have to switch to the Query DSL
Also notice that in Solr, we have to explicitly declare that we now use Dismax. ES does some sort of magic switch when it sees you specifying specific "fields".
T: Now, we don't have time to slowly build on this, so we jump ahead a couple of levels to a more complex example….
LISP = Lots of Infuriating & Silly Parentheses
Elasticsearch: kind of the same about JSON braces
ES: Good and bad – 3 different syntaxes or use all of these
Solr is kind of in between Nested objects and Parent and Child. They are indexed as a block, but you can get child documents separately from parent ones.
T: Finally, let's very briefly touch on the cloud and scaling issues
Kimchy = Shay Banon, Elasticsearch creator
Solr: +1, +1, +1, Let’s fight a bit, +1, +1, Release (lots of cooks => good & bad)
Conclusion:
Frankly, I do not have one. It would take another couple of hours to get to the point of informed choice.
For myself, I am returning to Solr only and am about to start writing a Solr book for O'Reilly.
But if you really need some of the Elasticsearch-only features and do not mind the price you have to pay for its magic, it is also a viable choice.
Now, we are out of time for questions, but I will be monitoring the conference application, so you can ask them there.
Thank you very much.