An overview of Elasticsearch features and explains performing smart search, data aggregations, and relevancy through scoring functions. How Elasticsearch works as a distributed scalable data storage. Finally, showcasing some use cases that are currently becoming core functionalities in Zalando.
Searching Relational Data with Elasticsearchsirensolutions
Second Galway Data Meetup, 29th April 2015
Elasticsearch was originally developed for searching flat documents. However, as real world data is inherently more complex, e.g., nested json data, relational data, interconnected documents and entities, Elasticsearch quickly evolves to support more advanced search scenarios. In this presentation, we will review existing features and plugins to support such scenarios, discuss their advantages and disadvantages, and understand which one is more appropriate for a particular scenario.
Elasticsearch as a search alternative to a relational databaseKristijan Duvnjak
The volume of data that we are working with is growing every day, the size of data is pushing us to find new intelligent solutions for problem’s put in front of us. Elasticsearch server has proved it self as an excellent full text search solution for big volume’s of data.
Scalable Data Models with ElasticsearchBeyondTrees
At bol.com, a leading ecommerce platform in The Netherlands, we have done extensive research into what it would take to use ElasticSearch as the main search provider. We will explain the specific challenges and requirements of running an Elasticsearch cluster at bol.com-scale, and show how we have used generated data to do performance and scalability tests on different ways to model a hierarchical data model into Elasticsearch. We will describe the benefits and drawbacks of the different data model options, and their consequences for the design of the index and search applications.
ElasticSearch in Production: lessons learnedBeyondTrees
With Proquest Udini, we have created the worlds largest online article store, and aim to be the center for researchers all over the world. We connect to a 700M solr cluster for search, but have recently also implemented a search component with ElasticSearch. We will discuss how we did this, and how we want to use the 30M index for scientific citation recognition. We will highlight lessons learned in integrating ElasticSearch in our virtualized EC2 environments, and challenges aligning with our continuous deployment processes.
Searching Relational Data with Elasticsearchsirensolutions
Second Galway Data Meetup, 29th April 2015
Elasticsearch was originally developed for searching flat documents. However, as real world data is inherently more complex, e.g., nested json data, relational data, interconnected documents and entities, Elasticsearch quickly evolves to support more advanced search scenarios. In this presentation, we will review existing features and plugins to support such scenarios, discuss their advantages and disadvantages, and understand which one is more appropriate for a particular scenario.
Elasticsearch as a search alternative to a relational databaseKristijan Duvnjak
The volume of data that we are working with is growing every day, the size of data is pushing us to find new intelligent solutions for problem’s put in front of us. Elasticsearch server has proved it self as an excellent full text search solution for big volume’s of data.
Scalable Data Models with ElasticsearchBeyondTrees
At bol.com, a leading ecommerce platform in The Netherlands, we have done extensive research into what it would take to use ElasticSearch as the main search provider. We will explain the specific challenges and requirements of running an Elasticsearch cluster at bol.com-scale, and show how we have used generated data to do performance and scalability tests on different ways to model a hierarchical data model into Elasticsearch. We will describe the benefits and drawbacks of the different data model options, and their consequences for the design of the index and search applications.
ElasticSearch in Production: lessons learnedBeyondTrees
With Proquest Udini, we have created the worlds largest online article store, and aim to be the center for researchers all over the world. We connect to a 700M solr cluster for search, but have recently also implemented a search component with ElasticSearch. We will discuss how we did this, and how we want to use the 30M index for scientific citation recognition. We will highlight lessons learned in integrating ElasticSearch in our virtualized EC2 environments, and challenges aligning with our continuous deployment processes.
Philly PHP: April '17 Elastic Search Introduction by Aditya BhamidpatiRobert Calcavecchia
Philly PHP April 2017 Meetup: Introduction to Elastic Search as presented by Aditya Bhamidpati on April 19, 2017.
These slides cover an introduction to using Elastic Search
This presentation contains differences between Elasticsearch and relational Databases. Along with that it also has some Glossary Of Elasticsearch and its basic operation.
Talk given for the #phpbenelux user group, March 27th in Gent (BE), with the goal of convincing developers that are used to build php/mysql apps to broaden their horizon when adding search to their site. Be sure to also have a look at the notes for the slides; they explain some of the screenshots, etc.
An accompanying blog post about this subject can be found at http://www.jurriaanpersyn.com/archives/2013/11/18/introduction-to-elasticsearch/
The talk covers how Elasticsearch, Lucene and to some extent search engines in general actually work under the hood. We'll start at the "bottom" (or close enough!) of the many abstraction levels, and gradually move upwards towards the user-visible layers, studying the various internal data structures and behaviors as we ascend. Elasticsearch provides APIs that are very easy to use, and it will get you started and take you far without much effort. However, to get the most of it, it helps to have some knowledge about the underlying algorithms and data structures. This understanding enables you to make full use of its substantial set of features such that you can improve your users search experiences, while at the same time keep your systems performant, reliable and updated in (near) real time.
Global introduction to elastisearch presented at BigData meetup.
Use cases, getting started, Rest CRUD API, Mapping, Search API, Query DSL with queries and filters, Analyzers, Analytics with facets and aggregations, Percolator, High Availability, Clients & Integrations, ...
Scaling Recommendations, Semantic Search, & Data Analytics with solrTrey Grainger
This presentation is from the inaugural Atlanta Solr Meetup held on 2014/10/21 at Atlanta Tech Village.
Description: CareerBuilder uses Solr to power their recommendation engine, semantic search, and data analytics products. They maintain an infrastructure of hundreds of Solr servers, holding over a billion documents and serving over a million queries an hour across thousands of unique search indexes. Come learn how CareerBuilder has integrated Solr into their technology platform (with assistance from Hadoop, Cassandra, and RabbitMQ) and walk through api and code examples to see how you can use Solr to implement your own real-time recommendation engine, semantic search, and data analytics solutions.
Speaker: Trey Grainger is the Director of Engineering for Search & Analytics at CareerBuilder.com and is the co-author of Solr in Action (2014, Manning Publications), the comprehensive example-driven guide to Apache Solr. His search experience includes handling multi-lingual content across dozens of markets/languages, machine learning, semantic search, big data analytics, customized Lucene/Solr scoring models, data mining and recommendation systems. Trey is also the Founder of Celiaccess.com, a gluten-free search engine, and is a frequent speaker at Lucene and Solr-related conferences.
Philly PHP: April '17 Elastic Search Introduction by Aditya BhamidpatiRobert Calcavecchia
Philly PHP April 2017 Meetup: Introduction to Elastic Search as presented by Aditya Bhamidpati on April 19, 2017.
These slides cover an introduction to using Elastic Search
This presentation contains differences between Elasticsearch and relational Databases. Along with that it also has some Glossary Of Elasticsearch and its basic operation.
Talk given for the #phpbenelux user group, March 27th in Gent (BE), with the goal of convincing developers that are used to build php/mysql apps to broaden their horizon when adding search to their site. Be sure to also have a look at the notes for the slides; they explain some of the screenshots, etc.
An accompanying blog post about this subject can be found at http://www.jurriaanpersyn.com/archives/2013/11/18/introduction-to-elasticsearch/
The talk covers how Elasticsearch, Lucene and to some extent search engines in general actually work under the hood. We'll start at the "bottom" (or close enough!) of the many abstraction levels, and gradually move upwards towards the user-visible layers, studying the various internal data structures and behaviors as we ascend. Elasticsearch provides APIs that are very easy to use, and it will get you started and take you far without much effort. However, to get the most of it, it helps to have some knowledge about the underlying algorithms and data structures. This understanding enables you to make full use of its substantial set of features such that you can improve your users search experiences, while at the same time keep your systems performant, reliable and updated in (near) real time.
Global introduction to elastisearch presented at BigData meetup.
Use cases, getting started, Rest CRUD API, Mapping, Search API, Query DSL with queries and filters, Analyzers, Analytics with facets and aggregations, Percolator, High Availability, Clients & Integrations, ...
Scaling Recommendations, Semantic Search, & Data Analytics with solrTrey Grainger
This presentation is from the inaugural Atlanta Solr Meetup held on 2014/10/21 at Atlanta Tech Village.
Description: CareerBuilder uses Solr to power their recommendation engine, semantic search, and data analytics products. They maintain an infrastructure of hundreds of Solr servers, holding over a billion documents and serving over a million queries an hour across thousands of unique search indexes. Come learn how CareerBuilder has integrated Solr into their technology platform (with assistance from Hadoop, Cassandra, and RabbitMQ) and walk through api and code examples to see how you can use Solr to implement your own real-time recommendation engine, semantic search, and data analytics solutions.
Speaker: Trey Grainger is the Director of Engineering for Search & Analytics at CareerBuilder.com and is the co-author of Solr in Action (2014, Manning Publications), the comprehensive example-driven guide to Apache Solr. His search experience includes handling multi-lingual content across dozens of markets/languages, machine learning, semantic search, big data analytics, customized Lucene/Solr scoring models, data mining and recommendation systems. Trey is also the Founder of Celiaccess.com, a gluten-free search engine, and is a frequent speaker at Lucene and Solr-related conferences.
Elasticsearch - Devoxx France 2012 - English versionDavid Pilato
Elasticsearch presentation for Devoxx France 2012
English translation (feel free to correct my bad english ;-) )
French version is available here : http://www.slideshare.net/dadoonet/elasticsearch-devoxx-france-2012
10 Tips for Making Beautiful Slideshow Presentations by www.visuali.seEdahn Small
1. Know your goal | make each slide count
2. Plan it out | in some detail
3. Avoid templates | they have the uglies
4. Choose a color scheme | 4 colors, 1 accent
5. Choose a font scheme | match tone
6. Choose a layout scheme | comprehension
7. Use images (wisely) | they’re more memorable
8. 15 words per slide | this slide had 16 words
9. Play with typography | impact, interest, hierarchy
10. Don’t overdo it | white space
Hope you enjoy!
SEE MORE OF MY WORK: http://www.visuali.se
Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...Spark Summit
The detection and analysis of rare genomic events requires integrative analysis across large cohorts with terabytes to petabytes of genomic data. Contemporary genomic analysis tools have not been designed for this scale of data-intensive computing. This talk presents ADAM, an Apache 2 licensed library built on top of the popular Apache Spark distributed computing framework. ADAM is designed to allow genomic analyses to be seamlessly distributed across large clusters, and presents a clean API for writing parallel genomic analysis algorithms. In this talk, we’ll look at how we’ve used ADAM to achieve a 3.5× improvement in end-to-end variant calling latency and a 66% cost improvement over current toolkits, without sacrificing accuracy. We will talk about a recent recompute effort where we have used ADAM to recall the Simons Genome Diversity Dataset against GRCh38. We will also talk about using ADAM alongside Apache Hbase to interactively explore large variant datasets.
This presentation talks about Natural Language Processing using Java. At Museaic, a music intelligence platform, we spent time figuring out how to extract central themes from song lyrics. In this talk, I will cover some of the tasks involved in natural language processing such as named entity recognition, word sense disambiguation and concept/theme extraction. I will also cover libraries available in java such as stanford-nlp, dbpedia-spotlight and graph approaches using WordNet and semantic databases. This talk would help people understand text processing beyond simple keyword approaches and provide them with some of the best techniques/libraries for it in the Java world.
The World Wide Web is moving from a Web of hyper-linked documents to a Web of linked data. Thanks to the Semantic Web technological stack and to the more recent Linked Open Data (LOD) initiative, a vast amount of RDF data have been published in freely accessible datasets connected with each other to form the so called LOD cloud. As of today, we have tons of RDF data available in the Web of Data, but only a few applications really exploit their potential power. The availability of such data is for sure an opportunity to feed personalized information access tools such as recommender systems. We will show how to plug Linked Open Data in a recommendation engine in order to build a new generation of LOD-enabled applications.
(Lecture given @ the 11th Reasoning Web Summer School - Berlin - August 1, 2015)
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Hierarchal clustering and similarity measures along with multi representationeSAT Journals
Abstract All clustering methods have to assume some cluster relationship on the list of data objects that they really are applied on. Graph-Based Document Clustering works with frequent senses rather than frequent keywords used in traditional text mining techniques.Similarity between a pair of objects can be defined either explicitly or implicitly. With this paper, we analyzed existing multi-viewpoint based similarity measure and two related clustering methods. The main difference between a traditional dissimilarity/similarity measure and ours could be that the former uses merely a single viewpoint, which is the origin, even though the latter utilizes many viewpoints, which you ll find are objects assumed to not have the very same cluster using the two objects being measured. Using multiple viewpoints, more informative assessment of similarity could well be achieved. Theoretical analysis and empirical study are conducted to back up this claim. Two criterion functions for document clustering are proposed dependent on this wonderful measure. We compare them several well-known clustering algorithms which use other popular similarity measures on various document collections confirming the good sides of our proposal. Keywords –Multiview Cluster, Document id, ClusterDistance
Sustainability Investment Research Using Cognitive AnalyticsCambridge Semantics
In this webinar Anthony J. Sarkis, Chief Strategy Officer at Parabole, and Steve Sarsfield, VP Product at Cambridge Semantics, explore how portfolio managers are using the recently developed Parabole/ AnzoGraph DB integration as their underlying infrastructure for conducting ML and cognitive analytics at scale to exploit data to identify potential risks and new opportunities.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
6. Index / Type
- An index is a collection of documents that should be grouped together for a
common reason.
- A type is a collection of documents all share an identical (or very similar)
schema
19. Relationships
● Application Side Joins
● Parent-Child
● Nested objects ● Parent-child queries can be 5 to 10
times slower than the equivalent
nested query!
●
●
●
22. Searching
A filter asks a yes|no question of every document and is
used for fields that contain exact values
- Is a date within the range 2012 to 2015 ?
- Is the status “Approved” ?
- Is the language code “DE” ?
STRUCTURED SEARCH
A query calculates how relevant each document is to the
query, and assigns it a relevance , which is later used
to sort matching documents by relevance.
- Containing the word run, but maybe also matching
runs, running, jog, or sprint
UNSTRUCTURED SEARCH
23. Searching
A filter asks a yes|no question of every document and is
used for fields that contain exact values
- Is a date within the range 2012 to 2015 ?
- Is the status “Approved” ?
- Is the language code “DE” ?
STRUCTURED SEARCH
A query calculates how relevant each document is to the
query, and assigns it a relevance , which is later used
to sort matching documents by relevance.
- Containing the word run, but maybe also matching
runs, running, jog, or sprint
UNSTRUCTURED SEARCH
27. Scoring & Relevance in Full-Text Search
Relevance is the algorithm to calculate how similar the contents of a field to a query.
TF/IDF
Term Frequency
How often does the term appear in the field?
Inverse Document Frequency
How often does each term appear in the index?
Field Length Norm
How long is the field?
28. Vector Space Model
The vector space model provides a way of
comparing a multiterm query against a document.
- The model represents both the document and the
query as vectors.
29. Vector Space Model
1. I am happy in summer.
2. After Christmas I’m a hippopotamus.
3. The happy hippopotamus helped Harry.
- By measuring the angle between the query vector
and the document vector, it is possible to assign a
relevance score to each document.
- If The angle between a document and the query is
large, so it is of low relevance.
35. Aggregation
Search Analytics
Business Requirement “Help me find the best
documents ?”
“What do theses documents
tell me about my business ?”
Enablers Matching, Relevance,
Filtering, Auto-completion,...
Summaries, Patterns,
Trends, Outliers, Predictions,
Visualization
- Aggregations help build complex summaries & analytics of the indexed data.
51. Significant Terms
- Significant_terms analyzes your data and finds terms that appear with a frequency that is
statistically anomalous compared to the background data.
- It can uncover surprisingly sophisticated trends and correlation in your data.
- Used in discovering anomalies