Elasticsearch is presented as an expert in real-time search, aggregation, and analytics. The document outlines Elasticsearch concepts like indexing, mapping, analysis, and the query DSL. Examples are provided for real-time search queries, aggregations including terms, date histograms, and geo distance. Lessons learned from using Elasticsearch at LARC are also discussed.
Talk given for the #phpbenelux user group, March 27th in Gent (BE), with the goal of convincing developers that are used to build php/mysql apps to broaden their horizon when adding search to their site. Be sure to also have a look at the notes for the slides; they explain some of the screenshots, etc.
An accompanying blog post about this subject can be found at http://www.jurriaanpersyn.com/archives/2013/11/18/introduction-to-elasticsearch/
Global introduction to elastisearch presented at BigData meetup.
Use cases, getting started, Rest CRUD API, Mapping, Search API, Query DSL with queries and filters, Analyzers, Analytics with facets and aggregations, Percolator, High Availability, Clients & Integrations, ...
Elasticsearch is a powerful, distributed, open source searching technology. By integrating Elasticsearch into your application, you instantly provide a way to search a lot of data very quickly. Elasticsearch has a RESTful API, it scales, its super fast, you can use plugins to customize it, and much more. In this talk I go over the basics of setting up Elasticsearch, creating a search index, importing your data, and doing some basic searching. I also touch on a few advanced topics that will show the flexibility of this awesome service.
"ElasticSearch in action" by Thijs Feryn.
ElasticSearch is a really powerful search engine, NoSQL database & analytics engine. It is fast, it scales and it's a child of the Cloud/BigData generation. This talk will show you how to get things done using ElasticSearch. The focus is on doing actual work, creating actual queries and achieving actual results. Topics that will be covered: - Filters and queries - Cluster, shard and index management - Data mapping - Analyzers and tokenizers - Aggregations - ElasticSearch as part of the ELK stack - Integration in your code.
Talk given for the #phpbenelux user group, March 27th in Gent (BE), with the goal of convincing developers that are used to build php/mysql apps to broaden their horizon when adding search to their site. Be sure to also have a look at the notes for the slides; they explain some of the screenshots, etc.
An accompanying blog post about this subject can be found at http://www.jurriaanpersyn.com/archives/2013/11/18/introduction-to-elasticsearch/
Global introduction to elastisearch presented at BigData meetup.
Use cases, getting started, Rest CRUD API, Mapping, Search API, Query DSL with queries and filters, Analyzers, Analytics with facets and aggregations, Percolator, High Availability, Clients & Integrations, ...
Elasticsearch is a powerful, distributed, open source searching technology. By integrating Elasticsearch into your application, you instantly provide a way to search a lot of data very quickly. Elasticsearch has a RESTful API, it scales, its super fast, you can use plugins to customize it, and much more. In this talk I go over the basics of setting up Elasticsearch, creating a search index, importing your data, and doing some basic searching. I also touch on a few advanced topics that will show the flexibility of this awesome service.
"ElasticSearch in action" by Thijs Feryn.
ElasticSearch is a really powerful search engine, NoSQL database & analytics engine. It is fast, it scales and it's a child of the Cloud/BigData generation. This talk will show you how to get things done using ElasticSearch. The focus is on doing actual work, creating actual queries and achieving actual results. Topics that will be covered: - Filters and queries - Cluster, shard and index management - Data mapping - Analyzers and tokenizers - Aggregations - ElasticSearch as part of the ELK stack - Integration in your code.
ElasticSearch - index server used as a document databaseRobert Lujo
Presentation held on 5.10.2014 on http://2014.webcampzg.org/talks/.
Although ElasticSearch (ES) primary purpose is to be used as index/search server, in its featureset ES overlaps with common NoSql database; better to say, document database.
Why this could be interesting and how this could be used effectively?
Talk overview:
- ES - history, background, philosophy, featureset overview, focus on indexing/search features
- short presentation on how to get started - installation, indexing and search/retrieving
- Database should provide following functions: store, search, retrieve -> differences between relational, document and search databases
- it is not unusual to use ES additionally as an document database (store and retrieve)
- an use-case will be presented where ES can be used as a single database in the system (benefits and drawbacks)
- what if a relational database is introduced in previosly demonstrated system (benefits and drawbacks)
ES is a nice and in reality ready-to-use example that can change perspective of development of some type of software systems.
Check out my Elasticsearch course and get it for only $10:
https://www.udemy.com/elasticsearch-complete-guide/?couponCode=SLIDESHARE10&utm_source=slideshare&utm_campaign=slideshare&utm_medium=referral
ElasticSearch introduction talk. Overview of the API, functionality, use cases. What can be achieved, how to scale? What is Kibana, how it can benefit your business.
Philly PHP: April '17 Elastic Search Introduction by Aditya BhamidpatiRobert Calcavecchia
Philly PHP April 2017 Meetup: Introduction to Elastic Search as presented by Aditya Bhamidpati on April 19, 2017.
These slides cover an introduction to using Elastic Search
Introduction to Elastic Search
Elastic Search Terminology
Index, Type, Document, Field
Comparison with Relational Database
Understanding of Elastic architecture
Clusters, Nodes, Shards & Replicas
Search
How it works?
Inverted Index
Installation & Configuration
Setup & Run Elastic Server
Elastic in Action
Indexing, Querying & Deleting
Unsupervised learning refers to a branch of algorithms that try to find structure in unlabeled data. Clustering algorithms, for example, try to partition elements of a dataset into related groups. Dimensionality reduction algorithms search for a simpler representation of a dataset. Spark's MLLib module contains implementations of several unsupervised learning algorithms that scale to huge datasets. In this talk, we'll dive into uses and implementations of Spark's K-means clustering and Singular Value Decomposition (SVD).
Bio:
Sandy Ryza is an engineer on the data science team at Cloudera. He is a committer on Apache Hadoop and recently led Cloudera's Apache Spark development.
ElasticSearch - index server used as a document databaseRobert Lujo
Presentation held on 5.10.2014 on http://2014.webcampzg.org/talks/.
Although ElasticSearch (ES) primary purpose is to be used as index/search server, in its featureset ES overlaps with common NoSql database; better to say, document database.
Why this could be interesting and how this could be used effectively?
Talk overview:
- ES - history, background, philosophy, featureset overview, focus on indexing/search features
- short presentation on how to get started - installation, indexing and search/retrieving
- Database should provide following functions: store, search, retrieve -> differences between relational, document and search databases
- it is not unusual to use ES additionally as an document database (store and retrieve)
- an use-case will be presented where ES can be used as a single database in the system (benefits and drawbacks)
- what if a relational database is introduced in previosly demonstrated system (benefits and drawbacks)
ES is a nice and in reality ready-to-use example that can change perspective of development of some type of software systems.
Check out my Elasticsearch course and get it for only $10:
https://www.udemy.com/elasticsearch-complete-guide/?couponCode=SLIDESHARE10&utm_source=slideshare&utm_campaign=slideshare&utm_medium=referral
ElasticSearch introduction talk. Overview of the API, functionality, use cases. What can be achieved, how to scale? What is Kibana, how it can benefit your business.
Philly PHP: April '17 Elastic Search Introduction by Aditya BhamidpatiRobert Calcavecchia
Philly PHP April 2017 Meetup: Introduction to Elastic Search as presented by Aditya Bhamidpati on April 19, 2017.
These slides cover an introduction to using Elastic Search
Introduction to Elastic Search
Elastic Search Terminology
Index, Type, Document, Field
Comparison with Relational Database
Understanding of Elastic architecture
Clusters, Nodes, Shards & Replicas
Search
How it works?
Inverted Index
Installation & Configuration
Setup & Run Elastic Server
Elastic in Action
Indexing, Querying & Deleting
Unsupervised learning refers to a branch of algorithms that try to find structure in unlabeled data. Clustering algorithms, for example, try to partition elements of a dataset into related groups. Dimensionality reduction algorithms search for a simpler representation of a dataset. Spark's MLLib module contains implementations of several unsupervised learning algorithms that scale to huge datasets. In this talk, we'll dive into uses and implementations of Spark's K-means clustering and Singular Value Decomposition (SVD).
Bio:
Sandy Ryza is an engineer on the data science team at Cloudera. He is a committer on Apache Hadoop and recently led Cloudera's Apache Spark development.
Новые стандарты по делопроизводству (международная практика).Natasha Khramtsovsky
Выступление Натальи Храмцовской о новых стандартах по делопроизводству в международной практике на Четвертой всероссийской практической конференции «Эффективный документооборот в органах государственной власти и местного самоуправления», апрель 2006.
Dr Natasha Khramtsovsky's presentation "New records management standards: International practice" at the Records Managers Guild' conference, Moscow, April 2006.
д-р Лючиана Дюранти – Расширенная версия презентации на английском языке к се...Natasha Khramtsovsky
Презентация доклада д-ра Лючианы Дюранти о проблемах обеспечения аутентичности электронных документов и доверия к ним, а также о руководимом ею новом международном проекте InterPARES Trust. Презентация была подготовлена к организованному компанией "Электронные Офисные Системы" семинару в Москве 23 сентября 2013 года. Расширенная версия содержит значительное количество дополнительных материалов и предназначена для более глубокого ознакомления участников семинара с рассматриваемыми на нём вопросами.
Presentation of Dr. Luciana Duranti (Director, Centre for the International Study of Contemporary Records and Archives, University of British Columbia; Director, InterPARES Trust project) on the authenticity and trust in electronic environment and on InterPARES Trust project. The presentation was prepared for the seminar in Moscow on September 23, 2013. The extended version contains numerous additional materials which are helpful for deeper understanding of the issues in question.
Gruppo Sanniti: Individuazione di obiettivi trasversali per classi del trienn...Angela Iaciofano
Gruppo Sanniti: Presentazione del corso "Individuazione di obiettivi trasversali per classi del triennio delle Scuole Superiori" progettato nel Master Universitario in e-learning dell'UniTuscia a.a. 2008-2009
Kennis ontstaat niet tussen de oren maar tussen de neuzen.
Bijdrage van Sven Pans (senzw.be) op de SoCiuS-studiedag 'Kennis delen over kennisdelen' op 4 mei 2010.
Presentatie David Leyssens (KAURI) - Socius Trefdag 'Solidariteit?!' (20 november 2014)
KAURI is een netwerk van mensen en organisaties die innovatief willen samenwerken rond duurzaamheid in België. KAURI linkt profit en non-profit rond een brede waaier aan thema’s, zoals goed bestuur, maatschappelijke verantwoordelijkheid, eerlijke handel en leefmilieu. Enerzijds daagt het netwerk haar leden uit om zowel winst- als waardegedreven te werken.
Anderzijds stimuleren het hen om vooroordelen uit de weg te gaan en samen oplossingen te bedenken voor de grote maatschappelijke uitdagingen. Zo’n duurzame samenwerkingen slagen enkel als alle betrokken partijen zowel doelen, verantwoordelijkheden als resultaten delen. Tijdens deze workshop verneem je hoe je hier aan begint en geeft Danny Jacobs, directeur van Bond Beter Leefmilieu Vlaanderen, dit met enkele concrete voorbeelden uit zijn organisatie.
Extensible RESTful Applications with Apache TinkerPopVarun Ganesh
Presented at Graph Day SF 2018.
You are into data analytics. You come across a source of data and you realise that it is an intuitive case for a Knowledge Graph and that there is much value to be gained by incorporating it into one. How do you take this from zero to product while ensuring that it is well-tested, extensible, scalable and plays nicely with other components and services?
Slack, with its various interactions among its users is a prime candidate for this. Join us as we take you through our journey of conceptualizing Slack user data as a knowledge graph, evaluating different frameworks, incorporating business logic using TinkerPop with an extensible DSL and exposing it all through a familiar RESTful interface that allows us to effectively handle an ever-growing and dynamic graph.
Finding the right stuff, an intro to Elasticsearch with Ruby/RailsMichael Reinsch
Slides for my introduction to using Elasticsearch with Ruby/Rails, held at Tokyo Rubyist Meetup in October 2015.
In it I'm giving a short overview on how to use Elasticsearch using the official Elasticsearch Gems with a Ruby on Rails project. Starting with a simple example, expanding to multilanguage, multiobject indexes.
Also I'm shortly discussing integration testing and production hosting.
Closing the Loop in Extended Reality with Kafka Streams and Machine Learning ...confluent
We’ve built a real-time streaming platform that enables prediction based on user behavior, with events occurring in virtual and augmented reality environments. The solution enables organizations to train people in an extended reality environment, where real-life training may be costly and dangerous. Kafka Streams enables analyzing spatial and event data to detect gestural feature and analyze user behavior in real-time to be able to predict any future mistake the user might make. Kafka is the backbone of our real-time analytics and extended reality communication platform with our cluster and applications being deployed on Kubernetes.
In this talk, we will mainly focus on the following: 1. Why Extended Reality with Kafka is a step in the right direction. 2. Architecture & Power of Schema Registry in building a generic platform for pluggable XR apps and analytics models 3. How KSQL and Kafka Streams fits in Kafka Ecosystem to help analyze human motion data and detect features for real-time prediction. 4. Demo of a VR application with real-time analytics feedback, which assists people to be trained in how to work with chemical laboratory equipment.
Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows you to store, search, and analyze big volumes of data quickly and in near real time.
German slides for different use cases for Elasticsearch: Document Store, full text search, flexible query cache, geospatial search, logfile analytics, analytics.
Demo presentation given at the Semantic Web Applications and Tools for Life Science (SWAT4LS) 2014 meeting in Berlin, Dec 10, 2014. http://www.swat4ls.org/workshops/berlin2014/scientific-programme/
Audio available: https://www.liferay.com/web/events-symposium-north-america/recap
Liferay makes it easy to integrate your application with powerful search engines. However, it may be hard to diagnose why your most important content isn't showing up the way you need it to. This session will recap the key concepts for indexing and querying with Liferay Search, and present a number of techniques to guarantee your documents will be found with best possible relevance.
André de Oliveira joined Liferay in early 2014 as a senior engineer and leads the Search Infrastructure team. He's been a Java developer and architect for the last 15 years. Ever since discovering Elasticsearch, he's vowed never to write another SQL WHERE clause again.
The core Search frameworks in Liferay 7 have been significantly retooled to benefit not only from Liferay's new modular architecture, but also from one of the most innovative players in the market: Elasticsearch, which replaces Lucene as the default search engine in Portal. This session will cover topics like clustering and scalability, unveil improvements (both Elasticsearch and Solr) like aggregations, filters, geolocation, "more like this" and other new query types, and also hot new features for the Enterprise like out-of-the-box Marvel cluster monitoring and Shield security.
André "Arbo" Oliveira joined Liferay in early 2014 as a senior engineer and leads the Search Infrastructure team. He's been writing code for a living for 22 years, 14 of them as a Java developer and architect. Ever since discovering Elasticsearch, he's vowed never to write another SQL WHERE clause again.
We recently hosted (3/12/18) a GraphQL meetup in Los Angeles where we discussed how at Replicated we made the transition from using REST API’s to using GraphQL Here are the slides from the discussion.
Start visualizing, analyzing and exploring Instagram feeds/influencer from South Tyrol & Trentino and inquiries/bookings from Touristic Portals from South Tyrol (BigData4Tourism working group) with Elastic.
Presented on 10/11/12 at the Boston Elasticsearch meetup held at the Microsoft New England Research & Development Center. This talk gave a very high-level overview of Elasticsearch to newcomers and explained why ES is a good fit for Traackr's use case.
Introduction to several aspects of elasticsearch: Full text search, Scaling, Aggregations and centralized logging.
Talk for an internal meetup at a bank in Singapore at 18.11.2016.
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTier1 app
Even though at surface level ‘java.lang.OutOfMemoryError’ appears as one single error; underlyingly there are 9 types of OutOfMemoryError. Each type of OutOfMemoryError has different causes, diagnosis approaches and solutions. This session equips you with the knowledge, tools, and techniques needed to troubleshoot and conquer OutOfMemoryError in all its forms, ensuring smoother, more efficient Java applications.
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll have the knowledge on how to organize and improve your code review proces
Experience our free, in-depth three-part Tendenci Platform Corporate Membership Management workshop series! In Session 1 on May 14th, 2024, we began with an Introduction and Setup, mastering the configuration of your Corporate Membership Module settings to establish membership types, applications, and more. Then, on May 16th, 2024, in Session 2, we focused on binding individual members to a Corporate Membership and Corporate Reps, teaching you how to add individual members and assign Corporate Representatives to manage dues, renewals, and associated members. Finally, on May 28th, 2024, in Session 3, we covered questions and concerns, addressing any queries or issues you may have.
For more Tendenci AMS events, check out www.tendenci.com/events
Your Digital Assistant.
Making complex approach simple. Straightforward process saves time. No more waiting to connect with people that matter to you. Safety first is not a cliché - Securely protect information in cloud storage to prevent any third party from accessing data.
Would you rather make your visitors feel burdened by making them wait? Or choose VizMan for a stress-free experience? VizMan is an automated visitor management system that works for any industries not limited to factories, societies, government institutes, and warehouses. A new age contactless way of logging information of visitors, employees, packages, and vehicles. VizMan is a digital logbook so it deters unnecessary use of paper or space since there is no requirement of bundles of registers that is left to collect dust in a corner of a room. Visitor’s essential details, helps in scheduling meetings for visitors and employees, and assists in supervising the attendance of the employees. With VizMan, visitors don’t need to wait for hours in long queues. VizMan handles visitors with the value they deserve because we know time is important to you.
Feasible Features
One Subscription, Four Modules – Admin, Employee, Receptionist, and Gatekeeper ensures confidentiality and prevents data from being manipulated
User Friendly – can be easily used on Android, iOS, and Web Interface
Multiple Accessibility – Log in through any device from any place at any time
One app for all industries – a Visitor Management System that works for any organisation.
Stress-free Sign-up
Visitor is registered and checked-in by the Receptionist
Host gets a notification, where they opt to Approve the meeting
Host notifies the Receptionist of the end of the meeting
Visitor is checked-out by the Receptionist
Host enters notes and remarks of the meeting
Customizable Components
Scheduling Meetings – Host can invite visitors for meetings and also approve, reject and reschedule meetings
Single/Bulk invites – Invitations can be sent individually to a visitor or collectively to many visitors
VIP Visitors – Additional security of data for VIP visitors to avoid misuse of information
Courier Management – Keeps a check on deliveries like commodities being delivered in and out of establishments
Alerts & Notifications – Get notified on SMS, email, and application
Parking Management – Manage availability of parking space
Individual log-in – Every user has their own log-in id
Visitor/Meeting Analytics – Evaluate notes and remarks of the meeting stored in the system
Visitor Management System is a secure and user friendly database manager that records, filters, tracks the visitors to your organization.
"Secure Your Premises with VizMan (VMS) – Get It Now"
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...informapgpstrackings
Keep tabs on your field staff effortlessly with Informap Technology Centre LLC. Real-time tracking, task assignment, and smart features for efficient management. Request a live demo today!
For more details, visit us : https://informapuae.com/field-staff-tracking/
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?XfilesPro
Worried about document security while sharing them in Salesforce? Fret no more! Here are the top-notch security standards XfilesPro upholds to ensure strong security for your Salesforce documents while sharing with internal or external people.
To learn more, read the blog: https://www.xfilespro.com/how-does-xfilespro-make-document-sharing-secure-and-seamless-in-salesforce/
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
Accelerate Enterprise Software Engineering with PlatformlessWSO2
Key takeaways:
Challenges of building platforms and the benefits of platformless.
Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience.
How Choreo enables the platformless experience.
How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo.
Demo of an end-to-end app built and deployed on Choreo.
Enhancing Research Orchestration Capabilities at ORNL.pdfGlobus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Anthony Dahanne
Les Buildpacks existent depuis plus de 10 ans ! D’abord, ils étaient utilisés pour détecter et construire une application avant de la déployer sur certains PaaS. Ensuite, nous avons pu créer des images Docker (OCI) avec leur dernière génération, les Cloud Native Buildpacks (CNCF en incubation). Sont-ils une bonne alternative au Dockerfile ? Que sont les buildpacks Paketo ? Quelles communautés les soutiennent et comment ?
Venez le découvrir lors de cette session ignite
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Designing for Privacy in Amazon Web ServicesKrzysztofKkol1
Data privacy is one of the most critical issues that businesses face. This presentation shares insights on the principles and best practices for ensuring the resilience and security of your workload.
Drawing on a real-life project from the HR industry, the various challenges will be demonstrated: data protection, self-healing, business continuity, security, and transparency of data processing. This systematized approach allowed to create a secure AWS cloud infrastructure that not only met strict compliance rules but also exceeded the client's expectations.
Advanced Flow Concepts Every Developer Should KnowPeter Caitens
Tim Combridge from Sensible Giraffe and Salesforce Ben presents some important tips that all developers should know when dealing with Flows in Salesforce.
5. Real-time Search
● Indexing
○ Mapping
■ How a document, and the fields it contains are stored and indexed
● Simple type: string, date, long, double, boolean, ip
● Hierarchical: object, nested object
● Specialized type: geo_point, geo_shape, completion
○ Analysis
■ How full text is processed to make it searchable
● Standard Analyzer, Simple Analyzer, Language Analyzer, Snowball Analyzer
● Searching
○ Query DSL
■ Flexible and powerful query language used by Elasticsearch
6. {
“id”: 709260211836485600 ,
“createdAt”: “ Mar 14, 2016 2:09:46 PM ”,
“text”: “ @arinto Just few more days to share #elasticsearch at
#FOSSASIA 2016 https://t.co/PtZk14CNXl ”,
“user”: {
“screenName”: “philipskokoh”,
“name”: “Philips Kokoh”,
...
},
“hashtagEntities”: [
{ “start”: 36, “end”: 50, “text”: “elasticsearch” },
{ “start”: 54, “end”: 63, “text”: “FOSSASIA” }
],
“geoLocation”: {
“latitude”: 1.2971576,
“longitude”: 103.8495769
},
...
}
Mapping
Long
Date
Analyzed String
Nested Object
Geo_point
Object
7. Analysis
● Standard Analyzer
○ [set, the, shape, to, semi, transparent, by, calling, set_trans, 5]
● Simple Analyzer
○ [set, the, shape, to, semi, transparent, by, calling, set, trans]
● Whitespace Analyzer
○ [Set, the, shape, to, semi-transparent, by, calling, set_trans(5)]
● Language Analyzer, e.g. english, french, spanish, arabic, …
○ English analyzer: [set, shape, semi, transpar, call, set_tran, 5]
A quick brown fox jumps over the
lazy dog. Grumpy wizards make
toxic brew for the evil queen and
jack. How razorback-jumping frogs
can level six piqued gymnasts!
Index
Character
Filters
Tokenizer
Token
Filters
Analyzer
“Set the shape to semi-transparent by calling set_trans(5)”Built-in Analyzer
8. Searching
Query DSL
● Based on JSON to define queries :)
● Behavior:
○ Query Context
■ Answering: “How well this document match this query clause?”
■ Return: _score → relevance score
○ Filter Context
■ Answering: “Does this document match this query clause?”
■ Return: boolean yes or no
17. Aggregation and Analytics
Types of aggregations (that we often use):
● Terms Aggregation:
○ Bucketing documents based on numeric/textual content
● Date Histogram Aggregation:
○ Bucketing documents based on date/time value
● Geo Distance Aggregation
○ Bucketing documents based on distance from an origin location
Combined Aggregations
Combined Aggregations & Queries
19. Terms Aggregation
{
“aggs”: {
“mostPopularSource ”: {
“terms”: {
“field”: “source”,
“size”: 3
}
}
}
}
Constant keywords
Name of the
aggregation
Type of the
aggregation
The targeted
field to
perform
aggregation on
Limit the size
of the result
buckets
20. Terms Aggregation
{
“aggs”: {
“mostPopularSource ”: {
“terms”: {
“field”: “source”,
“size”: 3
}
}
}
}
{ ...
“aggregations ”: {
“mostPopularSource ”: {
“doc_count_error_upper_bound”: 87696 ,
“sum_other_doc_count”: 12907898 ,
“buckets”: [
{
“key”: “Twitter for iPhone” ,
“Doc_count”: 27928770
},
{
“key”: “Twitter for Android” ,
“Doc_count”: 21327691
},
{
“key”: “Twitter Web Client” ,
“Doc_count”: 6243422
}
}
}
}
Doc counts are
approximate (->
upper bound on
doc_count error for
each term)
The sum of doc
counts for
buckets not in
the response
Term: the bucket’
s keyword
The doc counts
for this bucket
22. Date Histogram Aggregation
{
“aggs”: {
“numberOfTweetsByMonth ”: {
“date_histogram ”: {
“field”: “ createdAt”,
“interval”: “ month”
}
}
}
}
Type of the
aggregation
The targeted
field to
perform
aggregation onDefine the
interval to
“bucket” the
count
36. {
“id”: 709260211836485600 ,
“createdAt”: “ Mar 14, 2016 2:09:46 PM ”,
“text”: “ @arinto Just few more days to
share #elasticsearch at
#FOSSASIA 2016 https://t.co/PtZk14CNXl ”,
“geoLocation”: {
“latitude”: 1.2971576,
“longitude”: 103.8495769
},
“favorites”: [{
“screenName”: “arinto”,
“origin”: “Indonesia”},
{
“screenName”: “casey”,
“origin”: “Vietnam”}]
}
}
Define the correct mapping
Elasticsearch dynamic mapping, OK for production?
string instead of date
No relation between fields!
Searching for
(screenName == arinto && origin == vietnam)
will return both data
double instead of geo_point
37. Define the correct mapping
{
“id”: 709260211836485600 ,
“createdAt”: “ Mar 14, 2016 2:09:46 PM ”,
“text”: “ @arinto Just few more days to
share #elasticsearch at
#FOSSASIA 2016 https://t.co/PtZk14CNXl ”,
“geoLocation”: {
“latitude”: 1.2971576,
“longitude”: 103.8495769
},
“favorites”: [{
“screenName”: “arinto”,
“origin”: “Indonesia”},
{
“screenName”: “casey”,
“origin”: “Vietnam”}]
}
}
{
//rest of the mapping
"id": {
"type": "long" },
"createdAt": {
"format": "MMM d, y h:m:s a",
"type": "date" },
"text": {
"type": "string" },
"geoLocation": {
"type": "geo_point" },
"favorites": {
"type": "nested"
"properties": {…}}
//..rest of the mapping
}
Elasticsearch dynamic mapping, not OK for production.
Define the correct mapping! Check the docs to learn more about mapping
38. [
…,
{'nancygoh': 'singapore'},
{'lulu': 'china'},
{'barbarella': 'australia'},
{'leticia': 'philippines'},
…,
]
Key: name Value: origin
KV store in Elasticsearch
Key-value store in Elasticsearch
● Field name as the key
● 10 or 100 keys are okay..
● What if you have million of keys?
● Does it scale?
39. KV store - Mapping Explosion!
● Dynamically add new fields in a mapping is an expensive operation
○ Lock the index, add new fields, and propagate index structure changes
● Halt the cluster!
41. Shards
● No magic number
○ However.. you must determine when you
create the index
○ Estimate data growth
○ Prepare for reindex
kopf plugin
42. Index template & rolling index
Rolling(time-based)
index
Reconfigure
shards &
replicas
Index template
Pattern: plr_sg_tweet_*
Define the
correct mapping
48. Replicas
201601 data
1 replica
● You can change it on the fly
● Start with 1 for fault tolerance
● What happen when the read
request rate is very high?
49. Replicas
● Increase accordingly to
○ balance load
○ increasing the availability
○ scale up read requests
201601 data
6 replicas
50. Lesson Learned
● Read the book
● Define correct mapping
● Index template & rolling indices
● Use `alias`
● Scale up using replicas