This document provides an overview of using Perl with Elasticsearch, focusing on log analysis and generating live graphs. It covers when Elasticsearch may or may not be a better fit than a SQL database, translates terminology between SQL and Elasticsearch concepts, and introduces the Elastic Stack (Elasticsearch, Logstash, and Kibana). It offers tips for using Rsyslog instead of Logstash and for configuring Elasticsearch clusters for development and production. Finally, it shows how to connect to Elasticsearch and perform basic operations (indexing, searching, and retrieving documents) using the Search::Elasticsearch Perl module.
3. Primary Usage:
Pretty graphs, generated live, from logs.
In most cases, you will be asked to feed logs into an Elasticsearch database, then make dashboards with charts and graphs.
4. At the heart of Elasticsearch is Apache Lucene
Elasticsearch uses Lucene as its text indexer.
What it adds is the ability to scale horizontally with relative ease.
It also adds a comprehensive RESTful JSON interface.
5. Should I use Elasticsearch?
De-normalized data? Don’t need transactions? Willing to fight with the Java Runtime Environment?
Maybe.
Need lots of data types? Join queries? Referential integrity? Only 100s of GB of data? Access control?
Probably not.
6. Terminology
Roughly equivalent terms...

MySQL                    Elasticsearch
Database                 Index
Table                    Type
Row                      Document
Column                   Field
Schema                   Mapping/Templates
Index                    Everything is indexed
SQL                      Query DSL
SELECT * FROM table …    GET http://…
UPDATE table SET …       PUT http://…
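To make the SQL-to-Query-DSL row concrete, here is a minimal sketch in Python of the JSON body that a simple SELECT with a WHERE clause roughly corresponds to. The index name `logs` and the `status` field are hypothetical placeholders; the body is only built and printed, not sent to a cluster.

```python
import json

# SQL: SELECT * FROM logs WHERE status = 'error' LIMIT 10
# Roughly corresponds to: GET /logs/_search with this JSON body.
query = {
    "query": {
        "term": {"status": "error"}   # exact match on a single field
    },
    "size": 10                        # the LIMIT equivalent
}

print(json.dumps(query, indent=2))
```

In practice a client library (such as Search::Elasticsearch in Perl) builds and sends these bodies for you; the point is that every query is just a JSON document POSTed or GETed over HTTP.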
7. The ELK Stack
A “Stack” with a memorable acronym? Management will love it!

Elasticsearch: the actual database software. It’s written in Java, which explains many of its quirks.
Logstash: a log tailer in Java. Its performance is appalling. Don’t waste any time on it.
Kibana: a web frontend to Elasticsearch, covering everything from searches to graphs and dashboards. It’s Node.js and JavaScript heavy.
8. Use Rsyslog instead of Logstash
In my opinion it’s pointless to write logs to a file and then slurp them back in.
Rsyslog is amazingly performant and flexible, and arguably much better than Syslog-ng.
Stay sane by using RainerScript for configuration, eliminating all legacy-style syslog config.
Old versions are OK on local machines, but “syslog servers” should run the latest 8.x.
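As a rough illustration of the RainerScript style recommended above, the fragment below sketches shipping logs straight to Elasticsearch with rsyslog’s omelasticsearch output module, skipping the write-to-file-then-slurp round trip. The server address, index name, and JSON template are placeholders; treat this as a starting point, not a production config.

```
module(load="omelasticsearch")

# Hypothetical example: ship everything to a local Elasticsearch node in bulk mode.
template(name="es-json" type="list") {
    constant(value="{")
    constant(value="\"@timestamp\":\"") property(name="timereported" dateFormat="rfc3339")
    constant(value="\",\"host\":\"")    property(name="hostname")
    constant(value="\",\"message\":\"") property(name="msg" format="json")
    constant(value="\"}")
}

action(type="omelasticsearch"
       server="localhost" serverport="9200"
       template="es-json"
       searchIndex="logs"
       bulkmode="on")
```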
9. If you’re looking for more of an “all-in-one” solution, you might find Graylog to be a good fit.
It can use Elasticsearch under the hood to power its searches.
Give it a go, and let me know how things work out.
13. Interesting properties of Elasticsearch
A wildcard can be used in the index part of a query. This feature is a key part of using Elasticsearch effectively.
Aliases are used to reference one or more indexes. Multiple changes to aliases can (and should) be grouped into one REST command, which Elasticsearch executes in an atomic fashion.
A template explicitly defines the mapping (schema) of data for yet-to-be-created indexes. A pattern is matched against data insertions referencing an index name which does not exist; the index is then created from the template. Templates also include other index properties, such as aliases that a new index should automatically be made a part of.
An index can be closed without deleting it. It becomes unusable until it is opened again, but it is out of memory and sitting on disk ready to go.
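The grouped alias changes mentioned above go through the `_aliases` endpoint as a single JSON body. Here is a minimal sketch, with hypothetical index and alias names, of swapping an alias from an old index to a new one in one atomic request:

```python
import json

# POST /_aliases with this body moves the "logs-current" alias from the
# old index to the new one atomically - readers never see an alias-less gap.
actions = {
    "actions": [
        {"remove": {"index": "logs-2016-01", "alias": "logs-current"}},
        {"add":    {"index": "logs-2016-02", "alias": "logs-current"}},
    ]
}

print(json.dumps(actions, indent=2))
```

Because both actions land in one request, queries against `logs-current` keep working throughout the swap; doing the remove and add as two separate calls would leave a window where the alias resolves to nothing.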
14. Schemaless, NoSQL?
Elasticsearch queries are made with JSON over RESTful HTTP/S, so it’s not SQL.
If no index exists, it will be created on data insertion. If no template is defined, Elasticsearch will guess at the mapping.
Turn this off: always define a template for every index.
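A sketch of what “always define a template” looks like in practice: the body below, PUT to `/_template/logs`, pre-defines the mapping for any index whose name matches `logs-*`, so Elasticsearch never has to guess. The format shown is the ES 1.x/2.x-era template API that matches the versions this deck references; the type, field names, and alias are hypothetical placeholders.

```python
import json

# PUT /_template/logs - applied automatically whenever an index
# matching "logs-*" is created on first data insertion.
template = {
    "template": "logs-*",                 # pattern matched on index creation
    "settings": {"number_of_shards": 5},
    "mappings": {
        "event": {                        # a mapping type
            "properties": {
                "@timestamp": {"type": "date"},
                "host":       {"type": "string", "index": "not_analyzed"},
                "message":    {"type": "string"},
            }
        }
    },
    "aliases": {"logs-all": {}},          # every new match joins this alias
}

print(json.dumps(template, indent=2))
```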
15. Tips for server hardware selection & OS configuration
● 30GB of RAM for each Elasticsearch instance (beyond this the JVM slows down)
● +25% RAM for OS. 48GB total is a good number
● Use RAID0 (striping) or no RAID on disks. Elasticsearch will ensure data is preserved via
replication
● Spinning disks have yet to be a bottleneck for me. Scale out rather than up. YMMV
● Turn off Transparent Huge Pages - generally a good idea on any and all servers
● Configure Elasticsearch’s JVM to use HugePages directly
● By default, Linux IO is tuned to run as poorly as possible (even set these on your laptop/desktop)
○ echo 1024 > /sys/block/sda/queue/nr_requests (maybe more, benchmark to taste)
○ blockdev --setra 16384 /dev/sda
○ Use XFS with mount options like: rw,nobarrier,logbufs=8,inode64,logbsize=256k (XFS rocks)
○ Don’t use partitions, just format the disk as is (mkfs -t xfs /dev/sdb). XFS will automatically
pick the perfect block alignment
○ echo 0 > /sys/block/sda/queue/add_random (exclude the disk as a source of entropy)
● In iptables, it’s generally a good idea to disable connection tracking on the service ports (assuming
you have no outbound rules). This saves on CPU time and avoids filling the connection state table
● Use the same JVM on all nodes. Either Oracle Java or OpenJDK are fine, pick one and don’t mix
16. Tips for Tuning Elasticsearch
● Elasticsearch’s default settings are for a read-heavy load
● There are lots and lots of settings, and lots and lots of blogs talking about how people have tuned their
clusters.
● Blogs can be very helpful for finding which combination of settings will be right for you
● Be careful with anything referencing Elasticsearch before 2.0, and ignore anything before 1.0 - things
have changed too much
● Above every setting in your config file, note a small blurb about what it does and why you set it.
This will help you remember “why on earth did I think that was a good setting??”
● The Elasticsearch official documentation is very, very good. Take the time to read what each setting
does before you attempt to change it (or check whether that setting still exists in the version you are running)
● Increase settings by small amounts and observe if performance improves
● Having a setting too high or too low can both reduce performance - you’re trying to find the sweet
spot
● More replicas can help read-heavy loads if you have more nodes for them to run on; more shards
can too. However, shards cannot be changed after an index is created, while replicas can be changed
at any time
● More indexes plus more nodes can help write heavy loads
● Don’t run queries against data nodes
17. Elasticsearch lets you scale
horizontally, so you have to actually
scale your workload horizontally…
but without overwhelming your
cluster.
Achieving peak performance in
Elasticsearch is a balancing game
of server settings, indexing strategy
and well conceived queries.
Different workloads will require
retuning your cluster.
18. Degrading and Deleting Data
Elasticsearch is not intended to be a data warehouse.
Design a policy which degrades then eventually deletes your data
Degrade? Reduce the number of replicas, move data to nodes with slower
disks, eventually close the index
Delete data? If you’re using date-stamped index names, just drop the index.
Records can also be created with a TTL
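The degrade-then-delete steps above can be sketched with the Perl client (the index names and 90-day cutoff are invented for illustration):

```perl
use Search::Elasticsearch;

my $e = Search::Elasticsearch->new( nodes => 'localhost:9200' );

# Degrade: drop an older index to zero replicas to free resources...
$e->indices->put_settings(
    index => 'logs-2016.01',
    body  => { index => { number_of_replicas => 0 } },
);

# ...then delete the oldest indexes outright (wildcards work here too)
$e->indices->delete( index => 'logs-2015.*' );
```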
19. Degrading and Deleting Data (continued)
Your policy is implemented via cron tasks; only TTL expiry of records is built in
Curator is the stock tool for this. es-daily-index-maintenance.pl from
App::ElasticSearch::Utilities is better IMO
Put them all in a single file like /etc/cron.d/elasticsearch so you can keep track
of them. Or maybe several cron.d files.
Aliases are also very helpful: Elasticsearch will add new indexes to them on
creation, if the template defines it. You can then use a cron job to remove
older indexes from the alias, etc.
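A sketch of what such a consolidated cron file might look like, using Curator 3.x-style subcommands (the schedules, retention windows, and exact flags are illustrative; check the flags against your installed Curator version):

```
# /etc/cron.d/elasticsearch - keep all index maintenance in one place
# Close indexes older than 30 days, delete those older than 90
0 1 * * * elasticsearch curator close indices --older-than 30 --time-unit days --timestring '%Y.%m.%d'
0 2 * * * elasticsearch curator delete indices --older-than 90 --time-unit days --timestring '%Y.%m.%d'
```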
20. Single Node
Development
Environment
A single node is a perfectly valid Elasticsearch
cluster. Although it’s not really suitable for
production, it’s perfectly fine for development use.
The node is configured to be a master node and a
data node, with the number of expected masters
also set to 1
For all indexes, shards = 1, replicas = 0 (a replica can
never allocate on the same node as its primary)
Use up to 30GB of RAM - you will probably be using
less. Don’t worry too much about tuning, dedicated
disks, etc.
Elasticsearch is packaged as deb, rpm, etc., and
only a few settings need changing to get running. Or
choose one of the many Vagrant or similar install
methods available online.
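A minimal 2.x-era elasticsearch.yml for such a single-node development setup might look like this (cluster and node names are examples; `index.*` defaults in the config file were removed in 5.0):

```yaml
# elasticsearch.yml - minimal single-node development settings
cluster.name: dev-cluster
node.name: dev-node-1
node.master: true                       # this node can be elected master
node.data: true                         # ...and also holds data
discovery.zen.minimum_master_nodes: 1   # we expect exactly one master
network.host: 127.0.0.1                 # dev box: bind to localhost only
index.number_of_shards: 1
index.number_of_replicas: 0
```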
21. Now about Perl
Just use Search::Elasticsearch;
Don’t be tempted to craft JSON and GET/POST yourself
JSON queries translate nicely into Perl data structures, which are much less
annoying (trailing commas don’t matter)
Search::Elasticsearch takes care of connection pooling, proper
serialization/deserialization, scrolling, and makes bulk requests very easy.
22. Search::Elasticsearch 2.03 includes
support for 0.9, 1.0 and 2.0 series
clusters.
They’re still available by installing
their ::Client modules directly:
Search::Elasticsearch::Client::0_90,
Search::Elasticsearch::Client::1_0 or
Search::Elasticsearch::Client::2_0
Search::Elasticsearch 5.01
dropped support for pre-5.0
Elasticsearch clusters from the
main tarball
23. Connecting to Elasticsearch
Explicitly connect to a single server
Provide a number of servers, which the client will round-robin between (e.g.
query nodes)
Provide a single hostname, and have the client sniff out the rest of the
cluster, then round-robin between all nodes.
24. Connecting to Elasticsearch (straight from the Pod)
use Search::Elasticsearch;
# Connect to localhost:9200:
my $e = Search::Elasticsearch->new();
# Round-robin between two nodes:
my $e = Search::Elasticsearch->new(
nodes => [
'search1:9200',
'search2:9200'
]
);
# Connect to cluster at search1:9200, sniff all nodes and round-robin between them:
my $e = Search::Elasticsearch->new(
nodes => 'search1:9200',
cxn_pool => 'Sniff'
);
26. Some basics
# Index a document:
$e->index(
index => 'my_app',
type => 'blog_post',
id => 1,
body => {
title => 'Elasticsearch clients',
content => 'Interesting content...',
date => '2013-09-24'
}
);
# Get the document:
my $doc = $e->get(
index => 'my_app',
type => 'blog_post',
id => 1
);
30. Cluster Status, Other stuff
# Cluster status requests:
$info = $e->cluster->info;
$health = $e->cluster->health;
$node_stats = $e->cluster->node_stats;
# Index admin. requests:
$e->indices->create(index=>'my_index');
$e->indices->delete(index=>'my_index');
31. Scrolled Search Results
Elasticsearch limits how many results a single search will return (the limit is a
setting you can change, but raising it has side effects)
Like the cursor function in an SQL database, Scrolled Search has the client work
with the server to return results in small chunks.
Search::Elasticsearch takes care of all the details and makes it almost
transparent.
32. Scrolled Search (like a cursor in SQL)
my $es = Search::Elasticsearch->new;
my $scroll = $es->scroll_helper(
index => 'my_index',
body => {
query => {...},
size => 1000, # chunk size
sort => '_doc'
}
);
say "Total hits: ". $scroll->total;
while (my $doc = $scroll->next) {
# do something
}
33. Bulk Functions
RESTful HTTP/S has a lot of overhead and adds a lot of latency. Inserting one
record per HTTP request will almost certainly never keep up with your logs.
Bulk requests allow more than one action at a time for each HTTP request.
Search::Elasticsearch makes this very, very easy. You push actions into the
$bulk object, and it will flush them based on your parameters or when explicitly
asked. Callback hooks are also provided
(Elasticsearch used to have a UDP data insert feature. It’s gone now)
34. Bulk Functions
my $es = Search::Elasticsearch->new;
my $bulk = $es->bulk_helper(
index => 'my_index',
type => 'my_type'
);
# Index docs:
$bulk->index({ id => 1, source => { foo => 'bar' }});
$bulk->add_action( index => { id => 1, source => { foo=> 'bar' }});
# Create docs:
$bulk->create({ id => 1, source => { foo => 'bar' }});
$bulk->add_action( create => { id => 1, source => { foo=> 'bar' }});
$bulk->create_docs({ foo => 'bar' });
35. Bulk Functions (continued)
# on_success callback, called for every action that succeeds
my $bulk = $es->bulk_helper(
on_success => sub {
my ($action,$response,$i) = @_;
# do something
},
);
# on_conflict callback, called for every conflict
my $bulk = $es->bulk_helper(
on_conflict => sub {
my ($action,$response,$i,$version) = @_;
# do something
},
);
# on_error callback, called for every error
my $bulk = $es->bulk_helper(
on_error => sub {
my ($action,$response,$i) = @_;
# do something
},
);
36. Search::Elasticsearch takes care of
connection pooling - so no load
balancer is required.
It makes Scrolled Searches easy
and almost transparent.
It makes Bulk functions amazingly
easy.
It makes use of several HTTP
clients, picking the “best” one
available on the fly.
It’s awesome! Don’t bother with DIY
37. More Awesomes...
App::ElasticSearch::Utilities - very useful CLI/cron tools for managing
Elasticsearch
Dancer2::Plugin::ElasticSearch - Dancer 2 plugin
Dancer::Plugin::ElasticSearch - Dancer plugin (uses older perl ElasticSearch
library)
Catalyst::Model::Search::ElasticSearch - Catalyst Model
Note: CPAN has lots of ElasticSearch, but Elasticsearch is the correct capitalization
39. Non-search Query
Parameters
All the things you might expect…
...plus many many more!
my $res = $e->search(
index => 'mydata-*', # wildcards allowed
body => {
query => { .. }, # search query
},
from => 0, # first result to return
size => 10_000, # no. of results to return
sort => [ # sort results by
{ "@timestamp" => {"order" => "asc"}},
"srcport",
{ "ipv4" => "desc" }, ],
# we don't want Elasticsearch to send us the raw original data
_source => 0,
# which fields we want returned
fields => [ 'ipv4', 'srcport', '@timestamp' ]
);
40. More on Queries
Wildcard queries
What you would expect
Regexp queries
Also, what you would expect
query => {
wildcard => { user => "ki*y" }
}
query => {
regexp => {
"name.first" => "s.*y"
}
}
41. More on Queries
Range query
Used with numeric and date field
types
query => {
range => { # range query
age => { # field
gte => 10, # greater than or equal
lte => 20, # less than or equal
}
}
}
query => {
range => {
date => { # ranges for dates can use date math
gte => "now-1d/d", # /d rounds down to the day
lt => "now/d",
"time_zone" => "+01:00", # optional
}
}
}
42. More on Queries
Exists query
exists has essentially the same
meaning as in Perl
Bool query
There’s a lot to this; I will just
touch on it
query => {
exists => { field => "user" }
}
query => {
bool => {
must => [ # basically AND
{ exists => { field => 'ipv4' } },
{ exists => { field => 'srcport' } },
{ missing => { field => 'natv4' } }, # opposite of exists
]
}
}
43. Effective queries rely on good mappings
A mapping is the schema
You can create an empty index with the mapping you define
Or, an index can be automatically created on insert, with a mapping based
upon a matching template
The more you can break your data up into fields with a native datatype, the
better Elasticsearch can serve results and the more you can make use of
datatype-specific functionality (date math, for example)
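Creating an empty index with an explicit mapping looks like this with the Perl client, using a 2.x-style request body (the index, type, and field names are invented for illustration; `string` was the 2.x text type, later split into `text`/`keyword`):

```perl
use Search::Elasticsearch;

my $e = Search::Elasticsearch->new( nodes => 'localhost:9200' );

# Each field gets a native datatype instead of a guessed one
$e->indices->create(
    index => 'my_app',
    body  => {
        mappings => {
            blog_post => {
                properties => {
                    title   => { type => 'string' },
                    date    => { type => 'date' },
                    views   => { type => 'integer' },
                    visitor => { type => 'ip' },
                },
            },
        },
    },
);
```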
44. Core Datatypes
The basics
String
● text and keyword
Numeric datatypes
● long, integer, short, byte, double, float
Date datatype
● date
Boolean datatype
● boolean
Binary datatype
● binary
45. Complex
Datatypes
Objects and things
Array datatype
● (Array support does not require a dedicated type)
Object datatype
● object for single JSON objects
Nested datatype
● nested for arrays of JSON objects
46. Geo Datatypes
Fun with maps etc
Geo-point datatype
● geo_point for lat/lon points
Geo-Shape datatype
● geo_shape for complex shapes like polygons
47. Specialised
Datatypes
You’ll need to read up on a lot of
these.
IP datatype
● ip for IPv4 and IPv6 addresses
Completion datatype
● completion to provide auto-complete suggestions
Token count datatype
● token_count to count the number of tokens in a string
mapper-murmur3
● murmur3 to compute hashes of values at index-time and
store them in the index
Attachment datatype
● See the mapper-attachments plugin which supports indexing
attachments like Microsoft Office formats, Open Document
formats, ePub, HTML, etc. into an attachment datatype.
Percolator type
● Accepts queries from the query-dsl
48. Summary
● Select sensible hardware (or VM) and tune your OS
● Know your workload and tune Elasticsearch to match
● Rsyslog is amazing, it can talk natively to Elasticsearch and is unbelievably scalable
● Search::Elasticsearch is always the way to go (except perhaps, for trivial shell scripts)
● Break your data up into as many fields as you can
● Use native datatypes and get maximum value from Elasticsearch’s query functions
● More shards and/or more replicas with more servers will increase query performance
● More indexes will increase write performance if you write across them
● Use index names with date stamps and aliases to manage data elegantly and efficiently
● Plan how you will degrade then drop data