Lucene 4.0 is on its way to deliver a tremendous amount of new features and improvements. Beside Real-Time Search & Flexible Indexing DocValues aka. Column Stride Fields is one of the "next generation" features
Lucene 4.0 is on its way to deliver a tremendous amount of new features and improvements. Beside Real-Time Search & Flexible Indexing DocValues aka. Column Stride Fields is one of the "next generation" features
This is an introductory presentation to Cassandra, the database of choice for high availability and insane scalability.
I gave this talk at TheEdge conference.
How to find out production issues? Where to look for errors when application crashes in live environment? How to Visual Studio 2010 for replicating post mortem scenarios in difficult to reproduce errors? Using Source server, PDB symbols in old fashioned way for new age WCF services.
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012Big Data Spain
Session presented at Big Data Spain 2012 Conference
16th Nov 2012
ETSI Telecomunicacion UPM Madrid
www.bigdataspain.org
More info: http://www.bigdataspain.org/es-2012/conference/top-five-questions-about-nosql/jonathan-ellis
This is a talk about Netflix's path to Cassandra. The first few slides may look similar to previous presentations, but they are just to set the context. Most the content is brand new!
Abstract:
Cassandra is a new kind of database: it is more than a single-machine system. It naturally runs in a High-Availability configuration. All nodes in the system are symmetric; there is no single point of failure. As you add machines, failure becomes routine, and Cassandra is built to tolerate that with no interruptions.
Cassandra is linearly scalable with good performance characteristics for very small and very large data stores. Unlike earlier efforts, Cassandra is more than just a key-value store; it is a structured data store which can facilitate complex use cases and queries. Cassandra allows for random access to your data organized into rows and columns.
Cassandra is different, and exciting. This presentation will discuss the pros and cons of using Cassandra, and why it has seen such amazing adoption in the past year.
Bio:
Ben Coverston is Director of Operations at DataStax (formerly knows as Riptano), a provider of software, support, services, training, resources and help for Cassandra. He has been involved in enterprise software his entire career. Working in the airline industry, he helped to build some of the highest volume online booking sites in the world. He saw first hand the consequences of trying to solve real world scalability problems at the limit of what traditional relational databases are capable of.
An introduction to Cassandra, including replication + partitioning options, data center awareness, local storage model, data modeling example. Presented by Andrew Byde on 25th August 2011 at NoSQLNow! in San Jose , California
This is an introductory presentation to Cassandra, the database of choice for high availability and insane scalability.
I gave this talk at TheEdge conference.
How to find out production issues? Where to look for errors when application crashes in live environment? How to Visual Studio 2010 for replicating post mortem scenarios in difficult to reproduce errors? Using Source server, PDB symbols in old fashioned way for new age WCF services.
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012Big Data Spain
Session presented at Big Data Spain 2012 Conference
16th Nov 2012
ETSI Telecomunicacion UPM Madrid
www.bigdataspain.org
More info: http://www.bigdataspain.org/es-2012/conference/top-five-questions-about-nosql/jonathan-ellis
This is a talk about Netflix's path to Cassandra. The first few slides may look similar to previous presentations, but they are just to set the context. Most the content is brand new!
Abstract:
Cassandra is a new kind of database: it is more than a single-machine system. It naturally runs in a High-Availability configuration. All nodes in the system are symmetric; there is no single point of failure. As you add machines, failure becomes routine, and Cassandra is built to tolerate that with no interruptions.
Cassandra is linearly scalable with good performance characteristics for very small and very large data stores. Unlike earlier efforts, Cassandra is more than just a key-value store; it is a structured data store which can facilitate complex use cases and queries. Cassandra allows for random access to your data organized into rows and columns.
Cassandra is different, and exciting. This presentation will discuss the pros and cons of using Cassandra, and why it has seen such amazing adoption in the past year.
Bio:
Ben Coverston is Director of Operations at DataStax (formerly knows as Riptano), a provider of software, support, services, training, resources and help for Cassandra. He has been involved in enterprise software his entire career. Working in the airline industry, he helped to build some of the highest volume online booking sites in the world. He saw first hand the consequences of trying to solve real world scalability problems at the limit of what traditional relational databases are capable of.
An introduction to Cassandra, including replication + partitioning options, data center awareness, local storage model, data modeling example. Presented by Andrew Byde on 25th August 2011 at NoSQLNow! in San Jose , California
Inside Cassandra – C* is an interesting piece of software for many reasons, but it is especially interesting in its use of elegant data structures and algorithms. This talk will focus on the data structures and algorithms that make C* such a scalable and performant database. We will walk along the write, read and delete paths exploring the low-level details of how each of these operations work. We will also explore some of the background processes that maintain availability and performance. The goal of this talk is to gain a deeper understanding of C* by exploring the low-level details of its implementation.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
2. Why Cassandra?
• Lots of data
– Copies of messages, reverse indices of
messages, per user data.
• Many incoming requests resulting in a lot
of random reads and random writes.
• No existing production ready solutions in
the market meet these requirements.
3. Design Goals
• High availability
• Eventual consistency
– trade-off strong consistency in favor of high
availability
• Incremental scalability
• Optimistic Replication
• “Knobs” to tune tradeoffs between consistency,
durability and latency
• Low total cost of ownership
• Minimal administration
4. Data Model Columns are
added and
ColumnFamily1 Name : MailList modified
Type : Simple Sort : Name
KEY Name : tid1 Name : tid2 Name : tid3 dynamically
Name : tid4
Value : <Binary> Value : <Binary> Value : <Binary> Value : <Binary>
TimeStamp : t1 TimeStamp : t2 TimeStamp : t3 TimeStamp : t4
ColumnFamily2 Name : WordList Type : Super Sort : Time
Column Families Name : aloha Name : dude
are declared
C1 C2 C3 C4 C2 C6
upfront
SuperColumns V1 V2 V3 V4 V2 V6
are added and T1 T2 T3 T4 T2 T6
modified
Columns are
dynamically
added and
modified ColumnFamily3 Name : System Type : Super Sort : Name
dynamically Name : hint1 Name : hint2 Name : hint3 Name : hint4
<Column List> <Column List> <Column List> <Column List>
5. Write Operations
• A client issues a write request to a random
node in the Cassandra cluster.
• The “Partitioner” determines the nodes
responsible for the data.
• Locally, write operations are logged and
then applied to an in-memory version.
• Commit log is stored on a dedicated disk
local to the machine.
6. Write cont’d
Key (CF1 , CF2 , CF3) • Data size
• Number of Objects
Memtable ( CF1)
• Lifetime
Commit Log Memtable ( CF2)
Binary serialized
Key ( CF1 , CF2 , CF3 ) Memtable ( CF2)
Data file on disk
<Key name><Size of key Data><Index of columns/supercolumns><
Serialized column family>
K128 Offset ---
---
K256 Offset BLOCK Index <Key Name> Offset, <Key Name> Offset
Dedicated Disk
K384 Offset ---
---
Bloom Filter <Key name><Size of key Data><Index of columns/supercolumns><
Serialized column family>
(Index in memory)
7. Compactions
K2 < Serialized data > K4 < Serialized data >
K1 < Serialized data >
K10 < Serialized data > K5 < Serialized data >
K2 < Serialized data >
K30 < Serialized data > K10 < Serialized data >
K3 < Serialized data >
DELETED
-- --
--
Sorted -- Sorted --
Sorted --
-- --
--
MERGE SORT
Index File
K1 < Serialized data >
Loaded in memory K2 < Serialized data >
K3 < Serialized data >
K1 Offset
K4 < Serialized data >
K5 Offset Sorted
K5 < Serialized data >
K30 Offset
K10 < Serialized data >
Bloom Filter
K30 < Serialized data >
Data File
8. Write Properties
• No locks in the critical path
• Sequential disk access
• Behaves like a write back Cache
• Append support without read ahead
• Atomicity guarantee for a key
• “Always Writable”
– accept writes during failure scenarios
9. Read
Client
Query Result
Cassandra Cluster
Closest replica Result Read repair if
digests differ
Replica A
Digest Query
Digest Response Digest Response
Replica B Replica C
11. Cluster Membership and Failure
Detection
• Gossip protocol is used for cluster membership.
• Super lightweight with mathematically provable properties.
• State disseminated in O(logN) rounds where N is the number of
nodes in the cluster.
• Every T seconds each member increments its heartbeat counter and
selects one other member to send its list to.
• A member merges the list with its own list .
12.
13.
14.
15.
16. Accrual Failure Detector
• Valuable for system management, replication, load balancing etc.
• Defined as a failure detector that outputs a value, PHI, associated
with each process.
• Also known as Adaptive Failure detectors - designed to adapt to
changing network conditions.
• The value output, PHI, represents a suspicion level.
• Applications set an appropriate threshold, trigger suspicions and
perform appropriate actions.
• In Cassandra the average time taken to detect a failure is 10-15
seconds with the PHI threshold set at 5.
17. Properties of the Failure Detector
• If a process p is faulty, the suspicion level
Φ(t) ∞as t ∞.
• If a process p is faulty, there is a time after which Φ(t) is monotonic
increasing.
• A process p is correct Φ(t) has an ub over an infinite execution.
• If process p is correct, then for any time T,
Φ(t) = 0 for t >= T.
18. Implementation
• PHI estimation is done in three phases
– Inter arrival times for each member are stored in a sampling
window.
– Estimate the distribution of the above inter arrival times.
– Gossip follows an exponential distribution.
– The value of PHI is now computed as follows:
• Φ(t) = -log10( P(tnow – tlast) )
where P(t) is the CDF of an exponential distribution. P(t) denotes the
probability that a heartbeat will arrive more than t units after the previous
one. P(t) = ( 1 – e-tλ )
The overall mechanism is described in the figure below.
20. Performance Benchmark
• Loading of data - limited by network
bandwidth.
• Read performance for Inbox Search in
production:
Search Interactions Term Search
Min 7.69 ms 7.78 ms
Median 15.69 ms 18.27 ms
Average 26.13 ms 44.41 ms
21. MySQL Comparison
• MySQL > 50 GB Data
Writes Average : ~300 ms
Reads Average : ~350 ms
• Cassandra > 50 GB Data
Writes Average : 0.12 ms
Reads Average : 15 ms
22. Lessons Learnt
• Add fancy features only when absolutely
required.
• Many types of failures are possible.
• Big systems need proper systems-level
monitoring.
• Value simple designs
23. Future work
• Atomicity guarantees across multiple keys
• Analysis support via Map/Reduce
• Distributed transactions
• Compression support
• Granular security via ACL’s