Tokyo Cabinet is a library of routines for managing a database. The database is a simple data file containing records, each is a pair of a key and a value. Every key and value is serial bytes with variable length. Both binary data and character string can be used as a key and a value. There is neither concept of data tables nor data types. Records are organized in hash table, B+ tree, or fixed-length array.
Postgres-XC as a Key Value Store Compared To MongoDBMason Sharp
This presentation discusses how Postgres-XC can be used as a PostgreSQL-based key-value store using features like hstore and JSON. It also compares performance to MongoDB for a read workload
An overview of Postgres-XC is provided. Postgres-XC is a free, open source, PostgreSQL based write scalable cluster. It runs on multiple servers, is fully ACID and consistent. Postgres-XC is a traditional relational alternative to cases where NoSQL solutions are being considered.
This presentation was given in San Francisco August 7th by Mason Sharp, one of the original architects of Postgres-XC and co-founder of StormDB (http://www.stormdb.com).
Tokyo Cabinet is a library of routines for managing a database. The database is a simple data file containing records, each is a pair of a key and a value. Every key and value is serial bytes with variable length. Both binary data and character string can be used as a key and a value. There is neither concept of data tables nor data types. Records are organized in hash table, B+ tree, or fixed-length array.
Postgres-XC as a Key Value Store Compared To MongoDBMason Sharp
This presentation discusses how Postgres-XC can be used as a PostgreSQL-based key-value store using features like hstore and JSON. It also compares performance to MongoDB for a read workload
An overview of Postgres-XC is provided. Postgres-XC is a free, open source, PostgreSQL based write scalable cluster. It runs on multiple servers, is fully ACID and consistent. Postgres-XC is a traditional relational alternative to cases where NoSQL solutions are being considered.
This presentation was given in San Francisco August 7th by Mason Sharp, one of the original architects of Postgres-XC and co-founder of StormDB (http://www.stormdb.com).
Scalar DB: A library that makes non-ACID databases ACID-compliantScalar, Inc.
Scalar DB is a library that makes non-ACID databases ACID-compliant. It not only supports strongly-consistent ACID transactions, but also scales linearly and achieves high availability when it is deployed with distributed databases such as Cassandra.
Newer version about JDK10 and 11 is here
https://www.slideshare.net/nowokay/summary-of-jdk10-and-what-will-come-into-jdk11-99363835
The material for the presentation of the JJUG CCC 2018 Spring
Slides of my talk at QCon Hangzhou 2011, on the things that my team has been doing on the JVM, esp. on customization. Video available at http://www.infoq.com/cn/presentations/ms-jvm-taobao
Abstract:
Cassandra is a new kind of database: it is more than a single-machine system. It naturally runs in a High-Availability configuration. All nodes in the system are symmetric; there is no single point of failure. As you add machines, failure becomes routine, and Cassandra is built to tolerate that with no interruptions.
Cassandra is linearly scalable with good performance characteristics for very small and very large data stores. Unlike earlier efforts, Cassandra is more than just a key-value store; it is a structured data store which can facilitate complex use cases and queries. Cassandra allows for random access to your data organized into rows and columns.
Cassandra is different, and exciting. This presentation will discuss the pros and cons of using Cassandra, and why it has seen such amazing adoption in the past year.
Bio:
Ben Coverston is Director of Operations at DataStax (formerly knows as Riptano), a provider of software, support, services, training, resources and help for Cassandra. He has been involved in enterprise software his entire career. Working in the airline industry, he helped to build some of the highest volume online booking sites in the world. He saw first hand the consequences of trying to solve real world scalability problems at the limit of what traditional relational databases are capable of.
DataStax: Extreme Cassandra Optimization: The SequelDataStax Academy
Al has been using Cassandra since version 0.6 and has spent the last few months doing little else but tune Cassandra clusters. In this talk, Al will show how to tune Cassandra for efficient operation using multiple views into system metrics, including OS stats, GC logs, JMX, and cassandra-stress.
Netflix Open Source Meetup Season 4 Episode 2aspyker
In this episode, we will take a close look at 2 different approaches to high-throughput/low-latency data stores, developed by Netflix.
The first, EVCache, is a battle-tested distributed memcached-backed data store, optimized for the cloud. You will also hear about the road ahead for EVCache it evolves into an L1/L2 cache over RAM and SSDs.
The second, Dynomite, is a framework to make any non-distributed data-store, distributed. Netflix's first implementation of Dynomite is based on Redis.
Come learn about the products' features and hear from Thomson and Reuters, Diego Pacheco from Ilegra and other third party speakers, internal and external to Netflix, on how these products fit in their stack and roadmap.
Scalar DB: A library that makes non-ACID databases ACID-compliantScalar, Inc.
Scalar DB is a library that makes non-ACID databases ACID-compliant. It not only supports strongly-consistent ACID transactions, but also scales linearly and achieves high availability when it is deployed with distributed databases such as Cassandra.
Newer version about JDK10 and 11 is here
https://www.slideshare.net/nowokay/summary-of-jdk10-and-what-will-come-into-jdk11-99363835
The material for the presentation of the JJUG CCC 2018 Spring
Slides of my talk at QCon Hangzhou 2011, on the things that my team has been doing on the JVM, esp. on customization. Video available at http://www.infoq.com/cn/presentations/ms-jvm-taobao
Abstract:
Cassandra is a new kind of database: it is more than a single-machine system. It naturally runs in a High-Availability configuration. All nodes in the system are symmetric; there is no single point of failure. As you add machines, failure becomes routine, and Cassandra is built to tolerate that with no interruptions.
Cassandra is linearly scalable with good performance characteristics for very small and very large data stores. Unlike earlier efforts, Cassandra is more than just a key-value store; it is a structured data store which can facilitate complex use cases and queries. Cassandra allows for random access to your data organized into rows and columns.
Cassandra is different, and exciting. This presentation will discuss the pros and cons of using Cassandra, and why it has seen such amazing adoption in the past year.
Bio:
Ben Coverston is Director of Operations at DataStax (formerly knows as Riptano), a provider of software, support, services, training, resources and help for Cassandra. He has been involved in enterprise software his entire career. Working in the airline industry, he helped to build some of the highest volume online booking sites in the world. He saw first hand the consequences of trying to solve real world scalability problems at the limit of what traditional relational databases are capable of.
DataStax: Extreme Cassandra Optimization: The SequelDataStax Academy
Al has been using Cassandra since version 0.6 and has spent the last few months doing little else but tune Cassandra clusters. In this talk, Al will show how to tune Cassandra for efficient operation using multiple views into system metrics, including OS stats, GC logs, JMX, and cassandra-stress.
Netflix Open Source Meetup Season 4 Episode 2aspyker
In this episode, we will take a close look at 2 different approaches to high-throughput/low-latency data stores, developed by Netflix.
The first, EVCache, is a battle-tested distributed memcached-backed data store, optimized for the cloud. You will also hear about the road ahead for EVCache it evolves into an L1/L2 cache over RAM and SSDs.
The second, Dynomite, is a framework to make any non-distributed data-store, distributed. Netflix's first implementation of Dynomite is based on Redis.
Come learn about the products' features and hear from Thomson and Reuters, Diego Pacheco from Ilegra and other third party speakers, internal and external to Netflix, on how these products fit in their stack and roadmap.
Scala like distributed collections - dumping time-series data with apache sparkDemi Ben-Ari
Spark RDDs are almost identical to Scala collection, just in a distributed manner, all of the transformations and actions are derived from the Scala collections API.
As Martin Odersky mentioned, “Spark - The Ultimate Scala Collections” is the right way to look at RDDs. But with that great distributed power comes a great many data problems: at first you’ll start tackling the concept of partitioning, then the actual data becomes the next thing to worry about.
In the talk we’ll go through an overview on Spark's architecture, and see how similar RDDs are to the Scala collections API. We'll then shift to the world of problems that you’ll be facing when using Spark for processing a vast volume of time-series data with multiple data stores (S3, MongoDB, Apache Cassandra, MySQL).
When you start tackling many scale and performance problems, many questions arise:
> How to handle missing data?
> Should the system handle both serving and backend processes, or should we separate them out?
> Which solution is cheaper?
> How do we get the best performance for money spent?
In the talk we will tell the tale of all of the transformations we’ve made to our data and review the multiple data persistency layers... and I’ll try my best NOT to answer the question “which persistency layer is the best?” but I do promise to share our pains and lessons learned!
This is a talk about Netflix's path to Cassandra. The first few slides may look similar to previous presentations, but they are just to set the context. Most the content is brand new!
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Be...Codemotion
Vast volume of our processed data is Time Series data and once you start working with distributed systems, you start tackling many scale and performance problems: How to handle missing data?Should I handle both serving and backed process or separating them out? Best Performance for Money? In the talk we will tell the tale of all of the transformations we’ve made to our data model@Windward, some of the problems we’ve handled, review the multiple data persistency layers like: S3, MongoDB, Apache Cassandra, MySQL. And I’ll try my best NOT to answer the question “Which one of them is the Best?"
In this session, we'll discuss new volume types in Red Hat Gluster Storage. We will talk about erasure codes and storage tiers, and how they can work together. Future directions will also be touched on, including rule based classifiers and data transformations.
You will learn about:
How erasure codes lower the cost of storage.
How to configure and manage an erasure coded volume.
How to tune Gluster and Linux to optimize erasure code performance.
Using erasure codes for archival workloads.
How to utilize an SSD inexpensively as a storage tier.
Gluster's erasure code and storage tiering design.
a comprehensive good introduction to the the Big data world in AWS cloud, hadoop, Streaming, batch, Kinesis, DynamoDB, Hbase, EMR, Athena, Hive, Spark, Piq, Impala, Oozie, Data pipeline, Security , Cost, Best practices
In this second part, we'll continue the Spark's review and introducing SparkSQL which allows to use data frames in Python, Java, and Scala; read and write data in a variety of structured formats; and query Big Data with SQL.
Galaxy Big Data with MariaDB 10 by Bernard Garros, Sandrine Chirokoff and Stéphane Varoqui.
Presented 26.6.2014 at the MariaDB Roadshow in Paris, France.
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...Josef A. Habdank
Presentation consists of an amazing bundle of Pro tips and tricks for building an insanely scalable Apache Spark and Spark Streaming based data pipeline.
Presentation consists of 4 parts:
* Quick intro to Spark
* N-billion rows/day system architecture
* Data Warehouse and Messaging
* How to deploy spark so it does not backfire
AWS Big Data Demystified #1: Big data architecture lessons learned Omid Vahdaty
AWS Big Data Demystified #1: Big data architecture lessons learned . a quick overview of a big data techonoligies, which were selected and disregard in our company
The video: https://youtu.be/l5KmaZNQxaU
dont forget to subcribe to the youtube channel
The website: https://amazon-aws-big-data-demystified.ninja/
The meetup : https://www.meetup.com/AWS-Big-Data-Demystified/
The facebook group : https://www.facebook.com/Amazon-AWS-Big-Data-Demystified-1832900280345700/
Scality CTO Giorgio Regni and Software Engineer Lauren Spiegel talk about the open source S3 clone, written in Node.js. This presentation was given at a meetup on September 1, 2016 in San Francisco.
Top 10 Performance Gotchas for scaling in-memory Algorithms.srisatish ambati
Top 10 Data Parallelism and Model Parallelism lessons from scaling H2O.
"Math Algorithms have primarily been the domain of desktop data science. With the success of scalable algorithms at Google, Amazon, and Netflix, there is an ever growing demand for sophisticated algorithms over big data. In this talk, we get a ringside view in the making of the world's most scalable and fastest machine learning framework, H2O, and the performance lessons learnt scaling it over EC2 for Netflix and over commodity hardware for other power users.
Top 10 Performance Gotchas is about the white hot stories of i/o wars, S3 resets, and muxers, as well as the power of primitive byte arrays, non-blocking structures, and fork/join queues. Of good data distribution & fine-grain decomposition of Algorithms to fine-grain blocks of parallel computation. It's a 10-point story of the rage of a network of machines against the tyranny of Amdahl while keeping the statistical properties of the data and accuracy of the algorithm."
Compendium of my Brisk, Cassandra & Hadoop talks of the Summer 2011 - Delivered at JavaOne2011. I like the content in this one personally as it touches, Usecase driven intro to Cassandra, NoSQL followed by Intro to hadoop - MapReduce, HDFS internals, NameNode and JobTrackers. And how Brisk decomposes the Single point of failures in HDFS while providing a single form for Realtime & Batch storage and processing.
(And it seemed enjoyable to the audience in attendance)
SFJava, SFNoSQL, SFMySQL, Marakana & Microsoft come together for a presentation evening of three NoSQL technologies - Apache Cassandra, Mongodb, Hadoop.
This talk lays out a few talking points for Apache Cassandra.
Brisk - Truly peer-to-peer hadoop.
Brisk is an open-source Hadoop & Hive distribution that uses Apache Cassandra for its core services and storage. Brisk makes it possible to run Hadoop MapReduce on top of CassandraFS, an HDFS-compatible storage layer. By replacing HDFS with CassandraFS, users leverage MapReduce jobs on Cassandra’s peer-to-peer, fault-tolerant and scalable architecture.
With CassandraFS all nodes are peers. Data files can be loaded through any node in the cluster and any node can serve as the JobTracker for MapReduce jobs. Hive MetaStore is stored & accessed as just another column family (table) on the distributed data store. Brisk makes Hadoop truly peer-to-peer.
We demonstrate visualisation & monitoring of Brisk using OpsCenter. The operational simplicity of cassandra’s multi-datacenter & multi-region aware replication makes Brisk well-suited for a rich set of Applications and usecases. And by being able to store and isolate hdfs & online data within the same data cluster, Brisk makes analytics possible without ETL!
LA Scalability Talk, Mahalo
May 31.2011
SF Java presentation of jvm goes to big data.
“Slowly yet surely the JVM is going to Big Data! In this fun filled presentation we see what pieces of Java & JVM triumph or unravel in the battle for performance at high scale!”
Concurrency is the currency of scale on multi-core & the new generation of collections and non-blocking hashmaps are well worth the time taking a deep dive into. We take a quick look at the next gen serialization techniques as well as implementation pitfalls around UUID. The achilles' heel for JVM remains Garbage Collection: a deep dive into the internals of the memory model, common GC algorithms and their tuning knobs is always a big draw. EC2 & cloud present us with a virtualized & unchartered territory for scaling the JVM.
We will leave some room for Q&A or fill it up with any asynchronous I/O that might queue up during the talk. A round of applause will be due to the various tools that are essentials for Java performance debugging.
invited netflix talk: JVM issues in the age of scale! We take an under the hood look at java locking, memory model, overheads, serialization, uuid, gc tuning, CMS, ParallelGC, java.
Caching in Java - A review of different caching vendors (Oracle Coherence, Apache Cassandra, Infinispan, Ehcache/Terracotta, etc) and limitations presented by the underlying Java Platform.
Presented at RedHat Summit 2010, Boston
Speakers: SriSatish Ambati, Performance Engg
Manik Surtani, InfiniSpan Lead
Presentation details from RH Summit:
How to Stop Worrying & Start Caching in Java
SriSatish Ambati — Performance & Partner Engineer, Azul Systems, Inc.
Manik Surtani — Principal Software Engineer, Red Hat
Application data caching has come of age as distributed and large cache clusters are now common. The next generation of applications that depend on efficient caching has come into being and data and cache size explosion has set in.
In this session, Azul Systems’ SriSatish Ambati and Red Hat’s Manik Surtani will survey performance characteristics of different cache algorithms, their implementations (e.g., implementing a 200Gb data cache size), and how well they work in practical JVM deployments. In each scenario, they will present patterns of architecture that scale, and demonstrate where read and write performance stands in the context of increasing cache sizes and concurrency.
Throughout this discussion, they will recognize several villains, including heap fragmentation, long-lived objects, multi-VM communication, socket handlers, and queue managers. SriSatish and Manik will take a fun-filled “whodunit” approach to portray the roles played by each villain in killing cache performance.
http://www.redhat.com/promo/summit/2010/sessions/jboss.html
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...srisatish ambati
Top 10 Causes for Java Issues in Production and What to Do When Things Go Wrong
JavaOne 2010.
Abstract: It's Friday evening and you hear the first rumble . . . one java node has become slightly unresponsive. You lookup the process, get a thread dump, and for good measure restart it at 8 p.m. Saturday afternoon is when you realize that other nodes have caught the flu and you get the ugly call from the customer. In a matter of hours, you're on that conference bridge with support groups of different packages and Java vendors and one of your uberarchitects. Yes, production instances are up and down, and restarting like there's no tomorrow. Here's an accumulated compendium of the op 10 things that can cause Java production heartburn and what to do when your Java production is on fire. And yes, please have your tools belt on.
Speaker(s):
Cliff Click, Azul Systems, Distinguished Engineer
SriSatish Ambati, Azul Systems, Performance Engineer
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)srisatish ambati
Cache & Concurrency considerations for a high performance Cassandra deployment.
SriSatish Ambati
Cassandra has hit it's stride as a distributed java NoSQL database! It's fast, it's in-memory, it's scalable, it's seda; It's eventually consistent model makes it practical for the large & growing volumes of unstructured data usecases. It is also time to run it through the filters of performance analysis. For starters it runs on the java virtual machine and inherits the capabilities and culpabilities of the platform. This presentation reviews the runtime architecture, cache behavior & performance of a real-world workload on Cassandra. We blend existing system & jvm tools to get a quick overview & a breakdown of hotspots in the get, put & update operations. We highlight the role played by garbage collection & fragmentation due to long lived objects; We investigate lock contention in the data structures under concurrent usage. Cassandra uses UDP for management & TCP for data: we look at robustness of the communication patterns during high spikes and cluster-wide events. We review Non-Blocking Hashmap modifications to Cassandra that improve concurrency & amplify performance of this frontrunner in the NoSQL space
ApacheCon2010 NA
Wed, 03 November 2010 15:00
cassandra
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
3. A feather in the CAP
● Eventual Consistency
– Levels
– Doesn’t mean data loss
(journaled)
● SEDA
– Partitioning, Cluster &
Failure detection,
Storage engine mod
– Event driven & non
blocking io
– Pure Java
6. State machines as way to
visualize distributed systems
In a distributed system
● sometimes impossible to say that one of
two events occurred first.
● The relation “happened before” → is only a
partial ordering of event
For any events a, b and Clock C:
– If a → b then C(a) < C(b)
Leslie Lamport.
7. client side consistency
● A writes. B & C read after time t. Inconsistency Window
● Types of “eventual consistency”
● Causal Consistency
– A in a causal B sees consistent data
● ReadyourWrites
– A only sees updated valuse of data
● Session Consistency
– In the context of a session
● Monotonic read
– If a process sees a new value:everyone else
guaranteed to see update.
● Monotonic Write
8. Server side consistency:
R + W > N
● N the number of nodes that store replicas of the
data (RF)
● W the number of replicas that need to
acknowledge the receipt of the update before the
update completes
● R the number of replicas that are contacted
when a data object is accessed through a read
operation
T is the number of nodes.
9. Quorum
● R + W > N
– R=1, W=2, N=2 (primarybackup, rdbms)
● Faulttolerance
– R=2, W=2, N=3
● Read performance:
– R=1, Lots of nodes. Even 100s.
– R=1, N=W. High consistency
● Fast write:
– W=1, N=R
10. Count what is countable, measure what is
measurable, and what is not measurable,
make measurable
-Galileo
11. Elements of Cache Performance:
Metrics
● Operations:
– Ops/s: Puts/sec, Gets/sec, updates/sec
– Latencies, percentiles
– Indexing
● # of nodes – scale, elasticity
● Replication
– Synchronous, Asynchronous (fast writes)
● Tuneable Consistency
● Durability/Persistence
● Size & Number of Objects, Size of Cache
● # of user clients
12. Elements of Cache Performance:
“Think Locality”
● Hot or Not: The 80/20 rule.
– A small set of objects are very popular!
– What is the most RT tweet?
● Hit or Miss: Hit Ratio
– How effective is your cache?
– LRU, LFU, FIFO.. Expiration
● Longlived objects lead to better locality.
● Spikes happen
– Cascading events
– Cache Thrash: full table scans
13. Real World Performance
● Facebook Inbox
– Writes:0.12ms, Reads:15ms @ 50GB data
● Twitter performance
– Twissandra (simulation)
● Cassandra for Search & Portals
– Lucandra, solandra (simulation)
● ycbs/PNUTS benchmarks
– 5ms read/writes @ 5k ops/s (50/50 Update heavy)
– 8ms reads/5ms writes @ 5k ops/s (95/5 read heavy)
● Lab environment
– ~5k writes per sec per node, <5ms latencies
– ~10k reads per sec per node, <5ms latencies
● Performance has improved in newer versions
16. JVM in BigData Land!
Limits for scale
●
Locks : synchronized
– Can’t use all my multicores!
– java.util.collections also hold locks
– Use nonblocking collections!
● (de)Serialization is expensive
– Hampers object portability
– Use avro, thrift!
● Object overhead
– average enterprise collection has 3 elements!
– Use byte[ ], primitives where possible!
● Garbage Collection
– Can’t throw memory at the problem!
– Mitigate, Monitor, Measure foot print
17. Tools
● What is the JVM doing:
– dtrace, hprof, introscope, jconsole,
visualvm, yourkit, azul zvision
● Invasive JVM observation tools
– bci, jvmti, jvmdi/pi agents, jmx, logging
● What is the OS doing:
– dtrace, oprofile, vtune
● What is the network disk doing:
– Ganglia, iostat, lsof, netstat, nagios
18. furiously fast writes
n2 l y t o
app ry
client o
issues mem
n1
write find n o d e
commit log
partitioner
●
Append only writes
– Sequential disk access
●
No locks in critical path
●
Key based atomicity
19. furiously fast writes
● Use separate disks for commitlog
– Don’t forget to size them well
– Isolation difficult in the cloud..
● Memtable/SSTable sizes
– Delicately balanced with GC
● memtable_throughput_in_mb
23. Compactions
K2 < Serialized data > K4 < Serialized data >
K1 < Serialized data >
K10 < Serialized data > K5 < Serialized data >
K2 < Serialized data >
K30 < Serialized data > K10 < Serialized data >
K3 < Serialized data >
D E L E T E D
Sorted Sorted
Sorted
MERGE SORT
Index File
K1 < Serialized data >
Loaded in memory K2 < Serialized data >
K3 < Serialized data >
K1 Offset
K4 < Serialized data >
K5 Offset Sorted
K5 < Serialized data >
K30 Offset
K10 < Serialized data >
Bloom Filter
K30 < Serialized data >
Data File
24. Compactions
● Intense disk io & mem churn
● Triggers GC for tombstones
● Minor/Major Compactions
● Reduce priority for better reads
Relevant Parameters
Co m p a c t io n Ma n a g e r.
m in im u m Co m p a c t io n Th re s h o ld = xxxx
27. reads performance
● BloomFilter used to identify the right file
● Maintain column indices to look up columns
– Which can span different SSTables
● Less io than typical btree
● Cold read: Two seeks
– One for Key lookup, another row lookup
● Key Cache
– Optimized in latest cassandra
● Row Cache
– Improves read performance
– GC sensitive for large rows.
● Most (google) applications require single row
transactions*
*Sanjay G, BigTable Design, Google.
28. Client Performance
Marshal Arts:
Ser/Deserialization
● Clients dominated by Thrift, Avro
– Hector, Pelops
● Thrift: upgrade to latest: 0.5, 0.4
● No news: java.io.Serializable is S.L..O.…W
● Use “transient”
● avro, thrift, protobuf
● Common Patterns of Doom:
– Death by a million gets
30. Adding Nodes
●
New nodes
– Add themselves to busiest node
– And then Split its Range
●
Busy Node starts transmit to new node
●
Bootstrap logic initiated from any node, cli, web
●
Each node capable of ~40MB/s
– Multiple replicas to parallelize bootstrap
●
UDP for control messages
●
TCP for request routing
32. Bloom Filter: in full bloom
● “constant” time
● size:compact
● false positives
● Single lookup
for key in file
● Deletion
● Improve
– Counting BF
– Bloomier filters
33. Birthdays, Collisions &
Hashing functions
● Birthday Paradox
For the N=21 people in this room
Probability that at least 2 of them share same birthday is
~0.47
● Collisions are real!
● An unbalanced HashMap behaves like a list O(n) retrieval
● Chaining & Linear probing
● Performance Degrades
● with 80% table density
●
42. U U I D
● java.util.UUID is slow
– static use leads to contention
SecureRandom
● Uses /dev/urandom for seed initialization
-Djava.security.egd=file:/dev/urandom
● PRNG without file is atleast 20%40% better.
● Use TimeUUIDs where possible – much faster
● JUG – java.uuid.generator
● http://github.com/cowtowncoder/javauuidgenerator
● http://jug.safehaus.org/
● http://johannburkard.de/blog/programming/java/JavaUUIDgeneratorscompared.html
43. synchronized
● Coarse grained locks
● io under lock
● Stop signal on a highway
● java.util.concurrent does not mean no
locks
● Non Blocking, Lock free, Wait free
collections
44. Scalable LockFree Coding Style
● Big Array to hold Data
● Concurrent writes via: CAS & Finite State
Machine
– No locks, no volatile
– Much faster than locking under heavy load
– Directly reach main data array in 1 step
● Resize as needed
– Copy Array to a larger Array on demand
– Use State Machine to help copy
– “ Mark” old Array words to avoid missing late updates
46. Cassandra uses High Scale
NonBlocking Hashmap
public class BinaryMemtable implements IFlushable
{
…
private final Map<DecoratedKey,byte[]> columnFamilies =
new NonBlockingHashMap<DecoratedKey, byte[]>();
/* Lock and Condition for notifying new clients about Memtable
switches */
private final Lock lock = new ReentrantLock(); Condition condition;
…
}
public class Table
{
…
private static final Map<String, Table> instances = new
NonBlockingHashMap<String, Table>();
…
}
47. GCsensitive elements
within Cassandra
● Compaction triggers System.gc()
– Tombstones from files
● “GCInspector”
● Memtable Threshold, sizes
● SSTable sizes
● Low overhead collection choices
48. Garbage Collection
● Pause Times
if stop_the_word_FullGC > ttl_of_node
=> failed requests; failure accrual & node repair.
● Allocation Rate
– New object creation, insertion rate
● Live Objects (residency)
– if residency in heap > 50%
– GC overheads dominate.
● Overhead
– space, cpu cycles spent GC
● 64bit not addressing pause times
– Bigger is not better!
49. Memory Fragmentation
● Fragmentation
– Performance degrades over time
– Inducing “Full GC” makes problem go away
– Free memory that cannot be used
● Reduce occurrence
– Use a compacting collector
– Promote less often
– Use uniform sized objects
● Solution – unsolved
– Use latest CMS with CR:6631166
– Azul’s Zing JVM & Pauseless GC
51. Best Practices:
Garbage Collection
● GC Logs are cheap even in production
Xloggc:/var/log/cassandra/gc.log
XX:+PrintGCDetails
XX:+PrintGCTimeStamps XX:+PrintTenuringDistribution
XX:+PrintHeapAtGC
● Slightly expensive ones:
XX:PrintFLSStatistics=2 XX:CMSStatistics=1
XX:CMSInitiationStatistics
52. Sizing: Young
Generation
● Should we set –Xms == Xmx ?
● Use –Xmn (fixed eden)
allocations {new Object();}
survivor ratio
eden survivor spaces
Tenuring
promotion
Threshold
allocation by jvm
old generation
53. Tuning CMS
● Don’t promote too often!
– Frequent promotion causes fragmentation
● Size the generations
– Min GC times are a function of Live Set
– Old Gen should host steady state comfortably
● Parallelize on multicores:
– XX:ParallelCMSThreads=4
– XX:ParallelGCThreads=4
● Avoid CMS Initiating heuristic
– XX:+UseCMSInitiationOccupanyOnly
● Use Concurrent for System.gc()
– XX:+ExplicitGCInvokesConcurrent
54. Summary
Design & Implementation of Cassandra takes advantages
of strengths while avoiding common JVM issues.
● Locks:
– Avoids locks in critical path
– Uses nonblocking collections, TimeUUIDs!
– Still Can’t use all my multicores..?
>> Other bottlenecks to find!
● De/Serialization:
– Uses avro, thrift!
● Object overhead
– Uses mostly byte[ ], primitives where possible!
● Garbage Collection
– Mitigate: Monitor, Measure foot print.
– Work in progress by all jvm vendors!
Cassandra starts from a great footing from a JVM standpoint
and will reap the benefits of the platform!
55. References Q&A
● Leslie Lamport, “Time, Clocks and the Ordering of Events in a Distributed
System”
● Verner Wogels, Eventually Consistent
http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
● Bloom, Burton H. (1970), "Space/time tradeoffs in hash coding with
allowable errors"
● Avinash Lakshman, http://static.last.fm/johan/nosql
20090611/cassandra_nosql.pdf
● Eric Brewer, CAP
http://www.cs.berkeley.edu/~brewer/cs262b2004/PODCkeynote.pdf
● Tony Printzeis, Charlie Hunt, Javaone Talk
http://www.scribd.com/doc/36090475/GCTuningintheJava
● http://github.com/digitalreasoning/PyStratus/wiki/Documentation
● http://www.cs.cornell.edu/home/rvr/papers/flowgossip.pdf
● Cassandra on Cloud, http://www.coreyhulen.org/?p=326
● Cliff Click’s, Nonblocking HashMap http://sourceforge.net/projects/highscale
lib/
● Brian F. Cooper., Yahoo Cloud Storage Benchmark,
http://www.brianfrankcooper.net/pubs/ycsbv4.pdf
56. DataModel:
Know your use patterns
Alternative Twitter DataModel:
<Keyspace Name="Multiblog">
<ColumnFamily CompareWith="TimeUUIDType" Name="Blogs" />
<ColumnFamily CompareWith="TimeUUIDType" Name="Comments"/>
</Keyspace>