This document summarizes experiments testing Cassandra storage performance on Amazon EC2 using ephemeral storage and EBS. Ephemeral storage showed higher seek and insert performance, but its data is lost when instances terminate. EBS performance scaled linearly with the number of devices but was limited to around 100 IOPS per device. The author recommends RAID 0 over ephemeral disks for best performance, but notes the data durability risks and the need to repeat the tests more extensively.
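The linear scaling the experiments report can be expressed as a trivial model. This is a sketch under the roughly-100-IOPS-per-device assumption stated above; the function name and figures are illustrative, not AWS limits:

```python
# Rough model of the scaling observed in the experiments: EBS devices
# capped at ~100 IOPS each, combined by striping (RAID 0) across N
# devices. Linear scaling is the assumption reported above; real
# results vary with workload and instance type.

def aggregate_iops(per_device_iops, n_devices):
    # RAID 0 stripes I/O across all devices, so IOPS adds up linearly
    return per_device_iops * n_devices

print(aggregate_iops(100, 4))  # 400
```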
J Kumar Infraprojects is one of the leading construction companies in India with nearly three decades of experience. It has a current order backlog of Rs. 13 billion diversified across various segments like transportation, civil works, irrigation and piling. The company is well positioned to benefit from the large infrastructure projects underway in India. Kotak Securities initiates coverage on J Kumar with a "Buy" rating and a target price of Rs. 260 per share given its strong growth outlook and attractive valuations.
The document describes Acunu's data platform, which reengineers the storage stack in the Linux kernel to take advantage of massive datasets and scale-out commodity hardware. Acunu provides high performance storage solutions for NoSQL databases like Apache Cassandra through its open source Acunu Storage Core. It also allows multiple data stores to interoperate on the same data through versioning and isolation tools. Acunu's research has led to innovations like fast full data versioning and data structures optimized for SSD performance.
Cassandra EU 2012 - Putting the X Factor into Cassandra (Acunu)
Malcolm Box discusses Tellybug's experience using Cassandra to power voting applications for reality TV shows like Britain's Got Talent and The X Factor. They started with Cassandra to handle high write loads from millions of votes but found counting to be more challenging than expected. They implemented sharded counters in Memcached with Cassandra as the source of truth. While Cassandra scaled well for writes, reads had performance issues. Backup and data integrity also presented operational challenges as their usage of Cassandra evolved.
Pradeep Kumar has over 3 years of experience as an SAP FI/CO functional consultant. He has an MBA in Finance and is SAP certified in Financial Accounting. At his current role at HCL Technologies, he provides support and configurations for clients like Vestas based on their business requirements. Previously he worked as an accounts executive and process associate performing tasks like bookkeeping, bank reconciliation, and financial reporting.
Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R... (Acunu)
The document discusses NoSQL, NewSQL, and other database technologies that are emerging to address limitations of relational databases in scaling to meet demands for performance, availability, and flexibility. It provides an overview of different categories of NoSQL databases and NewSQL solutions, and analyzes drivers like scalability, performance, relaxed consistency, agility, and complexity of data that are contributing to adoption of these new database approaches.
Talk for the Cassandra Seattle Meetup April 2013: http://www.meetup.com/cassandra-seattle/events/114988872/
Cassandra's got some properties which make it an ideal fit for building real-time analytics applications -- but getting from atomic increments to live dashboards and streaming queries is quite a stretch. In this talk, Tim Moreton, CTO at Acunu, talks about how and why they built Acunu Analytics, which adds rich SQL-like queries and a RESTful API on top of Cassandra, and looks at how it keeps Cassandra's spirit of denormalization under the hood.
Cassandra uses commit logs, memtables, and SSTables to handle writes efficiently. Commit logs store writes sequentially, memtables buffer in-memory, and SSTables store sorted, compressed data files on disk. For reads, Cassandra uses bloom filters and indexes to locate keys in memtables and SSTables, then retrieves the data. Compaction merges SSTables to improve performance and remove obsolete data. Snapshots and repair use Merkle trees to backup consistent data sets and repair differences between nodes.
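The write and read paths summarized above can be sketched as a toy log-structured store. This is an illustrative Python model, not Cassandra's actual implementation: the class and method names are invented, and bloom filters, indexes, and compression are omitted for brevity:

```python
# Toy sketch of the commit-log/memtable/SSTable write path described
# above. Names and structures are illustrative, not Cassandra's own.

class MiniLSM:
    def __init__(self, memtable_limit=4):
        self.commit_log = []      # sequential, append-only durability log
        self.memtable = {}        # in-memory write buffer
        self.sstables = []        # immutable sorted runs, newest last
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. log sequentially
        self.memtable[key] = value             # 2. buffer in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # 3. write a sorted, immutable SSTable and clear the memtable
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []   # flushed data no longer needs the log

    def read(self, key):
        # check the memtable first, then SSTables from newest to oldest
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None

    def compact(self):
        # merge all SSTables, keeping only the newest value per key
        merged = {}
        for table in self.sstables:
            merged.update(dict(table))
        self.sstables = [sorted(merged.items())]

db = MiniLSM(memtable_limit=2)
db.write("a", 1)
db.write("b", 2)   # hits the limit and triggers a flush
db.write("a", 3)   # newer value stays in the memtable
db.compact()
print(db.read("a"))  # 3 -- the newest write wins
```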
This document provides an introduction to cloud computing and Amazon Web Services. It discusses the three categories of cloud services: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). It focuses on IaaS and describes Amazon EC2 for computing resources and EBS for storage. It also introduces StarCluster, an open source tool that simplifies the creation and management of EC2 clusters running Hadoop, MPI and other applications.
Creating cluster 'mycluster' with the following settings:
- Master node: m1.small using ami-fce3c696
- Number of nodes: 1
- Node type: m1.small
- Node AMI: ami-fce3c696
- Storage: EBS volume of size 10 GB
- Security group: mycluster-sg allowing SSH from anywhere
Launching instances...
This may take a few minutes. You can check progress with 'starcluster list'.
When instances have started, SSH will be automatically configured.
You can now ssh to the master with:
starcluster ssh mycluster
Have fun and please let us know if you have
This document provides an overview of a Devops workshop that will take place on June 13, 2011. The workshop will be instructed by John Willis and will cover Devops goals, case studies, culture, automation, and measurement over the course of 6 units. It also includes slides soliciting feedback from students on their goals and understanding of Devops.
The document discusses the challenges of exascale computing architectures and how active storage strategies can help address them. Exascale systems will require three orders of magnitude more processing power, storage, bandwidth, and other resources compared to today's systems. Simply scaling up current architectures may not be feasible or cost effective. Active storage, where computation is brought to the data rather than moving large amounts of data, can help optimize the use of resources. The Blue Gene architecture demonstrates advantages like high memory and network bandwidth capacity that are well-suited for active storage approaches needed at the exascale.
HP Cloud Services conducted performance testing on various VM configurations provided by OpenStack. Benchmark tests were run including byte-unixbench, mbw, iozone, iperf, pgbench and Hadoop wordcount. The results showed the larger VM configurations generally had better performance, but some defects were discovered in 7 out of 20 test VMs, indicating the defect rate was too high for production use. While defects were not directly related to OpenStack, the conclusions were that OpenStack still lacks functionality for production and building a full IaaS service is more complex than the software alone.
This document compares the storage performance of the EMC VNX 5100 and IBM DS5300 storage systems. Key findings include:
- The IBM DS5300 has more memory per controller (8 GB vs. 4 GB for the EMC VNX 5100), which benefits caching.
- Testing of RAID 10, 5, and 6 configurations shows generally superior random I/O performance for the EMC VNX 5100, while the IBM DS5300 performs better on sequential reads/writes.
- Adding SSD caching to EMC VNX 5100 via FAST Cache significantly boosts performance, with SSDs achieving over 20,000 IOPS.
This document provides an overview of flash storage technologies for databases. It discusses NAND and NOR flash, major flash storage vendors like Fusion-io and Virident, and how flash compares to traditional hard disks. It also covers best practices for using flash with databases like MySQL, including placing data files on flash for faster random reads and logs on RAID for sequential writes. Overall flash provides significantly higher IOPS, lower power consumption, and more consistent performance than hard disks.
Learn tips and techniques that will improve the performance of your applications and databases running on Amazon EC2 instance storage and/or Amazon Elastic Block Store (EBS). This advanced session discusses when to use HI1, HS1, and Amazon EBS. We will share an "under the hood" view to tune the performance of your Elastic Block Store and best practices for running workloads on Amazon EBS, such as relational databases (MySQL, Oracle, SQL Server, Postgres) and NoSQL data stores, such as MongoDB and Riak.
Benchmark results for running bioinformatics platform Galaxy on the Amazon Web Services cloud. Results include info about disks, instance types, sizes, and variable data size.
Maximizing EC2 and Elastic Block Store Disk Performance (STG302) | AWS re:Inv... (Amazon Web Services)
This document discusses optimizing performance for EC2 instances and EBS volumes. It provides guidance on provisioning IOPS for different types of storage workloads and database software. The key recommendations are to use EBS-optimized instances with Provisioned IOPS (PIOPS) volumes for random I/O workloads like databases, size volumes appropriately based on the needed IOPS and throughput, and architect for consistent low latency by adjusting the queue depth.
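As a back-of-envelope companion to the sizing guidance above, the IOPS/throughput relationship works out as follows. The 4,000 IOPS and 16 KB figures are hypothetical workload numbers, not AWS service limits:

```python
# Back-of-envelope volume sizing: given a target IOPS and I/O size,
# estimate the throughput a Provisioned IOPS volume must sustain.
# The sample figures are illustrative assumptions, not AWS limits.

def required_throughput_mb_s(iops, io_size_kb):
    """Throughput (MB/s) needed to sustain `iops` at `io_size_kb` per I/O."""
    return iops * io_size_kb / 1024.0

# e.g. a database doing 4,000 random 16 KB I/Os per second
print(required_throughput_mb_s(4000, 16))  # 62.5 MB/s
```

Sizing in both dimensions matters: a volume provisioned for the IOPS figure alone may still bottleneck on throughput if the workload's I/O size is large.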
Amazon Elastic Block Store (Amazon EBS) provides persistent block level storage volumes for use with Amazon EC2 instances. In this technical session, we conduct a detailed analysis of the differences among the three types of Amazon EBS block storage: General Purpose (SSD), Provisioned IOPS (SSD), and Magnetic. We discuss how to maximize Amazon EBS performance, with a special eye towards low-latency, high-throughput applications like databases. We discuss Amazon EBS encryption and share best practices for Amazon EBS snapshot management. Throughout, we share tips for success.
This document discusses software transactional memory (STM) as an approach for concurrent programming. It notes that as processors have moved to multiple cores, single-threaded applications are no longer sufficient. STM aims to make concurrent programming easier by allowing operations to be grouped into atomic transactions that guarantee consistency. The document outlines some current models like locks and actors, and notes STM implementations guarantee avoiding issues like deadlocks, livelocks, and race conditions. It also discusses some challenges with STM like retry waste and lack of tools, and mentions Clojure and Haskell use immutable data and STM for concurrency.
SmugMug uses a variety of technologies and strategies to optimize performance for their photo sharing website. They rely heavily on MySQL databases stored on high-performance SSD storage arrays. They also leverage content delivery networks, caching, and database replication. Their use of ZFS storage has improved performance and reliability compared to their previous filesystems.
Amazon Elastic Block Store (Amazon EBS) provides persistent block level storage volumes for use with Amazon EC2 instances. In this technical session, we discuss how to maximize Amazon EBS performance, with a special eye toward low-latency, high-throughput applications like databases. We explain how to monitor your application and share real-world examples.
RIKEN's K computer, built by Fujitsu, is the world's fastest supercomputer, with a peak performance of over 11 petaflops. It uses a homogeneous architecture of over 700,000 SPARC64 VIIIfx processors connected via a high-speed interconnect. Looking ahead, future exascale supercomputers in the 2018 timeframe are projected to have over 1 exaflop of peak performance, use over 1 billion processing cores, and consume around 20 megawatts of power. Significant technological advancements will be required across hardware and software to achieve exascale capabilities.
As storage capacities increase dramatically over the next 5 years, the document predicts several consequences: 1) Disks will replace tapes as the preferred archive media due to lower costs per terabyte of storage. 2) RAID10 configurations, which use mirroring, will replace RAID5, which uses parity, because higher performance will be needed to access very large disks. 3) Disks themselves will be packaged in "disc packs" with multiple read/write arms to provide higher bandwidth and access rates for extremely large single disks.
Get Better I/O Performance in VMware vSphere 5.1 Environments with Emulex 16G... (Emulex Corporation)
This document discusses how Emulex 16Gb Fibre Channel HBAs can provide better I/O performance in VMware vSphere 5.1 environments. It begins with an agenda and overview of new vSphere 5.1 storage features like space efficient sparse disks. Performance tests show the Emulex 16GFC HBA provides twice the throughput of 8GFC with lower CPU usage. The 16GFC HBA can achieve wire speed for random I/Os and support more VMs and higher IOPS. Best practices are discussed for using 16GFC HBAs, and the OneCommand Manager tool allows managing Emulex adapters directly from vCenter. Resources like the Implementers Lab website are provided.
Get Better I/O Performance in VMware vSphere 5.1 Environments with Emulex 16G... (Emulex Corporation)
This webinar covers the improvements in storage I/O throughput and CPU efficiency that VMware vSphere gains when using an Emulex 16Gb Fibre Channel Host Bus Adapter (HBA) versus the previous generation HBA. Applications virtualized on VMware vSphere 5.1 that generate storage I/O of various block sizes can take full advantage of 16Gb Fibre Channel wire speed for better sequential and random I/O performance.
The document summarizes performance optimizations for Apache Camel by discussing various Enterprise Integration Patterns (EIPs) and components. It provides demonstrations and conclusions for content-based routing, splitting messages, marshaling and unmarshaling data, working with files and databases, using threads, templates, web services, and messaging. The overall goal is to help optimize Apache Camel implementations through techniques like reducing file/database accesses, using string builders, batch processing, and parallelizing work with threads.
Acunu and Hailo: a realtime analytics case study on Cassandra (Acunu)
Hailo is a taxi app that receives a hail every 4 seconds across 15 cities. It launched on AWS using MySQL but adopted Cassandra and Acunu for greater resilience during international expansion. Cassandra provided high availability and global replication. Acunu provided analytics capabilities on Cassandra data. Hailo uses Cassandra for entity storage and Acunu for analytics, seeing benefits like simplified data modeling, rich queries, and infrastructure monitoring. Choosing these platforms allowed for high availability, multi-data center operation, and scaling to support growth.
- Cassandra nodes are clustered in a ring, with each node assigned a random token range to own.
- Adding or removing nodes traditionally required manually rebalancing the token ranges, which was complex, impacted many nodes, and took the cluster offline.
- Virtual nodes assign each physical node multiple random token ranges of varying sizes, allowing incremental changes where new nodes "steal" ranges from others, distributing the load evenly without manual work or downtime.
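The virtual-node idea in the bullets above can be sketched with a toy token ring. The hashing scheme, ring size, and the 8-vnodes-per-node figure are simplifying assumptions for illustration, not Cassandra's actual parameters:

```python
import bisect
import hashlib

# Toy token ring with virtual nodes: each physical node claims several
# pseudo-random tokens, and a key belongs to the node owning the next
# token clockwise. Ring size and vnode count are illustrative.

RING = 2**32

def token(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % RING

def build_ring(nodes, vnodes_per_node=8):
    # each node gets several scattered tokens instead of one big range
    return sorted((token(f"{n}-{i}"), n)
                  for n in nodes for i in range(vnodes_per_node))

def owner(ring, key):
    tokens = [t for t, _ in ring]
    idx = bisect.bisect(tokens, token(key)) % len(ring)  # wrap around
    return ring[idx][1]

ring = build_ring(["node1", "node2", "node3"])
print(owner(ring, "some-row-key"))

# Adding a node just inserts its vnode tokens: it "steals" small slices
# from several existing nodes rather than half of one node's range.
ring = build_ring(["node1", "node2", "node3", "node4"])
```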
Similar to Storage on EC2 (& Cassandra), Cassandra Workshop, Berlin Buzzwords
Acunu Analytics and Cassandra at Hailo - All Your Base 2013 (Acunu)
Hailo, the taxi app, has served more than 5 million passengers in 15 cities and has taken fares of $100 million this year. I'm going to talk about how that rapid growth has been powered by a platform based on Cassandra and operational analytics and insights powered by Acunu Analytics. I'll cover some challenges and lessons learned from scaling fast!
Understanding Cassandra internals to solve real-world problems (Acunu)
The document summarizes Nicolas Favre-Felix's presentation on Cassandra internals at a Cassandra London meetup. It discusses four common problems encountered with Cassandra - high read latency, high CPU usage with little activity, long nodetool repair times, and optimizing write throughput. For each problem, it describes symptoms, analysis using tools like nodetool, and solutions like adjusting the data model, increasing thread pool sizes, and adding hardware resources. The key takeaways are that monitoring Cassandra is important, using the right data model impacts performance, and understanding how Cassandra stores and arranges data on disk is essential to optimization.
The document describes how Apache Cassandra can be used for real-time analytics on streaming data. It provides an example of counting Twitter mentions of a term per day in real-time by incrementing counters in Cassandra as tweets are processed. This allows queries to be answered by reading the counters. More complex queries can be supported by storing aggregated data in a denormalized format across rows and columns in Cassandra.
Realtime Analytics on the Twitter Firehose with Apache Cassandra - Denormaliz... (Acunu)
The document discusses implementing real-time analytics on Twitter data using Cassandra. It describes incrementing counters for each tweet to track token frequencies over time. This allows querying token mentions within a date range by reading the relevant counter columns. However, Cassandra's random partitioner prevents efficient range queries on rows. Instead, the solution denormalizes the data into wide rows with time buckets as columns to allow fast counting of token mentions within each time period through a single disk read. The document provides code examples and encourages experimenting with an open source implementation.
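The denormalized counter scheme described here can be sketched with plain Python dicts standing in for Cassandra's wide rows and counter columns. All names and the per-day bucket granularity are illustrative assumptions:

```python
from collections import defaultdict

# Sketch of the denormalized counter scheme: one wide "row" per token,
# one counter "column" per day bucket. In Cassandra these would be
# counter columns in a wide row; here dicts stand in for both.

counters = defaultdict(lambda: defaultdict(int))  # token -> day -> count

def record_tweet(day, text):
    # increment each distinct token's counter for this day bucket
    for tok in set(text.lower().split()):
        counters[tok][day] += 1

def mentions(tok, start_day, end_day):
    # a "range query" over the day-bucket columns of one wide row:
    # in Cassandra this is a single column slice, not a table scan
    return sum(c for d, c in counters[tok].items()
               if start_day <= d <= end_day)

record_tweet("2013-04-01", "cassandra is fast")
record_tweet("2013-04-02", "cassandra scales")
record_tweet("2013-04-05", "hadoop batch jobs")
print(mentions("cassandra", "2013-04-01", "2013-04-03"))  # 2
```

Because reads only touch the relevant row and buckets, the query cost depends on the date range, not on the total volume of tweets ingested.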
This document discusses real-time analytics with Cassandra. It includes sections on motivation/alternatives, what real-time analytics with Cassandra is, how it works, approximate analytics, and what problems it can help solve. The document contains log data as an example of the type of data that can be analyzed with this technique.
- The document discusses Acunu Analytics, a real-time big data analytics platform.
- It addresses the motivation for developing Acunu Analytics compared to alternatives. It also briefly describes what Acunu Analytics is, how it works, and what problems it can help solve.
- The main topics covered are the product itself, its capabilities for real-time analytics of big data, and potential use cases.
Realtime Analytics on the Twitter Firehose with Cassandra - Acunu
This document discusses using Cassandra for real-time analytics of Twitter data. It describes incrementing counters in Cassandra as tweets are processed to track metrics like mentions over time. This allows queries to retrieve trends by reading counters with a single I/O, rather than scanning large amounts of data. The document demonstrates preparing tweet data by tokenizing and incrementing counters in time buckets. It also covers implementing a range query to retrieve mentions between dates from a wide row with time buckets as columns.
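The wide-row layout with time buckets can be sketched as follows. This is a toy model, assuming a dict-of-dicts in place of Cassandra's sorted columns and made-up bucket sizes and function names; in Cassandra the column slice in `mentions_between` would be a single contiguous read from one row.

```python
from collections import defaultdict

# Wide-row layout: one row per token, columns keyed by hour bucket.
BUCKET = 3600  # one-hour buckets, in seconds (illustrative choice)

rows = defaultdict(dict)  # token -> {bucket_start: count}

def record(token, timestamp):
    """Increment the counter column for the bucket this timestamp falls in."""
    bucket = timestamp - timestamp % BUCKET
    rows[token][bucket] = rows[token].get(bucket, 0) + 1

def mentions_between(token, start, end):
    """Range query over [start, end): a contiguous column slice
    of one row in Cassandra, not a multi-row scan."""
    return sum(count for bucket, count in rows[token].items()
               if start <= bucket < end)

record("cassandra", 1000)
record("cassandra", 2000)   # same bucket as the first
record("cassandra", 4000)   # next bucket
print(mentions_between("cassandra", 0, 3600))   # 2
print(mentions_between("cassandra", 0, 7200))   # 3
```

Because columns within a row are stored sorted, putting the time dimension in the column name is what makes the date-range query efficient despite the random partitioner.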
This document discusses a distributed database called Acunu that is tunably consistent, highly available, and partition tolerant. It can scale out on commodity servers and provides high performance. The database uses a multi-master architecture without single points of failure and supports data replication across multiple data centers. It also provides a simple but powerful data model and is well-suited for applications involving high-velocity data.
Acunu is developing an enterprise Cassandra appliance called Castle that aims to simplify Cassandra deployment and management. Castle includes a storage engine optimized for large disks and workloads, and allows for high density on commodity hardware. It also features fast disk rebuilds through its shared memory architecture. Acunu provides a web UI called the Control Center to configure, monitor, and troubleshoot Castle without deep Cassandra expertise. Acunu performs extensive automated testing of Castle to ensure reliability.
Cassandra EU 2012 - CQL: Then, Now and When by Eric Evans - Acunu
The document discusses the history and development of Cassandra Query Language (CQL), which provides an SQL-like interface for querying Apache Cassandra databases. It describes CQL evolving from versions 1.0 through 3.0 to become more standardized and user-friendly. Key points include CQL initially being introduced in Cassandra 0.8 to replace the low-level Thrift API, its goals of being simple, intuitive, and high performing, and ongoing work to improve its interface stability and driver support across languages.
Cassandra EU 2012 - Storage Internals by Nicolas Favre-Felix - Acunu
The document discusses Cassandra's storage internals. It describes how Cassandra writes data to memtables and commit logs in memory before flushing to immutable SSTables on disk. It also explains how compaction merges SSTables to reclaim space and improve performance. For reads, Cassandra uses memtables, bloom filters on SSTables, key caches, and row caches to minimize disk I/O. Counters are implemented by coordinating writes across replicas.
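The read path described above can be sketched as a toy model. This is a simplified illustration, not Cassandra's implementation: the class names are hypothetical, the Bloom filter is deliberately tiny, and key/row caches and compaction are omitted.

```python
import hashlib

class BloomFilter:
    """May say 'maybe' for absent keys, never 'no' for present ones."""
    def __init__(self, size=1024, hashes=3):
        self.bits = [False] * size
        self.size, self.hashes = size, hashes
    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size
    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = True
    def might_contain(self, key):
        return all(self.bits[p] for p in self._positions(key))

class SSTable:
    """Immutable sorted table, fronted by a Bloom filter."""
    def __init__(self, data):
        self.data = dict(data)
        self.bloom = BloomFilter()
        for k in self.data:
            self.bloom.add(k)
    def get(self, key):
        if not self.bloom.might_contain(key):
            return None          # skip the disk read entirely
        return self.data.get(key)

class Store:
    def __init__(self):
        self.memtable = {}
        self.sstables = []
    def write(self, key, value):
        self.memtable[key] = value       # plus a commit-log append, omitted
    def flush(self):
        self.sstables.insert(0, SSTable(self.memtable))  # newest first
        self.memtable = {}
    def read(self, key):
        if key in self.memtable:         # 1. memtable
            return self.memtable[key]
        for sstable in self.sstables:    # 2. SSTables, newest wins
            value = sstable.get(key)
            if value is not None:
                return value
        return None

s = Store()
s.write("k1", "v1"); s.flush()
s.write("k1", "v2")              # newer value stays in the memtable
print(s.read("k1"))              # v2
```

The Bloom filter is the piece that keeps reads cheap as SSTables accumulate: most tables that don't hold the key are ruled out without touching disk.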
Cassandra EU 2012 - Highly Available: The Cassandra Distribution Model by Sam... - Acunu
This document summarizes a presentation about Cassandra's highly available distributed data model. The presentation covers Cassandra's key capabilities of scalability, fault tolerance, tunable consistency, and replication without single points of failure. It discusses Cassandra's use of consistent hashing to partition and place data across nodes, as well as its replication strategies and consistency levels that allow tuning availability versus consistency.
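Consistent-hash placement of the kind the talk covers can be sketched briefly. This is an illustrative model, assuming an MD5-based ring in the spirit of the random partitioner and a SimpleStrategy-style replica walk; the class and function names are invented for the example.

```python
import hashlib
from bisect import bisect_right

def token(key):
    """Place a key on a 128-bit ring, as a random partitioner would."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # One token per node; each node owns the arc ending at its token.
        self.ring = sorted((token(n), n) for n in nodes)
    def replicas(self, key, rf=3):
        """The rf nodes that follow the key's token clockwise."""
        tokens = [t for t, _ in self.ring]
        start = bisect_right(tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(min(rf, len(self.ring)))]

ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("some-row-key", rf=3))
```

Because placement is a pure function of the key and the ring, any node can route any request, which is what removes the single point of failure; consistency levels then decide how many of those replicas must answer.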
Cassandra EU 2012 - Data modelling workshop by Richard Low - Acunu
This document summarizes Richard Low's upcoming data modeling workshop. The workshop will cover what data modeling is, factors to consider when designing a data model like workload and queries, modeling options in Cassandra like rows and columns, and tools like counters and secondary indexes. It provides an example of modeling a scalable messaging application and compares it to a relational database model. The workshop aims to help attendees optimize their data for common queries and operations.
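A messaging model of the kind the workshop walks through can be sketched as a wide row per inbox, with columns ordered by a time-based key so that "latest N messages" is one contiguous slice. This is a toy stand-in, not the workshop's actual model; a sorted list per user approximates Cassandra's sorted columns.

```python
from collections import defaultdict

# One row per recipient; columns keyed by timestamp so the newest
# messages sit at one end of the row.
inboxes = defaultdict(list)  # user -> [(ts, sender, body)], kept sorted

def send(ts, sender, recipient, body):
    inboxes[recipient].append((ts, sender, body))
    inboxes[recipient].sort()   # Cassandra keeps columns sorted for us

def latest(user, n):
    """Newest n messages: a single reversed column slice."""
    return list(reversed(inboxes[user][-n:]))

send(1, "alice", "bob", "hi")
send(2, "carol", "bob", "lunch?")
send(3, "alice", "bob", "ping")
print(latest("bob", 2))  # [(3, 'alice', 'ping'), (2, 'carol', 'lunch?')]
```

The contrast with a relational model is the point: instead of a `messages` table plus an index, the data is laid out in the order the dominant query reads it.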
Storage on EC2 (& Cassandra), Cassandra Workshop, Berlin Buzzwords
1. Storage on EC2 (& Cassandra)
Tom Wilkie
Cassandra Workshop 8/06/11
Wednesday, 8 June 2011
2. ACHTUNG!
Data only collected over past 5 days
Didn't repeat experiments (that much)
EC2 is a moving target
3. Consider:
• Ephemeral vs EBS
• ... vs Instance Type
• ... vs RAID level
• ... vs # threads
• (... vs storage engine)
Not considering:
• Cluster performance
• Internode latency, throughput
• Tuning...
CORRELATED FAILURES ...
6. ephemeral [ih-fem-er-uhl] –adjective
1. lasting a very short time; short-lived; transitory: the ephemeral joys of childhood.
2. lasting but one day: an ephemeral flower.
–noun
3. anything short-lived, as certain insects.
7. Ephemeral Storage: Seek Performance
[Chart: seeks/s (0–8000) vs. # devices (1–4) for m1.large, m1.xlarge and c1.xlarge ephemeral storage. Annotation: "7000 IOPs from a disk??"]
http://www.slideshare.net/davegardnerisme/running-cassandra-on-amazon-ec2
11. • Max 4 devices per instance
• Data goes away when instance is terminated (or crashes!)
• Suspect there is some sort of indirection layer underneath - thin provisioning / dedupe / CoW or something
• Linux software RAID sucks
12. CORRELATED FAILURES ...
What happens if a bug in your software causes all your nodes to crash?
i.e. say a memory leak causes an OOM... on all nodes
19. • Limited to ~100 IOPS per device?
• Or just 10ms latency?
• Seems to scale pretty linearly for random IO
• Sequential IO limited by network bandwidth, independent of # devices
• shared with other network traffic?
• Linux software RAID sucks
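The "scales pretty linearly" observation for random IO suggests a back-of-envelope model for striped EBS volumes. This is only a sketch of the slide's own numbers (the ~100 IOPS ceiling is the author's observation, not an AWS specification), and it ignores the sequential-IO network-bandwidth cap noted above.

```python
# Observed ceiling from these experiments, not an AWS spec.
IOPS_PER_EBS_DEVICE = 100   # roughly equivalent to ~10 ms per random IO

def raid0_random_iops(devices, iops_per_device=IOPS_PER_EBS_DEVICE):
    """Idealized aggregate for RAID 0: random IOs spread evenly
    across stripes, so throughput scales with device count."""
    return devices * iops_per_device

for n in (1, 2, 4, 8):
    print(f"{n} devices -> ~{raid0_random_iops(n)} random IOPS")
```

Even under this idealized scaling, eight striped volumes only reach the random-IO performance a single ephemeral disk showed earlier, which is why the later slides lean toward RAID 0 over ephemeral disks.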
20. CORRELATED FAILURES ...
What happens when EBS breaks?
http://storagemojo.com/2011/04/29/amazons-ebs-outage/
http://status.heroku.com/incident/151
22. “Use Elastic Block Storage”
http://stackoverflow.com/questions/4714879/deploy-cassandra-on-ec2
“Raid 0 EBS drives are the way to go”
http://coreyhulen.org/2010/10/03/%EF%BB%BFcassandra-performance-tests-on-ec2/
“we recommend using raid0 ephemeral disks”
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cold-boot-performance-problems-td5615829.html#a5615889
25. Insert Rates by Instance Type
[Chart: inserts/s (0–35,000) for m1.large, m1.xlarge and c1.xlarge instances, each with ephemeral and EBS storage]
100 threads, batch mutate size 100, values length 10, 1 column per row, 300 million values