Speaker(s): Kathryn Erickson, Engineering at DataStax
During this session we will discuss varying recommended hardware configurations for DSE. We’ll get right to the point and provide quick and solid recommendations up front. After we get the main points down take a brief tour of the history of database storage and then focus on designing a storage subsystem that won't let you down.
CaSSanDra: An SSD Boosted Key-Value StoreTilmann Rabl
This presentation was held by Prashanth Menon at ICDE '14 on April 3, 2014 in Chicago, IL, USA.
The full paper and additional information is available at:
http://msrg.org/papers/Menon2013
Abstract:
With the ever growing size and complexity of enterprise systems there is a pressing need for more detailed application performance management. Due to the high data rates, traditional database technology cannot sustain the required performance. Alternatives are the more lightweight and, thus, more performant key-value stores. However, these systems tend to sacrifice read performance in order to obtain the desired write throughput by avoiding random disk access in favor of fast sequential accesses.
With the advent of SSDs, built upon the philosophy of no moving parts, the boundary between sequential vs. random access is now becoming blurred. This provides a unique opportunity to extend the storage memory hierarchy using SSDs in key-value stores. In this paper, we extensively evaluate the benefits of using SSDs in commercialized key-value stores. In particular, we
investigate the performance of hybrid SSD-HDD systems and demonstrate the benefits of our SSD caching and our novel dynamic schema model.
CaSSanDra: An SSD Boosted Key-Value StoreTilmann Rabl
This presentation was held by Prashanth Menon at ICDE '14 on April 3, 2014 in Chicago, IL, USA.
The full paper and additional information is available at:
http://msrg.org/papers/Menon2013
Abstract:
With the ever growing size and complexity of enterprise systems there is a pressing need for more detailed application performance management. Due to the high data rates, traditional database technology cannot sustain the required performance. Alternatives are the more lightweight and, thus, more performant key-value stores. However, these systems tend to sacrifice read performance in order to obtain the desired write throughput by avoiding random disk access in favor of fast sequential accesses.
With the advent of SSDs, built upon the philosophy of no moving parts, the boundary between sequential vs. random access is now becoming blurred. This provides a unique opportunity to extend the storage memory hierarchy using SSDs in key-value stores. In this paper, we extensively evaluate the benefits of using SSDs in commercialized key-value stores. In particular, we
investigate the performance of hybrid SSD-HDD systems and demonstrate the benefits of our SSD caching and our novel dynamic schema model.
This presentation provides a basic overview of Ceph, upon which SUSE Storage is based. It discusses the various factors and trade-offs that affect the performance and other functional and non-functional properties of a software-defined storage (SDS) environment.
Development to Production with Sharded MongoDB ClustersSeveralnines
Severalnines presentation at MongoDB Stockholm Conference.
Presentation covers:
- mongoDB sharding/clustering concepts
- recommended dev/test/prod setups
- how to verify your deployment
- how to avoid downtime
- what MongoDB metrics to watch
- when to scale
Scylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast EnoughScyllaDB
Some workloads require very low latency for high percentile of requests that even the fastest of disks may feel challenged to provide. This requirement will not be met if several IO reads will have to be issued to retrieve requested data from the storage array. The new Scylla In-Memory storage option was added to Scylla Enterprise to satisfy the read mostly workloads that fit into the memory and require consistent low latency. In this talk we will discuss the characteristics of the implementation and how can you take advantage of it.
Practical advices how to achieve persistence in Redis. Detailed overview of all cons and pros of RDB snapshots and AOF logging. Tips and tricks for proper persistence configuration with Redis pools and master/slave replication.
Ceph BlueStore - новый тип хранилища в Ceph / Максим Воронцов, (Redsys)Ontico
- Что такое SDS (общие места для (почти) всех решений — масштабирование, абстрагирование от аппаратных ресурсов, управление с помощью политик, кластерные ФС);
- Почему мы решили использовать SDS (нужно было объектное хранилище);
- Почему решили использовать именно Ceph, а не другие открытые (GlusterFS, Swift...) или проприетарные (IBM Elastic Storage, Huawei OceanStor) решения;
- Что еще умеет Ceph, кроме object storage (RBD, CephFS);
- Как работает Ceph (со стороны сервера);
- Что нового дает BlueStore по сравнению с классическим (поверх ФС);
- Сравнение производительности (метрики тестов);
- BlueStore — все еще tech preview;
- Заключение. Ссылки, литература.
Optimizing MongoDB: Lessons Learned at Localyticsandrew311
Tips, tricks, and gotchas learned at Localytics for optimizing MongoDB installs. Includes information about document design, indexes, fragmentation, migration, AWS EC2/EBS, and more.
Accelerating hbase with nvme and bucket cacheDavid Grier
This set of slides describes some initial experiments which we have designed for discovering improvements for performance in Hadoop technologies using NVMe technology
Accelerating HBase with NVMe and Bucket CacheNicolas Poggi
on-Volatile-Memory express (NVMe) standard promises and order of magnitude faster storage than regular SSDs, while at the same time being more economical than regular RAM on TB/$. This talk evaluates the use cases and benefits of NVMe drives for its use in Big Data clusters with HBase and Hadoop HDFS.
First, we benchmark the different drives using system level tools (FIO) to get maximum expected values for each different device type and set expectations. Second, we explore the different options and use cases of HBase storage and benchmark the different setups. And finally, we evaluate the speedups obtained by the NVMe technology for the different Big Data use cases from the YCSB benchmark.
In summary, while the NVMe drives show up to 8x speedup in best case scenarios, testing the cost-efficiency of new device technologies is not straightforward in Big Data, where we need to overcome system level caching to measure the maximum benefits.
This presentation provides a basic overview of Ceph, upon which SUSE Storage is based. It discusses the various factors and trade-offs that affect the performance and other functional and non-functional properties of a software-defined storage (SDS) environment.
Development to Production with Sharded MongoDB ClustersSeveralnines
Severalnines presentation at MongoDB Stockholm Conference.
Presentation covers:
- mongoDB sharding/clustering concepts
- recommended dev/test/prod setups
- how to verify your deployment
- how to avoid downtime
- what MongoDB metrics to watch
- when to scale
Scylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast EnoughScyllaDB
Some workloads require very low latency for high percentile of requests that even the fastest of disks may feel challenged to provide. This requirement will not be met if several IO reads will have to be issued to retrieve requested data from the storage array. The new Scylla In-Memory storage option was added to Scylla Enterprise to satisfy the read mostly workloads that fit into the memory and require consistent low latency. In this talk we will discuss the characteristics of the implementation and how can you take advantage of it.
Practical advices how to achieve persistence in Redis. Detailed overview of all cons and pros of RDB snapshots and AOF logging. Tips and tricks for proper persistence configuration with Redis pools and master/slave replication.
Ceph BlueStore - новый тип хранилища в Ceph / Максим Воронцов, (Redsys)Ontico
- Что такое SDS (общие места для (почти) всех решений — масштабирование, абстрагирование от аппаратных ресурсов, управление с помощью политик, кластерные ФС);
- Почему мы решили использовать SDS (нужно было объектное хранилище);
- Почему решили использовать именно Ceph, а не другие открытые (GlusterFS, Swift...) или проприетарные (IBM Elastic Storage, Huawei OceanStor) решения;
- Что еще умеет Ceph, кроме object storage (RBD, CephFS);
- Как работает Ceph (со стороны сервера);
- Что нового дает BlueStore по сравнению с классическим (поверх ФС);
- Сравнение производительности (метрики тестов);
- BlueStore — все еще tech preview;
- Заключение. Ссылки, литература.
Optimizing MongoDB: Lessons Learned at Localyticsandrew311
Tips, tricks, and gotchas learned at Localytics for optimizing MongoDB installs. Includes information about document design, indexes, fragmentation, migration, AWS EC2/EBS, and more.
Accelerating hbase with nvme and bucket cacheDavid Grier
This set of slides describes some initial experiments which we have designed for discovering improvements for performance in Hadoop technologies using NVMe technology
Accelerating HBase with NVMe and Bucket CacheNicolas Poggi
on-Volatile-Memory express (NVMe) standard promises and order of magnitude faster storage than regular SSDs, while at the same time being more economical than regular RAM on TB/$. This talk evaluates the use cases and benefits of NVMe drives for its use in Big Data clusters with HBase and Hadoop HDFS.
First, we benchmark the different drives using system level tools (FIO) to get maximum expected values for each different device type and set expectations. Second, we explore the different options and use cases of HBase storage and benchmark the different setups. And finally, we evaluate the speedups obtained by the NVMe technology for the different Big Data use cases from the YCSB benchmark.
In summary, while the NVMe drives show up to 8x speedup in best case scenarios, testing the cost-efficiency of new device technologies is not straightforward in Big Data, where we need to overcome system level caching to measure the maximum benefits.
Data deduplication is a hot topic in storage and saves significant disk space for many environments, with some trade offs. We’ll discuss what deduplication is and where the Open Source solutions are versus commercial offerings. Presentation will lean towards the practical – where attendees can use it in their real world projects (what works, what doesn’t, should you use in production, etcetera).
Storage Spaces Direct - the new Microsoft SDS star - Carsten RachfahlITCamp
Storage Spaces Direct will provide new unseen possibilities for Microsoft Hypervisor Hyper-V. These are on one hand a high performant, high available Scale-Out Fileserver with the possibility to use internal not shared disks like SATA HDDs and SSDs and even NVMe Devices. On the other hand, you can build a Hyper-converged Hyper-V Cluster where the VMs and their Storage are running on the same Servers. And let’s not forget Azure Stack! The first version of Microsoft Private/Hosted Cloud solution will only be supported on the hyper-converged S2D infrastructure. Join this session to learn about this great new technology that will have its role in the future Private and Hosted Cloud infrastructure implementations.
Best Practices & Performance Tuning - OpenStack Cloud Storage with Ceph - In this presentation, we discuss best practices and performance tuning for OpenStack cloud storage with Ceph to achieve high availability, durability, reliability and scalability at any point of time. Also discuss best practices for failure domain, recovery, rebalancing, backfilling, scrubbing, deep-scrubbing and operations
Windows Server 2012 R2 Software-Defined StorageAidan Finn
In this presentation I taught attendees how to build a Scale-Out File Server (SOFS) using Windows Server 2012 R2, JBODs, Storage Spaces, Failover Clustering, and SMB 3.0 Networking, suitable for storing application data such as Hyper-V and SQL Server.
Mike Pittaro - High Performance Hardware for Data Analysis PyData
Choosing hardware for big data analysis is difficult because of the many options and variables involved. The problem is more complicated when you need a full cluster for big data analytics.
This session will cover the basic guidelines and architectural choices involved in choosing analytics hardware for Spark and Hadoop. I will cover processor core and memory ratios, disk subsystems, and network architecture. This is a practical advice oriented session, and will focus on performance and cost tradeoffs for many different options.
Global Azure Virtual 2020 What's new on Azure IaaS for SQL VMsMarco Obinu
Come dimensionare una VM per SQL Server in Azure IaaS, alla luce delle ultime novità della piattaforma.Sessione erogata il 24 Aprile 2020, nell'ambito del Global Azure Virtual 2020.
Video sessione: https://youtu.be/7o80CJUtnh4
Demo: https://github.com/OmegaMadLab/SqlIaasVmPlayground
ARM Template ottimizzato per SQL Server: https://github.com/OmegaMadLab/OptimizedSqlVm-v2
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
Companies today are innovating with real-time data to deliver truly amazing customer experiences in the moment. Real-time data management for real-time customer experience is core to staying ahead of competition and driving revenue growth. Join Trays to learn how Comcast is differentiating itself from it's own historical reputation with Customer Experience strategies.
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
DataStax Enterprise (DSE) Graph is a built to manage, analyze, and search highly connected data. DSE Graph, built on NoSQL Apache Cassandra delivers continuous uptime along with predictable performance and scales for modern systems dealing with complex and constantly changing data.
Download DataStax Enterprise: Academy.DataStax.com/Download
Start free training for DataStax Enterprise Graph: Academy.DataStax.com/courses/ds332-datastax-enterprise-graph
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
DataStax Enterprise Advanced Replication supports one-way distributed data replication from remote database clusters that might experience periods of network or internet downtime. Benefiting use cases that require a 'hub and spoke' architecture.
Learn more at http://www.datastax.com/2016/07/stay-100-connected-with-dse-advanced-replication
Advanced Replication docs – https://docs.datastax.com/en/latest-dse/datastax_enterprise/advRep/advRepTOC.html
Data Modeling is the one of the first things to sink your teeth into when trying out a new database. That's why we are going to cover this foundational topic in enough detail for you to get dangerous. Data Modeling for relational databases is more than a touch different than the way it's approached with Cassandra. We will address the quintessential query-driven methodology through a couple of different use cases, including working with time series data for IoT. We will also demo a new tool to get you bootstrapped quickly with MovieLens sample data. This talk should give you the basics you need to get serious with Apache Cassandra.
Hear about how Coursera uses Cassandra as the core of its scalable online education platform. I'll discuss the strengths of Cassandra that we leverage, as well as some limitations that you might run into as well in practice.
In the second part of this talk, we'll dive into how best to effectively use the Datastax Java drivers. We'll dig into how the driver is architected, and use this understanding to develop best practices to follow. I'll also share a couple of interesting bug we've run into at Coursera.
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
This talk covers scaling Cassandra to a fast growing user base. Alex and Isaias will cover new best practices and how to work with the strengths and weaknesses of Cassandra at large scale. They will discuss how to adapt to bottlenecks while providing a rich feature set to the playstation community.
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
This talk covers scaling Cassandra to a fast growing user base. Alex and Isaias will cover new best practices and how to work with the strengths and weaknesses of Cassandra at large scale. They will discuss how to adapt to bottlenecks while providing a rich feature set to the playstation community.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
18.
Workload
CPU
RAM
STORAGE
C*/DSE
–
Read
Heavy
Capacity
Driven
4-‐24
cores
32-‐256GB
(128
really
but
cache
is
king)
Local
SSD
–
.5
-‐
2.5TB
C*/DSE
–
Write
Heavy
–
TransacQon
Driven
8-‐32
cores
32-‐128GB
Local
SSD
–
1-‐
3TB
/
spinning
disk
if
you
must
–
.5
–
1TB
DSE
+
SOLR
12-‐32
cores
128GB
Separate
local
SSD
for
Lucene
and
C*
-‐
1
–
3TB
DSE
+
SPARK
12-‐32
cores
128GB
Separate
local
SSD
for
Lucene
and
C*
-‐
1
–
3TB
*1GbE
is
sufficient
but
many
are
choosing
10GbE
for
future-‐proofing
28. Computers
sQll
have
moving
parts?
Isn’t
there
something
beger?
Maybe
a
sequenQal
write
workload
wouldn’t
be
soooo
bad…
Maybe
I
can
just
get
a
bunch
of
them?
Did
Moore’s
Law
let
us
down?
This
is
a
ridiculous
conversaQon.
Flash
is
cheap
now!
29. • No
Moving
Parts
(Beger
parallelism
thus
beger
BW/IOPS)
• No
Seek
Times
(Low
Latency)
• Lower
Power
• Less
Heat
Produced
(reduced
cooling
cost)
FLASH
STORAGE
SSDs
10K
–
1M
IOPS
400MB-‐3GB
BW
<200us
Latency
30. • DWPD
–
Overwrites
per
day
over
5
years
• PBW
–
Total
Petabytes
you
can
write
to
the
drive
before
wear
out
MLC
>=1
DWPD
31. Storage
Interfaces
Interface
Speed
SATA
III
.75
Gigabyte
SAS
II
.75
Gigabyte
SAS
III
1.5
Gigabytes
PCIe
Gen
2
x8
4
Gigabytes
32. HDD
STORAGE
–
Average
SATA
Latency
15K
RPM
SAS
7200
RPM
SATA
7200
RPM
SAS
37. Conclusions
• CPU
-‐
Get
the
most
Cache
for
your
Cash
• RAM
–
4
DIMMs/CPU
32GB-‐256GB
Total
• Storage
–
MLC
SSD
–
PCIe
is
Gold
Standard
Acknowledgements
• Al
Tobey,
Mag
Kennedy,
Jon
Erickson