Red Hat Ceph Storage: Past, Present and Future (Red_Hat_Storage)
Ceph is a massively scalable, open source, software-defined storage system that runs on commodity hardware. Get an update about the latest version of Red Hat Ceph Storage, including information about the newest features and use cases, with a particular focus on cloud storage and OpenStack. We’ll also explore the themes and directions for the roadmap for the next 12 months.
Red Hat's Ross Turk took the podium at the Public Sector Red Hat Storage Days on 1/20/16 and 1/21/16 to explain just why software-defined storage matters.
4. Agenda
First Ceph environment at Target went live in October 2014
• “Firefly” release
Ceph was backing Target’s first ‘official’ OpenStack release
• Icehouse based
• Ceph is used for:
• RBD for OpenStack instances and volumes
• RADOSGW for object storage (instead of Swift)
• RBD backing Ceilometer MongoDB volumes
Replaced the traditional array-based approach that was implemented in our
prototype Havana environment.
• The traditional storage model was problematic to integrate
• General desire at Target to move towards open solutions
• Ceph’s tight integration with OpenStack was a huge selling point
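For reference, wiring Cinder volumes to a Ceph pool takes only a small RBD backend stanza in cinder.conf; the pool and user names below are the common upstream defaults, not necessarily the values we used:

```ini
[DEFAULT]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_user = cinder
rbd_ceph_conf = /etc/ceph/ceph.conf
; UUID of the libvirt secret holding the cinder keyring
rbd_secret_uuid = <libvirt-secret-uuid>
```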
5. Agenda
Initial Ceph deployment:
• Dev + 2 Prod regions
• 3x Monitor nodes – Cisco B200
• 12x OSD nodes – Cisco C240 LFF
• 12x 4TB SATA disks each
• 10 OSDs per server
• Journal partition co-located on each OSD disk
• 120 OSDs total = ~400 TB
• 2x 10GbE per host
• 1 public_network
• 1 cluster_network
• Basic LSI ‘MegaRAID’ controller – SAS 2008M-8i
• No supercap or cache capability onboard
• 10x single-disk RAID0
Initial Rollout
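The public/cluster split above maps to two lines in ceph.conf; the subnets here are placeholders, not our production ranges:

```ini
[global]
; client-facing traffic
public_network = 10.0.0.0/24
; replication / recovery traffic
cluster_network = 10.0.1.0/24
```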
6. Post-rollout it became evident that there were performance issues within
the environment.
• KitchenCI users would complain of slow Chef converge times
• Yum transactions / app deployments would take abnormal amounts of time to
complete
• Instance boot times, especially for cloud-init images, were excessively long,
sometimes timing out
• General user griping about ‘slowness’
• Unacceptable levels of latency even while the cluster was relatively unworked
• High levels of CPU IOWait%
• Poor IOPS / latency in FIO benchmarks running INSIDE OpenStack instances
$ fio --rw=write --ioengine=libaio --runtime=100 --direct=1 --bs=4k --size=10G --iodepth=32 --name=/tmp/testfile.bin
test: (groupid=0, jobs=1): err= 0: pid=1914
read : io=1542.5MB, bw=452383 B/s, iops=110, runt=3575104msec
write: io=527036KB, bw=150956 B/s, iops=36, runt=3575104msec
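As a sanity check on those numbers, bandwidth and IOPS should agree at the 4k block size (bw ≈ iops × 4 KB); a quick sketch using the figures from the run above:

```shell
# fio figures from the benchmark above
read_iops=110
write_iops=36
bs_kb=4   # --bs=4k
# Expected bandwidth in KB/s; lines up with the reported 452383 B/s and 150956 B/s
read_bw_kb=$((read_iops * bs_kb))
write_bw_kb=$((write_iops * bs_kb))
echo "read ~${read_bw_kb} KB/s, write ~${write_bw_kb} KB/s"
```

Either way you slice it, that is floppy-disk-class throughput for a 120-OSD cluster.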
7. Compounding the performance issues, we began to see mysterious
reliability issues.
• OSDs would randomly fall offline
• The cluster would enter a HEALTH_ERR state about once a week with ‘unfound objects’
and/or inconsistent placement groups that required manual intervention to fix
• These problems were usually coupled with a large drop in our already suspect
performance levels
• The cluster would enter a recovery state, often bringing client performance to a standstill
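A sketch of how the affected placement groups can be pulled out of the health output for repair; the `ceph health detail` text below is fabricated sample data, not a live capture:

```shell
# Sample 'ceph health detail' output (fabricated for illustration)
sample='HEALTH_ERR 2 pgs inconsistent; 2 scrub errors
pg 3.1a is active+clean+inconsistent, acting [4,11,7]
pg 3.4f is active+clean+inconsistent, acting [2,9,5]'

# The second field of each "pg ... inconsistent" line is the PG id
bad_pgs=$(printf '%s\n' "$sample" | awk '$1 == "pg" && /inconsistent/ {print $2}')
echo "$bad_pgs"

# On a live cluster you would then repair each one:
# for pg in $bad_pgs; do ceph pg repair "$pg"; done
```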
8. Customer opinion of our OpenStack deployment, due to Ceph…
…which leads to perception of the team…
10. What could we have done differently??
• Hardware
• Monitoring
• Tuning
• User Feedback
11. Ceph is not magic. It does the best
with the hardware you give it!
There is much ill-advised advice floating around that if you just throw enough crappy disks at Ceph
you will achieve enterprise-grade performance. Garbage in, garbage out. Don’t be greedy
and build for capacity if your objective is to create a more performant block storage
solution.
Fewer better disks > more ‘cheap’ disks
…depending on your use case.
Hardware
12. • The root cause of the HEALTH_ERRs was an “unnamed vendor’s” SATA drives in our
solution ‘soft-failing’ – slowly accumulating media errors without reporting themselves
as failed. Don’t rely on SMART. Interrogate your disks with an array-level tool, like
MegaCli, to identify drives for proactive replacement.
$ /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | grep Media
• In installations with co-located journal partitions, a RAID solution with
cache+BBU for writeback operation would have been a huge performance gain.
Paying more attention to the suitability of the hardware our vendor of choice provided
would have saved a lot of headaches.
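A sketch of automating that check; the MegaCli output below is abridged, fabricated sample data (a real run uses /opt/MegaRAID/MegaCli/MegaCli64 as above):

```shell
# Abridged sample of 'MegaCli64 -PDList -aALL' output (fabricated numbers)
sample='Slot Number: 0
Media Error Count: 0
Slot Number: 1
Media Error Count: 173
Slot Number: 2
Media Error Count: 0'

# Flag any drive accumulating media errors before SMART ever calls it failed
flagged=$(printf '%s\n' "$sample" | awk '
  /^Slot Number/       { slot = $NF }
  /^Media Error Count/ { if ($NF > 0) print "slot " slot ": " $NF " media errors" }')
echo "$flagged"
```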
13. Monitor Your Implementation From
The Outset!
Ceph provides a wealth of data about the cluster state! Unfortunately, only the most
rudimentary data is exposed by the ‘regular’ documented commands.
Quick example – demo time!
Calamari? Maybe a decent start… but we quickly outgrew it.
Challenge your monitoring team. If your solution isn’t
working for you – go SaaS. ChatOps FTW. Develop
those ‘special sets of skills’!
In monitoring:
ease of collection > depth of feature set
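One example of the richer data hiding behind the basic commands: `ceph osd perf` reports per-OSD latency that `ceph -s` never shows (and `ceph daemon osd.N perf dump` goes deeper still). The output below is fabricated sample data, used to sketch flagging the slowest OSD:

```shell
# Sample 'ceph osd perf' output (fabricated latencies)
sample='osd fs_commit_latency(ms) fs_apply_latency(ms)
0 2 5
1 841 903
2 3 6'

# The highest commit latency usually points at the soft-failing disk
slowest=$(printf '%s\n' "$sample" | awk 'NR > 1 && $2 > max { max = $2; id = $1 } END { print id }')
echo "osd.$slowest has the worst commit latency"
```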
Monitors
require 'rubygems'
require 'dogapi'

# Count OSDs marked down and emit the metric to Datadog,
# tagged with an environment derived from the hostname.
api_key = "XXXXXXXXXXXXXXXXXXXXXXXX"
dosd = `ceph osd tree | grep down | wc -l`.to_i
host = `hostname`.strip
if host.include?("ttb")
  envname = "dev"
elsif host.include?("ttc")
  envname = "prod-ttc"
else
  envname = "prod-tte"
end
dog = Dogapi::Client.new(api_key)
dog.emit_point("ceph.osd_down", dosd, :tags => ["env:#{envname}", "app:ceph"])
16. Tune to your workload!
• This is unique to your specific workloads
But… in general…
Neuter the default recovery priority:
[osd]
osd_max_backfills = 1
osd_recovery_op_priority = 1
osd_client_op_priority = 63
osd_recovery_max_active = 1
osd_recovery_max_single_start = 1
Limit the impact of deep scrubbing:
osd_scrub_max_interval = 1209600
osd_scrub_min_interval = 604800
osd_scrub_sleep = .05
osd_snap_trim_sleep = .05
osd_scrub_chunk_max = 5
osd_scrub_chunk_min = 1
osd_deep_scrub_stride = 1048576
osd_deep_scrub_interval = 2592000
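These throttles can also be applied to a running cluster without restarting OSDs via injectargs. The sketch below only builds and prints the command rather than executing it; run it against a live cluster to take effect, and keep the values in ceph.conf so they survive restarts:

```shell
# Build (but do not execute) a runtime override for the recovery throttles above
opts='--osd_max_backfills 1 --osd_recovery_max_active 1'
cmd="ceph tell osd.* injectargs '$opts'"
echo "$cmd"
```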
Tuning
17. Get Closer to Your Users!
Don’t Wall Them Off With Process!
• ChatOps!
• Ditch tools like Lync / Sametime.
• ‘1 to 1’ enterprise chat apps are dead men walking.
• Consider Slack / HipChat
• Foster an enterprise community around your tech with available tools
• REST API integrations allow far more robust notifications of
issues in a ‘stream of consciousness’ fashion.
19. Agenda
Improved hardware
• OSD nodes – Cisco C240 M4 SFF
• 20x 1.1TB 10k Seagate SAS
• 6x 480GB Intel S3500 SSDs
• We have tried the ‘high-durability’ Toshiba SSDs;
they seem to work pretty well.
• Journal partitions on SSD with a 5:1 OSD/journal ratio
• 90 OSDs total = ~100 TB
• Improved LSI ‘MegaRAID’ controller – SAS-9271-8i
• Supercap
• Writeback capability
• 18x single-disk RAID0
• Writethrough on journals, writeback on spinning OSDs
• Based on the “Hammer” Ceph release
After understanding that slower, high-capacity disk wouldn’t meet our
needs for an OpenStack general-purpose block storage solution, we
rebuilt.
Current State
20. • Obtaining metrics from our design change was nearly immediate due to
having effective monitors in place
– Latency improvements have been extreme
– IOWait% within OpenStack instances has been greatly reduced
– Raw IOPS throughput has skyrocketed
– Throughput testing with RADOS bench and FIO shows an approx. 10-fold increase
– User feedback has been extremely positive; the general OpenStack experience at
Target is much improved. Feedback is enhanced by group chat tools.
– Performance within OpenStack instances has increased about 10x
Results
Before:
test: (groupid=0, jobs=1): err= 0: pid=1914
read : io=1542.5MB, bw=452383 B/s, iops=110, runt=3575104msec
write: io=527036KB, bw=150956 B/s, iops=36, runt=3575104msec
After:
test: (groupid=0, jobs=1): err= 0: pid=2131
read : io=2046.6MB, bw=11649KB/s, iops=2912, runt=179853msec
write: io=2049.1MB, bw=11671KB/s, iops=2917, runt=179853msec
22. • Forcing the physical world to bend to our will: getting datacenter techs to understand the
importance of rack placement in modern ‘scale-out’ IT
– To date, our server placement is ‘what’s the next open slot?’
– Create a ‘rack’ unit of cloud expansion
– More effectively utilize CRUSH for data placement and availability
• Normalizing our private cloud storage performance offerings
– ENABLE IOPS LIMITS IN NOVA! QEMU supports this natively. Avoid the
all-you-can-eat IO buffet.
nova-manage flavor set_key --name m1.small --key quota:disk_read_iops_sec --value 300
nova-manage flavor set_key --name m1.small --key quota:disk_write_iops_sec --value 300
– Leverage Cinder as the storage ‘menu’ beyond the default offering.
• Experiment with NVMe for journal disks – greater journal density.
• Currently testing all-SSD pool performance
– All-SSD in Ceph has been maturing rapidly – Jewel sounds very promising.
– We have a need for an ‘ultra’ Cinder tier for workloads that require high IOPS / low
latency, for use cases such as Apache Cassandra
– Considering SolidFire for this use case
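A sketch that generates those quota commands for several flavors at once; the flavor names and the 300 IOPS cap are illustrative, and the commands are printed rather than executed:

```shell
# Emit one nova-manage command per flavor per direction (read/write)
cmds=$(for flavor in m1.small m1.medium; do
  for dir in read write; do
    echo "nova-manage flavor set_key --name $flavor --key quota:disk_${dir}_iops_sec --value 300"
  done
done)
printf '%s\n' "$cmds"
```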
Next Steps
23. • Repurposing legacy SATA hardware into a dedicated object pool
– High-capacity, low-performance drives should work well in an object use case
– Jewel has per-tenant namespaces for RADOSGW (!)
• Automate deployment with Chef to bring parity with our OpenStack automation; ceph-deploy
still seems to be working for us.
– Use TDD to enforce base server configurations
• Broadening Ceph beyond the cloud niche use case, especially with the improved object offering.
24. • Before embarking on creating a Ceph environment, have a good idea of what
your objectives are for the environment.
– Capacity?
– Performance?
• Wrong decisions can lead to a negative user perception of Ceph,
and of the technologies that depend on it, like OpenStack
• Once you understand your objective, understand that your hardware selection is
crucial to your success
• Unless you are architecting for raw capacity, use SSDs for your journal volumes
without exception
– If you must co-locate journals, use a RAID adapter with BBU + writeback cache
• A hybrid approach may be feasible, with SATA ‘capacity’ disks and SSD or
NVMe journals. I’ve yet to try this; I’d be interested in seeing some benchmark
data on a setup like this.
• Research, experiment, break stuff, consult with Red Hat / Inktank
• Monitor, monitor, monitor, and provide a very short feedback loop for your users
to engage you with their concerns
Conclusion