Using ScyllaDB for Distribution of Game Assets in Unreal Engine


How Epic Games uses ScyllaDB to distribute the large game assets used by Unreal Engine around the world, enabling game developers to build great games more quickly.

Joakim Lindqvist, Senior Tools Programmer

Joakim Lindqvist
■ Senior Tools Developer within the Foundation team of Unreal Engine
■ 15 years of experience working in games
■ Based out of Stockholm, Sweden

Agenda
■ The complexity of building games
■ Overview of Unreal Cloud DDC
■ Future work

Games are Complicated to Make
■ Two major types of content: source code and game assets
■ Large assets
■ Large teams work on them
■ Specialized skills, spread out across the world

The Process of Cooking Data
■ Cooking is data transformation
■ Historically cached on network file systems within an office
■ Geo distribution
■ Work From Home
■ Headache to scale storage
■ Can take hours to start up the editor without caching
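
The caching idea on this slide can be sketched as follows. This is a minimal illustration, not Unreal Engine code: the class name, the `cook` placeholder, the version string, and the use of SHA-256 for keys are all assumptions made for the example. The point is only that a cooked result is looked up by a key derived from its inputs, so the expensive transformation runs once per distinct input.

```python
import hashlib
from typing import Dict


class DerivedDataCache:
    """Hypothetical in-memory stand-in for a shared cache of cooked data."""

    def __init__(self) -> None:
        self.entries: Dict[str, bytes] = {}
        self.cook_calls = 0  # counts how often the expensive path runs

    @staticmethod
    def cache_key(source: bytes, cook_version: str) -> str:
        # The key covers both the source bytes and the transform version,
        # so changing either one produces a new cache entry.
        h = hashlib.sha256()
        h.update(cook_version.encode("utf-8"))
        h.update(b"\0")
        h.update(source)
        return h.hexdigest()

    def cook(self, source: bytes) -> bytes:
        # Placeholder transformation; real cooking is far more expensive.
        self.cook_calls += 1
        return source.upper()

    def get_cooked(self, source: bytes, cook_version: str = "v1") -> bytes:
        key = self.cache_key(source, cook_version)
        if key not in self.entries:
            self.entries[key] = self.cook(source)
        return self.entries[key]


cache = DerivedDataCache()
cache.get_cooked(b"texture")  # cooks and stores the result
cache.get_cooked(b"texture")  # served from the cache, no second cook
```

Including the transform version in the key is one way to make stale entries unreachable after the cooking code changes, without an explicit invalidation step.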

Cached Objects
■ Custom self-describing binary format called Compact Binary
■ Structured object with attachments that are content addressed
■ Content addressing
■ Deduplication
■ Avoids replicating the same content
{
  "name": "largeFile",
  "size": 2480,
  "attachment": 9fffabc5e0a…1f084f8c5e
}
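
A rough sketch of what content addressing buys, under stated assumptions: the store below is an in-memory dictionary, the names `ContentStore`, `make_object`, and `missing_attachments` are hypothetical, and SHA-256 stands in for whatever hash the real system uses. It shows two objects sharing one attachment, and how a replica can work out which attachments it still lacks.

```python
import hashlib
from typing import Dict, Iterable, List


class ContentStore:
    """Blobs keyed by the hash of their bytes (hypothetical, in-memory)."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self.blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self.blobs[digest]


def make_object(store: ContentStore, name: str, payload: bytes) -> dict:
    """Build a small structured object that refers to its payload by hash."""
    return {"name": name, "size": len(payload), "attachment": store.put(payload)}


def missing_attachments(store: ContentStore, objects: Iterable[dict]) -> List[str]:
    """Hashes referenced by the objects that this store does not hold yet."""
    wanted = {obj["attachment"] for obj in objects}
    return sorted(h for h in wanted if h not in store.blobs)


origin = ContentStore()
a = make_object(origin, "largeFile", b"x" * 2480)
b = make_object(origin, "largeFileCopy", b"x" * 2480)
# Both objects point at the same attachment; only one blob is stored.

replica = ContentStore()
to_fetch = missing_attachments(replica, [a, b])  # a single hash, not two
```

Because the object carries only the hash, the small metadata record and the large payload can be stored and replicated separately.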

Unreal Cloud DDC
■ Kubernetes
■ 5 regions
■ 7 TB local NVMe cache
■ ~50 TB S3 storage
■ 10-500 users per region
■ Last access tracking

Architecture
[Diagram: Web API instances, each backed by ScyllaDB (metadata) and AWS S3 (blob storage), with a local NVMe cache next to the Web API.]
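
One plausible read path through these components is sketched below. The slides name the components but not the order of lookups, so the flow here is an assumption: look up metadata, record the access time, try the local cache, then fall back to blob storage. Plain dictionaries stand in for ScyllaDB, the NVMe cache, and S3, and the `Region` class and its field names are hypothetical.

```python
import time
from typing import Dict, Optional


class Region:
    """Hypothetical model of one region's read path."""

    def __init__(self, metadata: Dict[str, dict], blob_store: Dict[str, bytes]) -> None:
        self.metadata = metadata      # stand-in for ScyllaDB (metadata)
        self.blob_store = blob_store  # stand-in for AWS S3 (blob storage)
        self.local_cache: Dict[str, bytes] = {}  # stand-in for local NVMe
        self.blob_reads = 0           # how often blob storage was hit

    def get(self, key: str, now: Optional[float] = None) -> Optional[bytes]:
        record = self.metadata.get(key)
        if record is None:
            return None  # unknown key: the caller has to produce the data
        # Last access tracking: remember when the object was last requested.
        record["last_access"] = time.time() if now is None else now
        digest = record["attachment"]
        blob = self.local_cache.get(digest)
        if blob is None:
            # Local miss: read from blob storage and keep a local copy.
            blob = self.blob_store[digest]
            self.blob_reads += 1
            self.local_cache[digest] = blob
        return blob


metadata = {"texture": {"attachment": "abc", "last_access": 0.0}}
region = Region(metadata, {"abc": b"payload"})
region.get("texture")  # first read goes to blob storage
region.get("texture")  # second read is served from the local cache
```

Recorded access times could later drive cleanup of objects nobody has requested for a while; that policy is not described in the slides and is left out here.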

Why ScyllaDB
■ Performance sensitive workload
■ Geo replicated database
■ Eventual consistency works well with our workload
■ Small deployment (2-3 nodes of i4i.xlarge/i4i.2xlarge) in 5 DCs
■ ~3000 req/s per node
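
For readers unfamiliar with how geo replication is declared in ScyllaDB, the sketch below builds a `CREATE KEYSPACE` statement using `NetworkTopologyStrategy`, which takes a replication factor per datacenter. The keyspace name, datacenter names, and replication factors are invented for illustration; the talk does not state the actual schema or settings.

```python
from typing import Dict


def create_keyspace_cql(keyspace: str, replicas_per_dc: Dict[str, int]) -> str:
    """Build a CQL statement that replicates a keyspace across datacenters."""
    if not replicas_per_dc:
        raise ValueError("at least one datacenter is required")
    parts = ["'class': 'NetworkTopologyStrategy'"]
    for dc, rf in sorted(replicas_per_dc.items()):
        if rf < 1:
            raise ValueError(f"replication factor for {dc} must be >= 1")
        parts.append(f"'{dc}': {rf}")
    options = ", ".join(parts)
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        "WITH replication = {" + options + "};"
    )


# Hypothetical datacenter names and replication factors.
statement = create_keyspace_cql("ddc_metadata", {"eu-north": 2, "us-east": 2})
```

With replicas in every datacenter, reads and writes can be served by local nodes and reconciled across regions later, which fits the eventual-consistency point on this slide.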

Future Work
■ Small blob storage in ScyllaDB
■ ScyllaDB on Azure
■ Performance improvements on the API
■ Extending to more workloads

Unreal Engine
■ Source Code: https://github.com/EpicGames

Thank You
Stay in Touch
Joakim Lindqvist
joakim.lindqvist@epicgames.com
twitter.com/J0CC0
github.com/jocco
www.linkedin.com/in/jocco/


Dissecting Real-World Database Performance Dilemmas

ScyllaDB

Tackling your own database performance challenges is serious business. For a change of pace, let’s have some fun learning from other teams’ performance predicaments. Join us for an interactive session where we dissect four specific database performance challenges faced by teams considering or using ScyllaDB. For each dilemma, we'll: - Examine the context and technical requirements - Talk about potential solutions and cover the pros and cons of each - Disclose what approach the team took, and how it worked out About the speaker: Felipe is an IT specialist with years of experience on distributed systems and open-source technologies. He is one of the co-authors of "Database Performance at Scale", an Open Access, freely available publication for individuals interested on improving database performance. At ScyllaDB, he works as a Solution Architect.

Beyond Linear Scaling: A New Path for Performance with ScyllaDB

ScyllaDB

Linear scaling (sometimes near linear scaling) is often mentioned in several benchmarks, articles and product comparisons as proof that a given technology and algorithmic optimizations perform better than another. But is that really what performance is all about, and should you even care? This webinar discusses performance beyond linear scalability, including what typically matters more when running high throughput and low latency workloads at scale. We'll cover how ScyllaDB offers unparalleled performance and share our insights on: - The hidden aspects of linear scaling - When linear scaling matters most and when it’s simply irrelevant - Often overlooked considerations for optimizing and measuring distributed systems performance Watch now to learn from our experience (and lessons learned) in building the fastest NoSQL database in the world.

Dissecting Real-World Database Performance Dilemmas

ScyllaDB

Navigating Complex Database Performance Hurdles Tackling your own database performance challenges is serious business. For a change of pace, let’s have some fun learning from other teams’ performance predicaments. Join us for an interactive session where we dissect 4 specific database performance challenges faced by teams considering or using ScyllaDB. For each dilemma: - The presenters will describe the context and technical requirements - Together, we’ll talk about potential solutions and cover the pros and cons of each - Finally, we’ll disclose what approach the team took, and how it worked out Throughout the event, we’ll have opportunities to win ScyllaDB swag and prizes! Come prepared to engage in lively discussions and gain valuable insight into database performance strategies.

Database Performance at Scale Masterclass: Workload Characteristics by Felipe...

ScyllaDB

Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...

ScyllaDB

Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna

ScyllaDB

Replacing Your Cache with ScyllaDB

ScyllaDB

Technical risks of putting a cache in front of your database– and what to do instead Teams experiencing subpar latency commonly turn to an external cache to meet the required SLAs. Placing a cache in front of your database might seem like a fast and easy fix, but it often ends up introducing unanticipated complexity, costs, and risks. External caches can be one of the more problematic components of distributed application architecture. Join this webinar for a technical discussion of the risks associated with using an external cache and a look at how ScyllaDB’s cache implementation simplifies your architecture without compromising latency. We’ll cover: - Different approaches to caching (pre-caching vs. caching, side cache vs. transparent cache) - 7 specific reasons why external caching ia a bad choice - Why Linux’s default caching doesn’t work well for databases - The advantages & architecture of ScyllaDB's specialized row-based cache - Real-world examples of why and how teams eliminated their external cache with ScyllaDB

Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability

ScyllaDB

Discover how your team can achieve low latency at the extreme scale that your data-intensive applications require. We’ll walk you through an example of how ScyllaDB scales linearly to achieve 1M and then 2M OPS – with <1ms P99 latency. We’ll cover how this works on a sample realtime app (an ML feature store), share best practices for performance, and talk about the most important tradeoffs you’ll need to negotiate. Join us to learn: - Why and how to ensure your database takes full advantage of your cloud infrastructure - What architectural considerations matter most for high throughput and low latency - Key factors to consider when selecting a high-performance database

7 Reasons Not to Put an External Cache in Front of Your Database.pptx

ScyllaDB

Teams experiencing subpar latency commonly turn to an external cache to meet the required SLAs. Placing a cache in front of your database might seem like a fast and easy fix, but it often ends up introducing unanticipated complexity, costs, and risks. Caches can be one of the more problematic components of distributed application architecture. Join this webinar for a technical discussion of the risks associated with using an external cache and a look at an alternative strategy that simplifies your architecture without compromising latency. We’ll cover: - Different approaches to caching (pre-caching vs. caching, side cache vs. transparent cache) - 7 specific reasons why external caching can be a bad choice - Why Linux’s default caching doesn’t work well for databases - The advantages & architecture of specialized row-based caches - Real-world examples of why and how teams eliminated their external cache

Getting the most out of ScyllaDB

ScyllaDB

Expert tips on how to maximize your database potential If you’re considering or getting started with ScyllaDB, you’re probably intrigued by its potential to achieve high throughput and predictable low latency at a reasonable cost. So how do you ensure that you’re maximizing that potential for your team’s specific workloads and use case? This webinar offers practical advice for navigating the various decision points you’ll face as you assess whether ScyllaDB is a good fit for your team and later roll it out into production. We’ll cover the most critical considerations, tradeoffs, and recommendations related to: - Infrastructure selection - ScyllaDB configuration - Client-side setup - Data modeling

NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration

ScyllaDB

NoSQL Database Migration Masterclass - Session 3: Migration Logistics

ScyllaDB

NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges

ScyllaDB

Using ScyllaDB for Distribution of Game Assets in Unreal Engine

1. Using ScyllaDB for Distribution of Game Assets in Unreal Engine Joakim Lindqvist, Senior Tools Programmer

2. Joakim Lindqvist ■ Senior Tools Developer within the Foundation team of Unreal Engine ■ 15 years of experience working in games ■ Based out of Stockholm, Sweden

3. Agenda ■ The complexity of building games ■ Overview of Unreal Cloud DDC ■ Future work

4. The Complexity of Building Games

5. Games are Complicated to Make ■ 2 major types of content: source code and game assets ■ Large assets ■ Large teams work on them ■ Specialized skills, spread out across the world

7. The Process of Cooking Data ■ Cooking is data transformation ■ Historically cached on network file systems within an office ■ Geo distribution ■ Work From Home ■ Headache to scale storage ■ Can take hours to start up the editor without caching

8. Unreal Cloud DDC

9. Cached Objects ■ Custom self-describing binary format called Compact Binary ■ Structured object with attachments that are content-addressed ■ Content addressing ■ Deduplication ■ Avoids replicating the same content ■ Example object: { "name": "largeFile", "size": 2480, "attachment": 9fffabc5e0a…1f084f8c5e }
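The content-addressing idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Unreal Cloud DDC's implementation: the slides do not name the hash function, so SHA-256 is used as a stand-in, and `ContentAddressedStore` and `make_cache_object` are hypothetical names. A plain dict stands in for the Compact Binary object.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory blob store keyed by the hash of each blob's bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # SHA-256 is an assumption for this sketch; the slides do not name the hash.
        digest = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def __len__(self) -> int:
        return len(self._blobs)


def make_cache_object(name: str, payload: bytes, store: ContentAddressedStore) -> dict:
    """Build a small structured object that references its payload by hash."""
    return {"name": name, "size": len(payload), "attachment": store.put(payload)}


store = ContentAddressedStore()
payload = b"x" * 2480
first = make_cache_object("largeFile", payload, store)
second = make_cache_object("largeFileCopy", payload, store)
```

Both objects end up with the same `attachment` value and the store holds a single blob, which is the deduplication property the slide describes: only the small structured object differs, and the large payload never has to be stored or replicated twice.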

10. Unreal Cloud DDC ■ Kubernetes ■ 5 regions ■ 7 TB local NVMe cache ■ ~50 TB S3 storage ■ 10-500 users per region ■ Last access tracking

11. Architecture [diagram; components shown: ScyllaDB (metadata), Web API, local NVMe, AWS S3 (blob storage); the ScyllaDB, Web API and S3 components appear twice]
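One plausible read path through these components is sketched below. The slide only names the components, so the lookup order (metadata first, then the local cache, then blob storage) is an assumption, and `TieredBlobReader` is a hypothetical name. Plain dicts stand in for ScyllaDB, the local NVMe cache and S3.

```python
from typing import Dict, Optional


class TieredBlobReader:
    """Resolve a cache key to blob bytes via metadata, a local cache, then blob storage."""

    def __init__(self, metadata: Dict[str, str],
                 local_cache: Dict[str, bytes],
                 blob_store: Dict[str, bytes]):
        self.metadata = metadata        # stand-in for ScyllaDB: cache key -> blob hash
        self.local_cache = local_cache  # stand-in for the local NVMe cache
        self.blob_store = blob_store    # stand-in for S3

    def get(self, key: str) -> Optional[bytes]:
        blob_hash = self.metadata.get(key)
        if blob_hash is None:
            return None  # unknown key: the caller would have to produce the data itself
        data = self.local_cache.get(blob_hash)
        if data is None:
            data = self.blob_store.get(blob_hash)
            if data is not None:
                # Keep a local copy so the next read avoids the slower tier.
                self.local_cache[blob_hash] = data
        return data


metadata = {"mesh/rock_01": "h1"}
local_cache: Dict[str, bytes] = {}
blob_store = {"h1": b"cooked-bytes"}
reader = TieredBlobReader(metadata, local_cache, blob_store)
result = reader.get("mesh/rock_01")
```

In this sketch the database holds only small key-to-hash records, while the large payloads live in the cache and blob tiers, which matches the "Metadata" and "Blob Storage" labels on the slide.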

12. Why ScyllaDB ■ Performance-sensitive workload ■ Geo-replicated database ■ Eventual consistency works well with our workload ■ Small deployment (2-3 nodes of i4i.xlarge/i4i.2xlarge) in 5 DCs ■ ~3000 req/s per node
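For a rough sense of scale, the figures on this slide can be multiplied out. This is back-of-the-envelope arithmetic only: it assumes every node in every data center sustains the quoted ~3000 req/s at the same time, which the slides do not state, and `aggregate_request_rate` is a hypothetical helper.

```python
def aggregate_request_rate(nodes_per_dc: int, datacenters: int, reqs_per_node: int) -> int:
    """Total requests per second if every node runs at the per-node rate."""
    return nodes_per_dc * datacenters * reqs_per_node


# Slide figures: 2-3 nodes per DC, 5 DCs, ~3000 req/s per node.
low = aggregate_request_rate(2, 5, 3000)
high = aggregate_request_rate(3, 5, 3000)
print(f"{low}-{high} req/s")  # prints "30000-45000 req/s"
```

Under that assumption the whole deployment would handle roughly 30,000 to 45,000 requests per second across 10 to 15 nodes.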

13. Future Work

14. Future Work ■ Small blob storage in ScyllaDB ■ ScyllaDB on Azure ■ Performance improvements on the API ■ Extending to more workloads

15. Unreal Engine ■ Source Code: https://github.com/EpicGames

16. Thank You ■ Stay in Touch: Joakim Lindqvist, joakim.lindqvist@epicgames.com, twitter.com/J0CC0, github.com/jocco, www.linkedin.com/in/jocco/
