Ryosuke Iwanaga's presentation, "Mobage DBA Fight against Big Data", describes how the DBA team at DeNA scaled MySQL for Mobage. Traffic grew to 600 million page views per day in 2006-2008 and reached 2 billion page views per day after social games launched in 2009-2010. He characterizes "Big Data" at Mobage as more than 500 database servers, more than 100TB of InnoDB data (including replicas), and more than 1 million queries per second across all MySQL servers at peak time. The talk covers scaling out with replication and sharding, consolidating servers with multiple mysqld instances, backups and high availability with MHA, purging old data with RANGE partitioning, and tuning SELECT and UPDATE workloads.
Mobage DBA Fight against Big Data - NHN TE
3. We have been fighting
against “Big Data”!
2006 - 2008
Mobage grew up to 600M PV/day
2009 - 2010
Started SocialGame ->2000M PV/day
2011 -
Globalization
11. What is “Big Data”?
Many Servers?
+500 Database Servers
Large Data Size?
+100TB InnoDB(include replicas)
High Traffic?
+1Mqps to all MySQL at peak time
13. Scale out - Replication
Master - Slaves Architecture
critical SELECT -> master
used by many HTTP states
used by UPDATE(lock)
non critical select -> slaves
used by only showing information
easy to scale out
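The routing rule on this slide can be sketched in a few lines of Python. This is only an illustration: the `ReplicatedPool` class, the `critical` flag, and the host names are assumptions made for the example, not part of the Mobage code base.

```python
import itertools


class ReplicatedPool:
    """Chooses a host for each statement (hypothetical helper)."""

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves; fall back to the master if there are none.
        self._slaves = itertools.cycle(slaves or [master])

    def pick(self, sql, critical=False):
        """Return the host that should run this statement."""
        is_select = sql.lstrip().upper().startswith("SELECT")
        if not is_select or critical:
            return self.master       # writes and critical SELECTs
        return next(self._slaves)    # non-critical SELECTs


# Hypothetical host names.
pool = ReplicatedPool("db-master", ["db-slave1", "db-slave2"])
```

Adding a slave only means extending the list passed to the pool, which is why the non-critical read path is the easy one to scale out.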
14. [Diagram: App servers send INSERT/DELETE/UPDATE and critical SELECT to the DB Master; non-critical SELECT goes to the DB Slaves, which are fed by replication and are scalable.]
15. Scale out - Sharding
At first -> 1 database - all tables
Sharding 1
divide per tables
Sharding 2
divide per records
mapping table / hashing
Make new DB series using replication
before sharding
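The two record-level lookups named above (hashing and a mapping table) might look like the following Python sketch. The function and class names, the use of CRC32, and the range-based mapping are assumptions chosen for illustration; the slides do not say which hash or mapping layout Mobage uses.

```python
import zlib


def shard_by_hash(user_id, shard_count):
    """Pick a shard index from a stable hash of the key."""
    # crc32 gives the same value in every process, unlike Python's
    # built-in hash() for strings.
    return zlib.crc32(str(user_id).encode("utf-8")) % shard_count


class ShardMap:
    """Mapping-table variant: explicit key ranges mapped to shard names."""

    def __init__(self, ranges):
        # ranges: (exclusive_upper_bound, shard_name) pairs.
        self.ranges = sorted(ranges)

    def lookup(self, user_id):
        for upper, shard in self.ranges:
            if user_id < upper:
                return shard
        raise KeyError(f"no shard covers id {user_id}")


# Hypothetical layout: ids below 1,000,000 on db1, the next million on db2.
shard_map = ShardMap([(1_000_000, "db1"), (2_000_000, "db2")])
```

Hashing needs no extra lookup but makes resharding harder; a mapping table costs one lookup but lets records be moved individually.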
20. Scale out - auto increment
AUTO INCREMENT usually can’t be used
same schema - different database
MyISAM sequence table
UPDATE seq SET id=LAST_INSERT_ID(id+1)
SELECT LAST_INSERT_ID()
does not affect other InnoDB transactions
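The behaviour of the sequence table can be modelled with a small in-memory stand-in. This Python sketch is not MySQL: `SequenceTable` is a hypothetical class, and its lock only plays the role that MyISAM's table lock plays for the one-row `seq` table.

```python
import threading


class SequenceTable:
    """In-memory stand-in for the one-row MyISAM `seq` table."""

    def __init__(self, start=0):
        self._id = start
        self._lock = threading.Lock()  # stands in for the MyISAM table lock

    def next_id(self):
        # Corresponds in spirit to:
        #   UPDATE seq SET id = LAST_INSERT_ID(id + 1);
        #   SELECT LAST_INSERT_ID();
        with self._lock:
            self._id += 1
            return self._id


seq = SequenceTable()
ids = [seq.next_id() for _ in range(3)]
```

Because MyISAM is not transactional, the real UPDATE takes effect immediately and is not tied to the caller's InnoDB transaction, so every shard can draw unique ids from one place.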
21. Scale back - Multi instances
Reduce scaled-out servers
server specs go up / services get smaller
MySQL replication doesn't support multiple masters
Run multiple mysqld instances on 1 server
bind different virtual IP addresses
nothing changes on app servers
22. [Diagram: DB(1) Master and DB(2) Master -> multiple-master replication -> new DB Master]
23. [Diagram: DB(1) Master (192.168.10.1) and DB(2) Master (192.168.10.2) move onto one new DB server that runs two mysqld instances, DB(1) and DB(2), each as a Master with its own my.cnf: bind-address=xxx and bind-address=yyy]
24. Data Availability - backup
using non-service slave as “backup”
the same spec as master
daily logical backup
easy to add new slave
use daily backup and binary log
new schema slave
ALTER before importing data
25. [Diagram: DB Master replicating to DB Slaves and a Backup DB. Step 1: mysqldump with binary log position on the Backup DB at AM 3:00. Steps 2 and 3: add a slave at any time.]
26. Data Availability - MHA
MHA - MySQL Master High Availability
Change master when master failed
prevent split brain / do IP failover
Online schema change
change slaves/backup schema offline
backup(new schema) -> new master
master -> new backup and change
schema
28. Purge
Many records make DB performance bad
Especially range scanning
Cause storage capacity problem
Purging unnecessary records is important
ex.) old messages, logs
Usually using timestamp column
29. Purge - Before MySQL 5.1
purge records using DELETE
Range DELETE is too heavy
ex.) DELETE ... WHERE t <= ...
Using Primary Key
pre-selected from backup
needs index of time column
adjust speed watching slave repli delay
and stop at peak time
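The throttled purge described above can be sketched as a loop that deletes pre-selected primary keys in small chunks and waits whenever the slaves fall behind. All names and thresholds here are assumptions for illustration; `delete_chunk` and `get_delay_seconds` are hypothetical callbacks that a real job would back with a DELETE by primary key and a replication delay check. Stopping at peak time is not modelled.

```python
import time


def purge_in_chunks(ids, delete_chunk, get_delay_seconds,
                    chunk_size=1000, max_delay=5, pause=1.0,
                    sleep=time.sleep):
    """Delete pre-selected primary keys in batches, backing off on slave lag."""
    deleted = 0
    for start in range(0, len(ids), chunk_size):
        # Adjust speed: wait while replication delay is above the limit.
        while get_delay_seconds() > max_delay:
            sleep(pause)
        chunk = ids[start:start + chunk_size]
        delete_chunk(chunk)  # e.g. DELETE FROM t WHERE id IN (...)
        deleted += len(chunk)
    return deleted
```

Selecting the ids from a backup server first keeps the expensive range scan on the time column off the master; the master only sees short primary-key deletes.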
30. Purge - After MySQL 5.1
Supported RANGE partitioning
DROP PARTITION ≒ DROP TABLE
Partition pruning
reduce useless scanning
Need to add timestamp column to all
UNIQUE INDEX
31. [Figure: rows (id, date) stored in two partitions]
id date
101 1337191650
102 1337191650
103 1337192655
104 1337192660
105 1337192662
...
id date
213 1337192650
214 1337192650
215 1337192655
216 1337192660
217 1337192662
...
CREATE TABLE t (
  id int not NULL,
  date int not NULL,
  ...
  PRIMARY KEY (id, date)
) ENGINE=InnoDB
PARTITION BY RANGE(date) (
  PARTITION p20120517 VALUES LESS THAN ...,
  PARTITION p20120518 VALUES LESS THAN ...,
  ...
  PARTITION over VALUES LESS THAN MAXVALUE
)
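With daily partitions like these, the purge job reduces to generating two kinds of ALTER TABLE statements: one that splits a new day out of the catch-all partition, and one that drops an expired day. The Python sketch below only builds the statements and does not run them. The helper names are hypothetical, and it assumes partition boundaries fall on UTC midnight, which the slides do not state.

```python
from datetime import date, datetime, timedelta, timezone


def partition_name(day):
    """p20120517-style name, as on the slide."""
    return "p" + day.strftime("%Y%m%d")


def upper_bound(day):
    """Unix timestamp of the start of the following day (UTC assumed)."""
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return int((midnight + timedelta(days=1)).timestamp())


def add_partition_sql(table, day, catch_all="over"):
    # Split the MAXVALUE partition so the new day gets its own partition.
    # Note: OVER is a reserved word from MySQL 8.0 on and would need backticks.
    return (
        f"ALTER TABLE {table} REORGANIZE PARTITION {catch_all} INTO ("
        f"PARTITION {partition_name(day)} VALUES LESS THAN ({upper_bound(day)}), "
        f"PARTITION {catch_all} VALUES LESS THAN MAXVALUE)"
    )


def drop_partition_sql(table, day):
    # Dropping a partition removes all of its rows at once.
    return f"ALTER TABLE {table} DROP PARTITION {partition_name(day)}"
```

For example, `drop_partition_sql("t", date(2012, 5, 17))` yields `ALTER TABLE t DROP PARTITION p20120517`.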
33. SELECT range scan
Range scan using Index
Index is important for Big Data
InnoDB index = B+ Tree
Primary Key = Data
search PK = access data
WHERE a = 1 AND b = 2 ORDER BY c
KEY i1 (a, b, c)
Covering Index
34. SELECT * FROM t
WHERE Col1 = 1
AND Col2 = 'b'
AND Col3 >= 6
Data leaf block
PK Col1 Col2 ...
100
101
102
103
104
...
Index leaf block
Col1 Col2 Col3 PK
1 a 3 10
1 b 2 400
1 b 6 103
2 a 1 201
2 c 5 9
...
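The index lookup in this diagram can be imitated with a sorted list and a binary search. This is a toy model of a composite secondary index, not InnoDB's B+ tree; the entries are the ones in the index leaf block above, and `range_scan` is a hypothetical helper.

```python
import bisect

# Entries of an index on (Col1, Col2, Col3); each entry ends with the PK.
index = sorted([
    (1, "a", 3, 10),
    (1, "b", 2, 400),
    (1, "b", 6, 103),
    (2, "a", 1, 201),
    (2, "c", 5, 9),
])


def range_scan(index, col1, col2, col3_min):
    """PKs where Col1 = col1 AND Col2 = col2 AND Col3 >= col3_min."""
    # Jump straight to the first entry at or after the search key.
    start = bisect.bisect_left(index, (col1, col2, col3_min))
    pks = []
    for c1, c2, _c3, pk in index[start:]:
        if (c1, c2) != (col1, col2):
            break  # left the matching prefix; stop scanning
        pks.append(pk)
    return pks


matches = range_scan(index, 1, "b", 6)
```

The scan touches only the matching index entries and returns primary keys; with `SELECT *` each key then needs a lookup in the data leaf block, which a covering index avoids.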
35. SELECT Primary Key
many queries use WHERE PK = ...
like Key-Value Store
Handler Socket plugin (made by Higuchi)
skip SQL parse phase
use Handler interface directly
higher performance than memcached
without considering cache consistency!
MySQL 5.6 added memcached API
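36. SQL query:
SELECT PK,Col1
FROM t
WHERE PK = 101
HandlerSocket request:
P 0 db t PRIMARY PK,Col1
0 = 1 101
[Diagram: SQL arrives at the Listener for libmysql and passes through the SQL Layer; HandlerSocket Plugin requests go to the Handler Interface directly. Both read the same rows.]
PK Col1 ...
100 a
101 b
102 c
103 d
...
http://engineer.dena.jp/2010/08/handlersocket-plugin-for-mysql.html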
37. Too many UPDATE
Shorten locking time
Avoid too slow procedures in transactions
Connect before locking
to avoid SYN resending
conn A, conn B and lock A, lock B
kick out from transactions
Connecting to other servers, like using
HTTP API, memcached
38. UPDATE distributed masters
Distributed Master (Sharding)
MySQL can't detect deadlocks across masters
-> wait innodb_lock_wait_timeout
SocialGame needs many locks of records
sort lock order as much as possible
Optimistic lock (use version column)
raise error when updating the record
if it has already been updated by others
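Sorting the lock order can be as simple as agreeing on one global ordering of the rows a transaction will touch. In this Python sketch the `(shard, primary_key)` tuple layout and the shard names are assumptions made for illustration.

```python
def lock_order(rows):
    """Return (shard, primary_key) pairs in one global locking order."""
    # Duplicates are dropped so that no row is locked twice.
    return sorted(set(rows))


# Two requests that need the same rows but list them in a different order.
request_a = [("db2", 7), ("db1", 42)]
request_b = [("db1", 42), ("db2", 7)]
order_a = lock_order(request_a)
order_b = lock_order(request_b)
```

Both requests end up locking `("db1", 42)` before `("db2", 7)`, so neither can hold the second row while waiting for the first. That is what prevents a cross-master deadlock, which would otherwise only end when the lock wait timeout fires.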
41. Optimistic Lock
(1) App reads the row without locking
SELECT * FROM t
WHERE PK = 101 (no lock)
=> ver = 36786
PK ver Col1 ...
100 25643
101 36786
102 14624
...
(2) App updates only if the version is unchanged
UPDATE t SET Col1 = 100, ver = ver+1
WHERE PK = 101 AND ver = 36786
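The version check can be tried end to end with Python's built-in sqlite3 module, used here only as a stand-in for MySQL. The table contents mirror the slide; `update_if_unchanged` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (pk INTEGER PRIMARY KEY, ver INTEGER NOT NULL, col1 INTEGER)"
)
conn.execute("INSERT INTO t VALUES (101, 36786, 0)")
conn.commit()


def update_if_unchanged(conn, pk, seen_ver, new_col1):
    """Apply the update only if the row still has the version we read."""
    cur = conn.execute(
        "UPDATE t SET col1 = ?, ver = ver + 1 WHERE pk = ? AND ver = ?",
        (new_col1, pk, seen_ver),
    )
    conn.commit()
    # Zero affected rows means someone else updated the row first.
    return cur.rowcount == 1


first = update_if_unchanged(conn, 101, 36786, 100)   # version still matches
second = update_if_unchanged(conn, 101, 36786, 200)  # stale version, rejected
```

The application treats a `False` result as an error and retries from the SELECT, so no row lock is held between the read and the write.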
42. Replication Delay - monitor
MySQL replication is single-threaded
High CUD qps is the bottleneck of slaves
difficult to benchmark beforehand
-> monitor delay of backup server
backup is low spec compared with
slave
able to detect repli delay problems
before they happen on slaves
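A delay check along these lines could compare the `Seconds_Behind_Master` value from `SHOW SLAVE STATUS` on each host against a threshold. In this Python sketch the function name, the threshold, and the input layout (one status row per host, already fetched as a dict) are assumptions for illustration.

```python
def check_delay(status_by_host, warn_seconds=30):
    """Return (host, lag) pairs over the threshold, worst first."""
    lagging = []
    for host, row in status_by_host.items():
        lag = row.get("Seconds_Behind_Master")
        # NULL (None) means replication is not running: treat it as worst.
        if lag is None or lag > warn_seconds:
            lagging.append((host, lag))
    # Stopped replication first, then by descending lag.
    return sorted(
        lagging,
        key=lambda item: (item[1] is not None, -(item[1] or 0)),
    )


# Hypothetical sample: the lower-spec backup falls behind before the slave.
sample = {
    "backup": {"Seconds_Behind_Master": 120},
    "slave1": {"Seconds_Behind_Master": 2},
}
alerts = check_delay(sample)
```

Since the backup server has a lower spec than the service slaves, an alert on it gives early warning before the same delay shows up on the slaves.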
43. Replication Delay - SSD
SSD is quite effective for slaves
high IOPS means high throughput of
replication thread
Multiple instances have multiple repli threads
SATA-SSD is a good choice for most cases
cheap enough / enough storage capacity
PCIe-SSD is too expensive now
SAS-SSD can't use full storage capacity
45. DeNA needs
Big Data lovers
We have been fighting “Big Data”
MySQL/Hadoop/App/Network/etc...
There are still many problems ;(
We need more power
We are looking forward to your joining
Please contact riywo / DeNA staff :)