Migrating from PostgreSQL to MySQL at Cocolog
Naoto Yokoyama, NIFTY Corporation
Garth Webb, Six Apart
Lisa Phillips, Six Apart
Credits:
Kenji Hirohama, Sumisho Computer Systems Corp.
Agenda
1. What is Cocolog
2. History of Cocolog
3. DBP: Database Partitioning
4. Migration From PostgreSQL to MySQL
1. What is Cocolog
What is Cocolog
• NIFTY Corporation
  • Established in 1986
  • A Fujitsu Group company
  • NIFTY-Serve (licensed and interconnected with CompuServe)
  • One of the largest ISPs in Japan
• Cocolog
  • First blog community at a Japanese ISP
  • Based on TypePad technology by Six Apart
  • Several hundred million page views (PV) per month
• History
  • Dec/02/2003: Cocolog launched for ISP users
  • Nov/24/2005: Cocolog Free launched (free of charge)
  • April/05/2007: Cocolog for Mobile Phone launched
2008/04: 700 thousand users
[Screenshot: Cocolog home page]
[Screenshot: Cocolog home page, labels: TypePad, Cocolog]
[Figure: Cocolog template sets]
[Chart: Cocolog Growth (User), series Cocolog and Cocolog Free, phase1 to phase4]
[Chart: Cocolog Growth (Entry), series Cocolog and Cocolog Free, phase1 to phase4]
Technology at Cocolog
• Core System
  • Linux 2.4/2.6
  • Apache 1.3/2.0/2.2 & mod_perl
  • Perl 5.8 + CPAN
  • PostgreSQL 8.1
  • MySQL 5.0
  • memcached / TheSchwartz / cfengine
• Eco System
  • LAMP, LAPP, Ruby + ActiveRecord, Capistrano
• Etc.
Monitoring
• Management Tool
  • Proprietary in-house development with PostgreSQL, PHP, and Perl
• Monitoring points (in order of priority)
  • response time of each post
  • number of spam comments/trackbacks
  • number of comments/trackbacks
  • source IP address of spam
  • number of entries
  • number of comments via mobile devices
  • page views via mobile devices
  • time of batch completion
  • amount of API usage
  • bandwidth usage
• DB
  • Disk I/O
  • Memory and CPU usage
  • time of VACUUM ANALYZE
• APP
  • number of active processes
  • CPU usage
  • Memory usage
[Diagram labels: Hard, DB, Service, APL]
2. History of Cocolog
Phase1 2003/12~ (Entry: 0.04 Million)
Before DBP, 10 servers
[Diagram components: Register, TypePad, PostgreSQL, NAS, WEB, Static contents (Published)]
Phase2 2004/12~ (Entry: 7 Million)
Before DBP, 50 servers
[Diagram components: Register, TypePad, Rich template, Publish Book, Tel Operator Support, Podcast, Portal, Profile, Etc., PostgreSQL, NAS, WEB, Static contents (Published)]
[Diagram date labels: 2004/12~, 2005/5~]
Phase2 - Problems
• The system is tightly coupled.
• The database server receives requests from multiple points.
• It is difficult to change the system design and database schema.
Phase3 2006/3~ (Entry: 12 Million)
Before DBP, 200 servers
[Diagram components: Register, TypePad, Rich template, Publish Book, Tel Operator Support, Podcast, Portal, Profile, Etc., Web-API, memcached, PostgreSQL, NAS, WEB, Static contents (Published)]
Phase4 2007/4~ (Entry: 16 Million)
Before DBP, 300 servers
[Diagram components: Register, TypePad, Rich template, Publish Book, Tel Operator Support, Web-API, memcached, Atom, Mobile, WEB, PostgreSQL, Static contents (Published)]
Now 2008/4~
After DBP, 150 servers
[Diagram components: Register, TypePad, Rich template, Publish Book, Tel Operator Support, Web-API, memcached, Atom, Mobile, WEB, Multi MySQL, Static contents (Published)]
3. TypePad Database Partitioning
Steps for Transitioning
• Server Preparation
  Hardware and software setup
• Global Write
  Write user information to the global DB
• Global Read
  Read/write user information on the global DB
• Move Sequence
  Table sequences served by the global DB
• User Data Move
  Move user data to user partitions
• New User Partition
  All new users saved directly to user partition 1
• New User Strategy
  Decide on a strategy for the new user partition
• Non User Data Move
  Move all non-user owned data
TypePad Overview (PreDBP)
[Diagram components]
• Clients (via Internet): Blog Readers, Blog Owners, Mobile Blog Readers
• Web Server
• Application Server
• TypeCast Server / ATOM Server: dedicated server for TypeCast (via ATOM)
• MEMCACHED: data caching servers to reduce DB load
• Database (Postgres)
• Storage: static content (HTML, images, etc.)
• Mail Server
• ADMIN (CRON) Server: cron server for periodic asynchronous tasks
[Diagram connection labels: https(443), http(80), http(80): atom api, memcached(11211), postgres(5432), nfs(2049), smtp(25) / pop(110)]
Why Partition?
• Current setup: all inquiries (access) go to one DB (Postgres), which holds both the User Role (User0) and the Non-User Role data.
• After DBP: inquiries (access) are divided among several DBs (MySQL): Global Role, Non-User Role, and User Role (User1, User2, User3).
Server Preparation
• Current setup: TypePad on DB(PostgreSQL)
  • User Role (User0)
  • Non-User Role
• New expanded setup: DB(MySQL) for partitioned data
  • Global Role: maintains user mapping and primary key generation
  • User Role (User1, User2, User3): user information is partitioned
  • Non-User Role: information that does not need to be partitioned (such as session information)
• Asynchronous Job Server
  • Job Server + TypePad + Schwartz: server for executing jobs
  • Schwartz DB: stores job details
※ Grey areas are not used in the current step
Global Write
Creating the user map
Explanation
①: For new registrations only, uniquely identifying user data is written to the global DB.
②: This same data continues to be written to the existing DB.
[Diagram: same components as on the Server Preparation slide, with ① and ② marked. The Global Role on DB(MySQL) maintains user mapping and primary key generation.]
※ Grey areas are not used in the current step
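The dual write in the Global Write step could look roughly like the sketch below. It is a minimal illustration, not TypePad's code: the two databases are modelled as plain dictionaries, and `register_user`, the `username` field, and the `"user0"` partition label are hypothetical names chosen for the example.

```python
def register_user(user_id, record, legacy_db, global_db):
    """Write a new registration to both databases (dual write).

    legacy_db and global_db are plain dicts standing in for the existing
    PostgreSQL database and the new global MySQL database (assumption).
    """
    # Step 1: only the uniquely identifying data goes to the global DB.
    # The "partition" entry is a hypothetical user-map column.
    global_db[user_id] = {"username": record["username"], "partition": "user0"}
    # Step 2: the full record is still written to the existing DB.
    legacy_db[user_id] = dict(record)
```

Because the existing DB still receives every write, nothing reads from the global DB yet; the step only starts building the user map.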
Global Read
Use the user map to find the user partition
Explanation
①: Migrate existing user data to the global DB.
②: At the start of the request, the application queries the global DB for the location of the user data.
③: The application then talks to this DB for all queries about this user. At this stage the global DB points to the user0 partition in all cases.
[Diagram: same components as on the Server Preparation slide, with ① (migrate existing user data), ② and ③ marked. The Global Role on DB(MySQL) maintains user mapping and primary key generation.]
※ Grey areas are not used in the current step
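As a rough sketch of the lookup in the Global Read step, the request first resolves the partition name from the user map and then reads from that partition only. The dictionaries and the function names `partition_for` and `load_user_data` are assumptions made for illustration; the slides do not show the real schema or API.

```python
def partition_for(user_id, user_map):
    """Look up which partition holds this user's data (the global DB query)."""
    return user_map[user_id]


def load_user_data(user_id, user_map, partitions):
    """Resolve the partition once, then read the user's data from it."""
    name = partition_for(user_id, user_map)
    return partitions[name][user_id]


# Hypothetical state at this stage: every user still maps to "user0".
USER_MAP = {101: "user0", 102: "user0"}
PARTITIONS = {"user0": {101: {"blog": "alpha"}, 102: {"blog": "beta"}}}
```

The indirection costs one extra lookup per request, but later steps can move a user by changing a single user-map entry.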
Move Sequence
Migrating primary key generation
Explanation
①: Postgres sequences (for generating unique primary keys) are migrated to tables on the global DB that act as "pseudo-sequences".
②: The application requests new primary keys from the global DB rather than the user partition.
[Diagram: same components as on the Server Preparation slide, with ① (migrate sequence management) and ② marked. The Global Role on DB(MySQL) maintains user mapping and primary key generation.]
※ Grey areas are not used in the current step
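A pseudo-sequence table can be sketched as follows. The example uses Python's built-in `sqlite3` module as a stand-in for the global MySQL DB, and the table name `pseudo_sequence`, its columns, and the sequence name `entry_id` are hypothetical; the slides do not show the actual table layout.

```python
import sqlite3


def create_sequence(conn, name, start=0):
    """Create the pseudo-sequence table if needed and seed one counter.

    `start` would be the last value handed out by the old Postgres sequence.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pseudo_sequence ("
        "name TEXT PRIMARY KEY, last_value INTEGER NOT NULL)"
    )
    conn.execute(
        "INSERT INTO pseudo_sequence (name, last_value) VALUES (?, ?)",
        (name, start),
    )
    conn.commit()


def next_id(conn, name):
    """Increment the counter and return the new value in one transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE pseudo_sequence SET last_value = last_value + 1 "
            "WHERE name = ?",
            (name,),
        )
        if cur.rowcount != 1:
            raise KeyError(name)
        row = conn.execute(
            "SELECT last_value FROM pseudo_sequence WHERE name = ?", (name,)
        ).fetchone()
    return row[0]
```

Serving keys from one place keeps primary keys unique across all user partitions, at the price of one extra round trip to the global DB per new row.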
User Data Move
Moving user data to the new user-role partitions
Explanation
①: Existing users that should be migrated by the Job Server are submitted as new Schwartz jobs. User data is then migrated asynchronously.
②: If a comment arrives while the user is being migrated, it is saved in the Schwartz DB to be published later.
③: After being migrated, all user data will exist on the user-role DB partitions.
④: Once all user data is migrated, only non-user data is on Postgres.
[Diagram: same components as on the Server Preparation slide, with ① to ④ marked (③: migrating each user's data). Job Server: server for executing jobs. Schwartz DB: stores job details. Global Role: maintains user mapping and primary key generation. User Role partitions on DB(MySQL): user information is partitioned.]
※ Grey areas are not used in the current step
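The handling of comments that arrive during a user's migration can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not TheSchwartz's API: the class `UserMigration`, its method names, and the dictionaries standing in for the legacy DB, the target partition, the user map, and the Schwartz DB are all hypothetical.

```python
class UserMigration:
    """Toy model of moving users from the legacy DB to a new partition."""

    def __init__(self, legacy, target, user_map, target_name="user1"):
        self.legacy = legacy            # user_id -> list of comments (old DB)
        self.target = target            # user_id -> list of comments (new DB)
        self.user_map = user_map        # user_id -> partition name (global DB)
        self.target_name = target_name
        self.in_flight = set()          # users currently being migrated
        self.held = {}                  # comments parked during migration

    def start(self, user_id):
        """Mark the user as being migrated (the job has been picked up)."""
        self.in_flight.add(user_id)

    def post_comment(self, user_id, comment):
        if user_id in self.in_flight:
            # Park the comment instead of writing to either partition.
            self.held.setdefault(user_id, []).append(comment)
        elif self.user_map[user_id] == self.target_name:
            self.target[user_id].append(comment)
        else:
            self.legacy[user_id].append(comment)

    def finish(self, user_id):
        """Move the data, repoint the user map, then replay held comments."""
        self.target[user_id] = self.legacy.pop(user_id)
        self.user_map[user_id] = self.target_name
        self.in_flight.discard(user_id)
        self.target[user_id].extend(self.held.pop(user_id, []))
```

Parking writes for the user in flight means the copy never races with new data, and only that one user sees delayed publication.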
New User Partition
New registrations are created on one user role partition
Explanation
①: When new users register, user data is written to a user role partition.
②: Non-user data continues to be served off Postgres.
[Diagram: same components as on the Server Preparation slide, with ① and ② marked. Global Role: maintains user mapping and primary key generation. User Role partitions on DB(MySQL): user information is partitioned.]
※ Grey areas are not used in the current step
New User Strategy
Pick a scheme for distributing new users
Explanation
①: When new users register, user data is written to one of the user role partitions, depending on a set distribution method (round robin, random, etc.).
②: Non-user data continues to be served off Postgres.
[Diagram: same components as on the Server Preparation slide, with ① and ② marked. Global Role: maintains user mapping and primary key generation. User Role partitions on DB(MySQL): user information is partitioned.]
※ Grey areas are not used in the current step
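The two distribution methods named above, round robin and random, could be sketched like this. The function names and the partition labels `user1` to `user3` are assumptions for illustration; the slides do not say which method Cocolog chose.

```python
import itertools
import random


def round_robin_chooser(partitions):
    """Return a function that cycles through the partitions in order."""
    cycle = itertools.cycle(partitions)
    return lambda: next(cycle)


def random_chooser(partitions, rng=None):
    """Return a function that picks a partition uniformly at random."""
    rng = rng or random.Random()
    return lambda: rng.choice(partitions)


def register_new_user(user_id, user_map, choose):
    """Assign the new user to a partition and record it in the user map."""
    user_map[user_id] = choose()
    return user_map[user_id]
```

Round robin keeps user counts even across partitions, while random needs no shared counter; either way the choice is stored in the user map, so the method can change later without moving existing users.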
Non User Data Move
Migrate data that cannot be partitioned by user
Explanation
①: Migrate the non-user role data left on PostgreSQL to the MySQL side.
[Diagram: same components as on the Server Preparation slide, with ① (migrate non-user data) marked. Global Role: maintains user mapping and primary key generation. User Role partitions: user information is partitioned. Non-User Role on DB(MySQL): information that does not need to be partitioned (such as session information).]
※ Grey areas are not used in the current step
Data migration done
Explanation
①: All data access is now done through MySQL.
②: Continue to use The Schwartz for asynchronous jobs.
[Diagram: same components as on the Server Preparation slide, with ① and ② marked. Job Server: server for executing jobs. Schwartz DB: stores job details. Global Role: maintains user mapping and primary key generation. User Role partitions: user information is partitioned. Non-User Role on DB(MySQL): information that does not need to be partitioned (such as session information).]
※ Grey areas are not used in the current step
The New TypePad configuration
[Diagram components]
• Clients (via Internet): Blog Readers, Blog Owners (management interface), Mobile Blog Readers
• Web Server
• Application Server
• TypeCast Server / ATOM Server: dedicated server for TypeCast (via ATOM)
• MEMCACHED: data caching servers to reduce DB load
• Database (MySQL)
• Storage: static content (HTML, images, etc.)
• Mail Server
• ADMIN (CRON) Server: cron server for periodic asynchronous tasks
• Job Server: TheSchwartz server for running ad-hoc jobs asynchronously
[Diagram connection labels: https(443), http(80), http(80): atom api, memcached(11211), MySQL(3306), nfs(2049), smtp(25) / pop(110)]
4. Migration from PostgreSQL to MySQL
History of scale up PostgreSQL server, Before DBP
• DB Node Spec History (rows in order shown on the slide; the time axis is labeled 2003/12 and 2007/11)

OS (RedHat)     CPU (Xeon)                  MEM    DiskArray
7.4 (2.4.9)     1.8GHz/512k×1               1GB    No
ES2.1 (2.4.9)   3.2GHz/1M×2                 4GB    No
ES2.1 (2.4.9)   3.2GHz/1M×2                 4GB    Yes
AS2.1 (2.4.9)   3.2GHz/1M×4                 12GB   Yes
AS4 (2.6.9)     3.2GHz/1M×4                 12GB   Yes
AS4 (2.6.9)     MP3.3GHz/1M×4 〔2Core×4〕   16GB   Yes

History of scale up PostgreSQL server, Before DBP
• DB DiskArray Spec
• [FUJITSU ETERNUS8000]
  • Best I/O transaction performance in the world
  • 146GB (15 krpm) × 32 disks with RAID-10
  • MultiPath FibreChannel 4Gbps
  • QuickOPC (One Point Copy)
    • OPC copy functions let you create a duplicate copy of any data from the original at any chosen time.
http://www.computers.us.fujitsu.com/www/products_storage.shtml?products/storage/fujitsu/e8000/e8000
Scale out MySQL servers, After DBP
• A role configuration
  • Each role is configured as an HA cluster
    • HA software: NEC ClusterPro
    • Shared storage
[Diagram components: TypePad Application; PostgreSQL; MySQL Role1, Role2, Role3, …; heartbeat; FibreChannel SAN; DiskArray]
Scale out MySQL servers, After DBP
• Backup
  • Replication with hot backup
[Diagram components: TypePad Application; PostgreSQL; MySQL Role1, Role2, Role3, … (one mysqld each); MySQL BackupRole (three mysqld instances); replication links (rep × 3); opc; heartbeat; FibreChannel SAN; DiskArray]
Troubles with PostgreSQL 7.4 – 8.1
• Data size
  • over 100 GB
  • 40% is index
• Severe data fragmentation
• VACUUM
  • "VACUUM ANALYZE" causes performance problems
  • VACUUM takes too long on large amounts of data
  • dump/restore is the only solution for defragmentation
• Auto VACUUM
  • We don't use Auto VACUUM because we are worried about its effect on response latency
Troubles with PostgreSQL 7.4 – 8.1
• Character set
  • PostgreSQL accepts Japanese extended characters and multibyte characters that fall outside the bounds of UTF-8, where it should normally return an error.
"Cleaning" data
• Removing characters that are outside the bounds of the UTF-8 character set.
• Steps
  • PostgreSQL.dumpALL
  • Split for Piconv
  • UTF8 -> UCS2 -> UTF8 & Merge
  • PostgreSQL.restore
[Diagram: dump -> Split -> UTF8->UCS2->UTF8 -> Merge -> restore]
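The effect of the UTF8 -> UCS2 -> UTF8 round trip can be approximated in Python as below. This is only a sketch of the idea: the slides use piconv, and whether piconv drops exactly the same bytes is an assumption. Here, invalid UTF-8 byte sequences are discarded on decode, and characters above U+FFFF are discarded because UCS-2 cannot represent them. The function names are hypothetical.

```python
def clean_utf8(raw: bytes) -> bytes:
    """Round-trip bytes through UCS-2, dropping anything that does not survive."""
    # Invalid UTF-8 byte sequences are silently dropped here.
    text = raw.decode("utf-8", errors="ignore")
    # UCS-2 only covers U+0000..U+FFFF, so drop everything above that.
    bmp_only = "".join(ch for ch in text if ord(ch) <= 0xFFFF)
    # For BMP-only text, UTF-16-BE is byte-identical to UCS-2.
    ucs2 = bmp_only.encode("utf-16-be")
    return ucs2.decode("utf-16-be").encode("utf-8")


def clean_dump(lines):
    """Clean a dump that was split into lines, keeping the line order."""
    return [clean_utf8(line) for line in lines]
```

Splitting the dump on line boundaries, as assumed here, avoids cutting a multibyte character in half before conversion.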
Migration from PostgreSQL to MySQL using TypePad script
• Steps
  • PostgreSQL -> Perl object & tmp publish -> MySQL -> Perl object & last publish
  • diff the tmp and last objects (data check)
  • diff the tmp and last published files (file check)
[Diagram labels: PostgreSQL; tmp: Document, Object; last: Document, Object; File check; data check]
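The two checks can be sketched as a field-by-field comparison of the loaded objects plus a text diff of the published output. The original work was done with TypePad's Perl scripts; this Python version, including the names `diff_objects` and `diff_published` and the representation of objects as dictionaries with string keys, is a hypothetical illustration only.

```python
import difflib


def diff_objects(tmp_obj: dict, last_obj: dict) -> dict:
    """Data check: return fields whose values differ, as (tmp, last) pairs.

    A field missing on one side is reported with None for that side.
    """
    keys = set(tmp_obj) | set(last_obj)
    return {
        key: (tmp_obj.get(key), last_obj.get(key))
        for key in sorted(keys)
        if tmp_obj.get(key) != last_obj.get(key)
    }


def diff_published(tmp_text: str, last_text: str) -> list:
    """File check: unified diff of the two published documents."""
    return list(
        difflib.unified_diff(
            tmp_text.splitlines(),
            last_text.splitlines(),
            fromfile="tmp",
            tofile="last",
            lineterm="",
        )
    )
```

An empty result from both functions means the object loaded from MySQL and the page published from it match what PostgreSQL produced.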
Troubles with MySQL
• convert_tz function
  • does not support input values outside the range of Unix time
• sort order
  • rows come back in a different order when there is no "ORDER BY" clause
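For the sort order issue, the usual remedy is to state the order explicitly wherever the application depends on it. The sketch below uses Python's built-in `sqlite3` as a stand-in database; the `entry` table and its columns are hypothetical, and the slides do not say how Cocolog resolved the problem.

```python
import sqlite3


def fetch_entries(conn):
    """Fetch entries in a defined order instead of relying on storage order."""
    return conn.execute(
        "SELECT id, title FROM entry ORDER BY id"
    ).fetchall()
```

Without the ORDER BY clause, SQL leaves the row order up to the engine, so code that happened to work on one database can break on another.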
Cocolog Future Plans
• Dynamic
  • Job queue
Consulting by
• Sumisho Computer Systems Corp.
• System integrator
  • First and best partner of MySQL in Japan since 2003
  • Provides MySQL consulting, support, and training services
• HA
• Maintenance
  • Online backup
  • Japanese character support
Questions

Migrating from PostgreSQL to MySQL at Cocolog

  • 1. Migrating from PostgreSQL to MySQL at Cocolog Naoto Yokoyama, NIFTY Corporation Garth Webb, Six Apart Lisa Phillips, Six Apart Credits: Kenji Hirohama, Sumisho Computer Systems Corp.
  • 2. Agenda  1. What is Cocolog  2. History of Cocolog  3. DBP: Database Partitioning  4. Migration From PostgreSQL to MySQL
  • 3. 1. What is Cocolog
  • 4. What is Cocolog  NIFTY Corporation  Established in 1986  A Fujitsu Group Company  NIFTY-Serve (licensed and interconnected with CompuServe)  One of the largest ISPs in Japan  Cocolog  First blog community at a Japanese ISP  Based on TypePad technology by SixApart  Several hundred million PV/month  History  Dec/02/2003: Cocolog for ISP users launch  Nov/24/2005: Cocolog Free for free launch  April/05/2007: Cocolog for Mobile Phone launch
  • 5. 2008/04 700 Thousand Users Cocolog (Screenshot of home page)
  • 6. Cocolog (Screenshot of home page) TypePadCocolog
  • 8. Cocolog Growth (User) ■Cocolog ■Cocolog Free phase1 phase2 phase3 phase4
  • 9. Cocolog Growth (Entry) ■Cocolog ■Cocolog Free phase1 phase2 phase3 phase4
  • 10. Technology at Cocolog  Core System  Linux 2.4/2.6  Apache 1.3/2.0/2.2 & mod_perl  Perl 5.8+CPAN  PostgreSQL 8.1  MySQL 5.0  memcached/TheSchwartz/cfengine  Eco System  LAMP,LAPP,Ruby+ActiveRecord, Capistrano  Etc...
  • 11. Monitoring  Management Tool  Proprietary in-house development with PostgreSQL, PHP, and Perl  Monitoring points (order of priority)  response time of each post  number of spam comments/trackbacks  number of comments/trackbacks  source IP address of spam  number of entries  number of comments via mobile devices  page views via mobile devices  time of batch completion  amount of API usage  bandwidth usage  DB  Disk I/O  Memory and CPU usage  time of VACUUM analyze  APP  number of active processes  CPU usage  Memory usage Hard DB Service APL
  • 12. 2. History of Cocolog
  • 13. Phase1 2003/12~(Entry: 0.04Million) Register Postgre SQL NAS WEB Static contents Published Before DBP 10servers TypePad
  • 14. Podcast Portal Profile Etc.. Phase2 2004/12~ (Entry: 7Million) Rich templatePublish Book Tel Operator Support NAS WEB Static contents Published Postgre SQL Register TypePad2004/12~ 2005/5~ Before DBP 50servers
  • 15. Phase2 - Problems  The system is tightly coupled.  Database server is receiving from multiple points.  It is difficult to change the system design and database schema.
  • 16. Phase3 2006/3~ (Entry: 12 Million), Before DBP, 200 servers. Diagram labels: TypePad, Register, PostgreSQL, NAS, WEB, Static contents (Published), Web-API, memcached, Rich template, Publish Book, Tel Operator Support, Podcast, Portal, Profile, Etc.
  • 17. Phase4 2007/4~ (Entry: 16 Million), Before DBP, 300 servers. Diagram labels: TypePad, Register, PostgreSQL, Web-API, Static contents (Published), memcached, Atom, Mobile, WEB, Rich template, Publish Book, Tel Operator Support
  • 18. Now 2008/4~, After DBP, 150 servers. Diagram labels: TypePad, Register, Multi MySQL, Web-API, Static contents (Published), memcached, Atom, Mobile, WEB, Rich template, Publish Book, Tel Operator Support
  • 19. 3. TypePad Database Partitioning
  • 20. Steps for Transitioning  Server Preparation: hardware and software setup  Global Write: write user information to the global DB  Global Read: read/write user information on the global DB  Move Sequence: table sequences served by the global DB  User Data Move: move user data to user partitions  New User Partition: all new users saved directly to user partition 1  New User Strategy: decide on a strategy for the new user partition  Non User Data Move: move all non-user-owned data
  • 21. TypePad Overview (PreDBP). Clients (via Internet): Blog Readers, Blog Owners, Mobile Blog Readers. Servers: Web Server; Application Server; TypeCast Server, ATOM Server (dedicated server for TypeCast, via ATOM); MEMCACHED (data caching servers to reduce DB load); Mail Server; ADMIN(CRON) Server (cron server for periodic asynchronous tasks). Storage: Database (Postgres); Static Content (HTML, Images, etc). Ports: https(443), http(80), http(80): atom api, memcached(11211), postgres(5432), nfs(2049), smtp(25) / pop(110)
  • 22. Why Partition?  Current setup: all inquiries (access) from TypePad go to one DB (Postgres), which holds the Non-User Role and the User Role (User0).  After DBP: inquiries (access) are divided among several DBs (MySQL): Global Role, Non-User Role, User Role (User1), User Role (User2), User Role (User3).
  • 23. Server Preparation  Current setup: TypePad with DB (PostgreSQL), holding the Non-User Role and the User Role (User0).  New expanded setup: DB (MySQL) for partitioned data, with a Global Role (maintains user mapping and primary key generation), a Non-User Role (information that does not need to be partitioned, such as session information), and User Roles User1, User2, User3 (user information is partitioned).  Asynchronous Job Server: Job Server + TypePad + Schwartz (server for executing jobs) and Schwartz DB (stores job details).  ※Grey areas are not used in current steps
  • 24. Global Write: Creating the user map  Explanation ①: For new registrations only, uniquely identifying user data is written to the global DB. ②: This same data continues to be written to the existing DB.  The Global Role on DB (MySQL) maintains user mapping and primary key generation.  ※Grey areas are not used in current steps
  • 25. Global Read: Use the user map to find the user partition  Explanation ①: Migrate existing user data to the global DB. ②: At the start of the request, the application queries the global DB for the location of the user data. ③: The application then talks to this DB for all queries about this user. At this stage the global DB points to the user0 partition in all cases.  ※Grey areas are not used in current steps
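The lookup in steps ② and ③ can be sketched as follows. This is a minimal Python illustration under stated assumptions, not TypePad's actual (Perl) code: `GLOBAL_USER_MAP`, `PARTITION_DSNS`, `partition_for_user`, `dsn_for_request`, and the host names are all hypothetical, and plain dictionaries stand in for the global DB and the connection settings.

```python
# Hypothetical stand-in for the user map table on the global DB:
# user_id -> name of the partition that holds this user's data.
# At this stage of the migration every user still maps to "user0".
GLOBAL_USER_MAP = {101: "user0", 102: "user0", 103: "user0"}

# Hypothetical connection settings, one entry per user-role partition.
PARTITION_DSNS = {
    "user0": "postgres://db-user0/typepad",
    "user1": "mysql://db-user1/typepad",
}


def partition_for_user(user_id):
    """Step 2: ask the global DB where this user's data lives."""
    if user_id not in GLOBAL_USER_MAP:
        raise LookupError(f"user {user_id} is not in the user map")
    return GLOBAL_USER_MAP[user_id]


def dsn_for_request(user_id):
    """Step 3: all queries about this user go to the mapped partition."""
    return PARTITION_DSNS[partition_for_user(user_id)]
```

Because the application only ever reaches user data through the map, moving a user later means copying the rows and changing one map entry.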
  • 26. Move Sequence: Migrating primary key generation  Explanation ①: Postgres sequences (for generating unique primary keys) are migrated to tables on the global DB that act as “pseudo-sequences”. ②: The application requests new primary keys from the global DB rather than from the user partition.  ※Grey areas are not used in current steps
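A “pseudo-sequence” can be pictured as a one-row-per-sequence counter table that is incremented inside a transaction. The sketch below is an assumption about the general technique, not the schema TypePad used: the table name `pseudo_sequence`, the functions `create_pseudo_sequence` and `next_id`, and the starting value are hypothetical, and an in-memory SQLite database stands in for the MySQL global DB.

```python
import sqlite3


def create_pseudo_sequence(conn, name, start):
    """Create the counter table if needed and seed one named sequence."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pseudo_sequence "
        "(name TEXT PRIMARY KEY, last_id INTEGER NOT NULL)"
    )
    conn.execute(
        "INSERT INTO pseudo_sequence (name, last_id) VALUES (?, ?)",
        (name, start),
    )
    conn.commit()


def next_id(conn, name):
    """Increment the counter row and return the new value in one transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE pseudo_sequence SET last_id = last_id + 1 WHERE name = ?",
            (name,),
        )
        if cur.rowcount != 1:
            raise KeyError(name)
        row = conn.execute(
            "SELECT last_id FROM pseudo_sequence WHERE name = ?", (name,)
        ).fetchone()
    return row[0]


# Stand-in for the global DB; the start value 1000 is made up for illustration.
conn = sqlite3.connect(":memory:")
create_pseudo_sequence(conn, "entry_id", 1000)
```

Seeding each counter from the last value of the corresponding Postgres sequence would keep new keys from colliding with existing rows; that seeding step is an assumption here, since the slide does not describe it.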
  • 27. User Data Move: Moving user data to the new user-role partitions  Explanation ①: Existing users that should be migrated by the Job Server are submitted as new Schwartz jobs. User data is then migrated asynchronously. ②: If a comment arrives while the user is being migrated, it is saved in the Schwartz DB to be published later. ③: After being migrated, all user data will exist on the user-role DB partitions. ④: Once all user data is migrated, only non-user data is on Postgres.  ※Grey areas are not used in current steps
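Steps ① to ③ can be sketched with in-memory stand-ins. This is a simplified illustration, not TheSchwartz API: `user_map`, `partitions`, `migrating`, `deferred_comments`, and the three functions are hypothetical names, and a `deque` plays the role of comments parked in the Schwartz DB.

```python
from collections import deque

user_map = {"alice": "user0"}            # global DB: user -> partition
partitions = {"user0": {"alice": ["entry 1"]}, "user1": {}}
migrating = set()                        # users whose move job is running
deferred_comments = deque()              # stand-in for items parked in the Schwartz DB


def post_comment(user, comment):
    """Save a comment, or park it if the blog owner is being migrated."""
    if user in migrating:
        deferred_comments.append((user, comment))
    else:
        partitions[user_map[user]][user].append(comment)


def start_move(user):
    """Mark the user as in flight so incoming writes are deferred."""
    migrating.add(user)


def finish_move(user, target):
    """Copy the user's rows, update the user map, then replay parked comments."""
    source = user_map[user]
    partitions[target][user] = partitions[source].pop(user)
    user_map[user] = target
    migrating.discard(user)
    # Replay each parked comment once; comments for users that are still
    # being migrated are simply parked again by post_comment.
    for _ in range(len(deferred_comments)):
        owner, comment = deferred_comments.popleft()
        post_comment(owner, comment)
```

A comment posted between `start_move("alice")` and `finish_move("alice", "user1")` ends up on `user1` after the move, which mirrors the "published later" behaviour in step ②.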
  • 28. New User Partition: New registrations are created on one user role partition  Explanation ①: When new users register, user data is written to a user role partition. ②: Non-user data continues to be served off Postgres.  ※Grey areas are not used in current steps
  • 29. New User Strategy: Pick a scheme for distributing new users  Explanation ①: When new users register, user data is written to one of the user role partitions, depending on a set distribution method (round robin, random, etc). ②: Non-user data continues to be served off Postgres.  ※Grey areas are not used in current steps
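The two distribution methods named in step ① can be sketched as interchangeable "chooser" functions. The names `PARTITIONS`, `round_robin_chooser`, `random_chooser`, and `register_user` are hypothetical; the slide does not say which method Cocolog settled on.

```python
import itertools
import random

PARTITIONS = ["user1", "user2", "user3"]


def round_robin_chooser(partitions):
    """Return a function that hands out partitions in a repeating cycle."""
    cycle = itertools.cycle(partitions)
    return lambda: next(cycle)


def random_chooser(partitions, rng=None):
    """Return a function that picks a partition uniformly at random."""
    rng = rng or random.Random()
    return lambda: rng.choice(partitions)


def register_user(user_id, user_map, choose):
    """Pick a partition once at registration and record it in the user map."""
    partition = choose()
    user_map[user_id] = partition
    return partition
```

Whichever chooser is used, the decision is made once and stored in the user map, so changing the strategy later affects only new registrations.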
  • 30. Non User Data Move: Migrate data that cannot be partitioned by user  Explanation ①: Migrate the non-user role data left on PostgreSQL to the MySQL side. The Non-User Role holds information that does not need to be partitioned (such as session information).  ※Grey areas are not used in current steps
  • 31. Data migration done  Explanation ①: All data access is now done through MySQL. ②: Continue to use The Schwartz for asynchronous jobs.  ※Grey areas are not used in current steps
  • 32. The New TypePad configuration. Clients (via Internet): Blog Readers, Blog Owners (management interface), Mobile Blog Readers. Servers: Web Server; Application Server; TypeCast Server, ATOM Server (dedicated server for TypeCast, via ATOM); MEMCACHED (data caching servers to reduce DB load); Mail Server; ADMIN(CRON) Server (cron server for periodic asynchronous tasks); Job Server (TheSchwartz server for running ad-hoc jobs asynchronously). Storage: Database (MySQL); Static Content (HTML, Images, etc). Ports: https(443), http(80), http(80): atom api, memcached(11211), MySQL(3306), nfs(2049), smtp(25) / pop(110)
  • 33. 4. Migration from PostgreSQL to MySQL
  • 34. History of scale up PostgreSQL server, Before DBP  DB Node Spec History, 2003/12 to 2007/11, in chronological order (OS (RedHat) | CPU (Xeon) | MEM | DiskArray)  7.4 (2.4.9) | 1.8GHz/512k×1 | 1GB | No  ES2.1 (2.4.9) | 3.2GHz/1M×2 | 4GB | No  ES2.1 (2.4.9) | 3.2GHz/1M×2 | 4GB | Yes  AS2.1 (2.4.9) | 3.2GHz/1M×4 | 12GB | Yes  AS4 (2.6.9) | 3.2GHz/1M×4 | 12GB | Yes  AS4 (2.6.9) | MP3.3GHz/1M×4 〔2Core×4〕 | 16GB | Yes
  • 35. History of scale up PostgreSQL server, Before DBP  DB DiskArray Spec  [FUJITSU ETERNUS8000]  Best I/O transaction performance in the world  146GB (15 krpm) * 32 disks with RAID-10  MultiPath FibreChannel 4Gbps  QuickOPC (One Point Copy)  OPC copy functions let you create a duplicate copy of any data from the original at any chosen time. http://www.computers.us.fujitsu.com/www/products_storage.shtml?products/storage/fujitsu/e8000/e8000
  • 36. Scale out MySQL servers, After DBP  A role configuration  Each role is configured as HA cluster  HA Software: NEC ClusterPro  Shared Storage
  • 37. Scale out MySQL servers, After DBP. Diagram labels: TypePad Application; MySQL Role1, MySQL Role2, MySQL Role3, … (heart beat); PostgreSQL; FibreChannel SAN; DiskArray
  • 38. Scale out MySQL servers, After DBP  Backup  Replication w/ Hot backup
  • 39. Scale out MySQL servers, After DBP. Diagram labels: TypePad Application; MySQL Role1, MySQL Role2, MySQL Role3, … (heart beat); MySQL BackupRole; mysqld (×6); rep (×3); opc; PostgreSQL; FibreChannel SAN; DiskArray
  • 40. Troubles with PostgreSQL 7.4 – 8.1  Data size  over 100 GB  40% is index  Severe Data Fragmentation  VACUUM  “VACUUM analyze” causes performance problems  VACUUM takes too long on large amounts of data  dump/restore is the only solution for de-fragmentation  Auto VACUUM  We don’t use Auto VACUUM because we are worried about its effect on response time (latency)
  • 41. Troubles with PostgreSQL 7.4 – 8.1  Character set  PostgreSQL accepts characters outside the bounds of UTF-8 (Japanese extended character sets and other multi-byte character sets) instead of rejecting them with an error, as it normally should.
  • 42. “Cleaning” data  Removing characters that are outside the bounds of the UTF-8 character set.  Steps  PostgreSQL.dumpALL  Split for Piconv  UTF8 -> UCS2 -> UTF8 & Merge  PostgreSQL.restore  Flow: dump -> Split -> UTF8->UCS2->UTF8 -> Merge -> restore
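The effect of the UTF8 -> UCS2 -> UTF8 round trip can be approximated in a few lines. The slides used piconv on split dump files; the Python sketch below is only an illustration of the idea, assuming that invalid byte sequences and characters UCS-2 cannot represent (anything above U+FFFF) should simply be dropped. The function names `clean_utf8` and `clean_dump` are hypothetical.

```python
def clean_utf8(raw: bytes) -> bytes:
    """Return raw with invalid UTF-8 and non-BMP characters removed."""
    # Drop byte sequences that are not valid UTF-8.
    text = raw.decode("utf-8", errors="ignore")
    # UCS-2 covers only the Basic Multilingual Plane (U+0000..U+FFFF),
    # so a round trip through it cannot carry anything above U+FFFF.
    bmp_only = "".join(ch for ch in text if ord(ch) <= 0xFFFF)
    return bmp_only.encode("utf-8")


def clean_dump(lines):
    """Clean a dump line by line, so large files can be split and merged."""
    for line in lines:
        yield clean_utf8(line)
```

Valid Japanese text passes through unchanged, while a stray byte such as `0xFF` in the middle of a line is removed. Whether piconv drops or substitutes such characters is not stated in the slides, so the dropping behaviour here is an assumption.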
  • 43. Migration from PostgreSQL to MySQL using TypePad script  Steps  PostgreSQL -> PerlObject & tmp publish -> MySQL -> PerlObject & last publish  diff tmp & last Object (data check)  diff tmp & last publish (file check)  Diagram labels: PostgreSQL, TypePad (×2), Document and Object (tmp), Document and Object (last), File check, data check
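The two checks can be sketched as a field-by-field comparison of the loaded objects and a text diff of the published files. The actual migration used a TypePad (Perl) script; this Python version is a hypothetical illustration in which dictionaries stand in for Perl objects, strings stand in for published files, and `diff_objects` and `diff_published` are invented names.

```python
import difflib


def diff_objects(tmp_obj: dict, last_obj: dict) -> dict:
    """Data check: return {field: (tmp_value, last_value)} for every mismatch."""
    keys = tmp_obj.keys() | last_obj.keys()
    return {
        k: (tmp_obj.get(k), last_obj.get(k))
        for k in sorted(keys)
        if tmp_obj.get(k) != last_obj.get(k)
    }


def diff_published(tmp_html: str, last_html: str) -> list:
    """File check: unified diff of the two published files (empty if identical)."""
    return list(
        difflib.unified_diff(
            tmp_html.splitlines(),
            last_html.splitlines(),
            fromfile="tmp",
            tofile="last",
            lineterm="",
        )
    )
```

An empty result from both functions means the object loaded from MySQL and the page published from it match what PostgreSQL produced.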
  • 44. Troubles with MySQL  convert_tz function  does not support input values outside the range of Unix time  sort order  result order differs when a query has no “ORDER BY” clause
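One way to live with the convert_tz limitation is to test a value in the application before handing it to the database. The sketch below assumes that "Unix time" here means the range covered by MySQL's TIMESTAMP type (1970-01-01 00:00:01 UTC to 2038-01-19 03:14:07 UTC); the names `UNIX_MIN`, `UNIX_MAX`, and `in_unix_time_range` are hypothetical, and the slides do not say how Cocolog actually worked around the problem.

```python
from datetime import datetime, timezone

# Bounds of MySQL's TIMESTAMP type, expressed in UTC.
UNIX_MIN = datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)
UNIX_MAX = datetime(2038, 1, 19, 3, 14, 7, tzinfo=timezone.utc)


def in_unix_time_range(dt):
    """Return True if dt can safely be passed to a Unix-time-bound conversion."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return UNIX_MIN <= dt <= UNIX_MAX
```

Values that fail the check could be converted in application code instead. For the sort-order issue, the usual remedy is to add an explicit ORDER BY wherever the application depends on row order.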
  • 45. Cocolog Future Plans  Dynamic  Job queue
  • 46. Consulting by  Sumisho Computer Systems Corp.  System Integrator  first and best partner of MySQL in Japan since 2003  provides MySQL consulting, support, and training services  HA  Maintenance  online backup  Japanese character support