6
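1+ Concept: Data Sharding (Shard)
Implementing sharding requires a change in thinking (DB design, application usage)
Avoid joins (queries that relate multiple tables) wherever possible
Data redundancy / denormalization
Example: data redundancy for sharding
Before sharding: comment(id, blog_id, content)
After sharding: comment(id, blog_id, content, user_id)
Common sharding strategies
Vertical sharding
Split by function [e.g. forum, blog]
Horizontal sharding
2 * N [e.g. orders: one copy for the buyer and one for the online shop]
N / n [range partitioning by date or ID]
Hash(N) % n [partitioning by hash]
Partition lookup table method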
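The horizontal strategies listed above can be sketched as small routing functions. This is only an illustration: the function names, the range size, the hash function, and the shard counts are assumptions and do not come from the slides.

```javascript
// Hypothetical helpers illustrating the horizontal strategies on this slide.
// Shard counts, range sizes, and the hash function are assumptions.

// N / n: range partitioning by numeric ID.
function shardByRange(id, rangeSize, shardCount) {
  const index = Math.floor(id / rangeSize);
  return Math.min(index, shardCount - 1); // the last shard absorbs any overflow
}

// Hash(N) % n: hash partitioning on a key (converted to a string first).
function shardByHash(key, shardCount) {
  let hash = 0;
  for (const ch of String(key)) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // keep an unsigned 32-bit value
  }
  return hash % shardCount;
}

// Lookup-table method: an explicit directory maps each key to a shard.
function shardByLookup(key, directory, defaultShard) {
  return directory.has(key) ? directory.get(key) : defaultShard;
}

// With the redundant user_id column, a comment can be routed by its author
// without first joining against the blog table.
const comment = { id: 1, blog_id: 7, content: "hi", user_id: 42 };
console.log("comment goes to shard", shardByHash(comment.user_id, 4));
```

Range partitioning keeps neighbouring IDs together, hash partitioning spreads load evenly, and a lookup table allows individual keys to be moved at the cost of an extra lookup.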
7
[Figure: DAL Proxy deployment. Labels: User1, User2, DAL Proxy, A, P, P]
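1* Concept: Implementing the Data Access Layer (DAL)
Implementation approaches
In-process: DAL API component
Java: Hibernate Shards, HiveDB, …
Python: Pyshards
Out-of-process: DAL Proxy server
MySQL: MySQL Proxy, Amoeba, Cobar
PostgreSQL: PL/Proxy (Skype), Pgpool-II
Considerations:
Business scenario: how many business applications are there? Many.
Functional requirements: are the requirements fixed? No.
Sharding rules: will they stay unchanged? Uncertain.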
[Figure: DAL API deployment. Labels: User1, User2, DAL API, A, P, P]
18
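3* Current state: read/write splitting across multiple databases with identical tables (Q)
Class: Dalet4PSqlid
Test: DalClientsTest2.test_sqlrw_ok()
Configuration:
In dal_vrdb, configure the virtual database (the first database is the write database; the others are read databases)
vdb.testdb = NWR111,testdb,testdb1,testdb2
In dal_psqlid, configure the statements; reads and writes are distinguished REST-style by the GET/PUT operation.
GET.sql_select_user1 = SELECT id,other FROM user1 WHERE username like '%lius%'
PUT.sql_update_user1 = UPDATE user1 SET `id`=id WHERE username like '%lius%'
PUT.sql_select_user1 = SELECT id,other FROM user1 WHERE username like '%lius%'
PUT.psql_select_user1 = SELECT id,other FROM user1 WHERE username like '%lius%'
Test (use vdb.testdb rather than testdb)
http://localhost:8081/dal/psqlid2/vdb.testdb/sql_select_user1.txt read from a read database (testdb1, testdb2)
http://localhost:8081/dal/psqlid2/vdb.testdb/sql_select_user1.txt?METHOD=PUT read operation on the write database
http://localhost:8081/dal/psqlid2/vdb.testdb/sql_update_user1.txt?METHOD=PUT write operation on the write database
http://localhost:8081/dal/psqlid2/testdb/sql_select_user1.txt operate directly on the write database
http://localhost:8081/dal/psqlid2/testdb1/sql_select_user1.txt operate directly on a read database
http://localhost:8081/dal/psqlid2/vdb.testdb/psql_select_user1.txt?METHOD=PUT prepared statement
Questions: what if the table names also differ? What if a query follows an insert within less than the replication delay?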
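The routing rule on this slide (first database for writes, the rest for reads, chosen by GET or PUT) can be sketched roughly as follows. The function names and the round-robin choice among read databases are assumptions; the slides do not say how Dalet4PSqlid picks a read database, nor what the leading NWR111 token means, so it is kept as an opaque string here.

```javascript
// Parse a dal_vrdb value such as "NWR111,testdb,testdb1,testdb2".
// The first token is kept as an opaque option string (its meaning is not
// described in the slides); the next is the write database, the rest are
// read databases.
function parseVirtualDb(value) {
  const [options, writeDb, ...readDbs] = value.split(",").map((s) => s.trim());
  return { options, writeDb, readDbs };
}

// Hypothetical router: GET goes to a read database (round-robin is an
// assumption), anything else (PUT) goes to the write database.
function createRouter(vdb) {
  let next = 0;
  return function route(method) {
    if (method === "GET" && vdb.readDbs.length > 0) {
      const db = vdb.readDbs[next % vdb.readDbs.length];
      next += 1;
      return db;
    }
    return vdb.writeDb;
  };
}

const exampleRoute = createRouter(parseVirtualDb("NWR111,testdb,testdb1,testdb2"));
console.log(exampleRoute("GET"), exampleRoute("PUT"));
```

Under this model, issuing a select with METHOD=PUT reads from the write database, which may be one way to handle a query that follows an insert sooner than the replication delay.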
19
3 Current state: querying multiple tables in the same database (Q)
Class: Dalet4PSqlid
Configuration: configure the statements in dal_psqlid.
GET.user_cross0 = select * from user1,user2;
GET.user_union0 = select * from user1 union all select * from user2;
GET.user_cross = select user1.other,user2.other from user1,user2;
GET.user_union = select other from user1 union all select other from user2;
Test:
http://localhost:8081/dal/psqlid2/testdb/user_cross.txt
http://localhost:8081/dal/psqlid2/testdb/user_union.txt
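Question:
What is the difference between the two queries above?
#2 [Revision history]
Version 5.0.20130606 Initial version.
Version 5.1.20130608 +Test verification of the change for read operations on the write database. +Choice of scripting language. *Design principles.
Version 5.2.20130619 +Performance testing, +use cases and tests
Version 5.2.20130620 +Other new requirements, +functional testing (黄进), *performance self-test (殷舒), *design, +implementation options (comparison), +release address
Version 5.3.20130621 *Other new requirements (TODO)
Version 5.3.20130624 *Related conventions (+SHARDING_FIRST)
Version 5.3.20130626 +Example 9: single-column sharding, plain SQL, global-first query; +implementation: comparison of the old and new read/write splitting approaches; *background problems; *background requirements
Version 5.3.20130626 *Requirements coverage (renamed to: requirements and examples); minor detail changes (sharding types, comparison of the old and new approaches)
Version 5.4.20130626 Review. Some content hidden (highly available data storage architecture, DAL technology selection, DAL service platform, DAL positioning, current state of multi-table splitting and multi-table queries)
Version 5.5.20130628 Release. *New requirement (陈瑛琦): support archive data files exported from DB2; *related conventions: change the separator from comma to semicolon?!
Version 5.5.20130709 +New requirement: result merging
Version 5.6.20130802 +Example 10-x: multi-table pagination
Version 5.6.20130814 *Other: framework split
Version 5.7.20131016 +Example 10-0: parameters for multi-table paginated queries
Version 5.7.20131016 +Subsection: new requirements (TODO-1016)
Version 5.8.20140312 +Data sharding, DAL implementation, existing DAL solutions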
--------
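Shared at: http://www.slideshare.net/nikeliu/20130626dal51-h
Title: 联动优势 Data Access Layer (DAL) Architecture and Practice, Part 5: Accessing Sharded Data
Architecture and Practice for DAL (5) Data Sharding
Architecture and Practice for Data Access Layer (5) Data Sharding
联动优势 Data Access Layer (DAL) Architecture and Practice, Part 5: Data Sharding
Description:
How to implement a dalet to access sharding databases.
According to 许超前 (see http://www.slideshare.net/xcq/ss-3629618), when the DAL he implemented is compared with memcache, the performance difference lies mainly in protocol parsing and query analysis.
Unlike existing DAL software (such as 许超前's DAL at 手机之家, 陈思儒's Amoeba, and 贺贤懋's Cobar), this design abandons JDBC as the front-end access method. Instead, the same dalet data service is exposed through two interfaces: a custom TCP persistent connection and an HTTP persistent connection.
Abandoning JDBC brings several benefits:
1) It reduces the server-side overhead of protocol parsing and query analysis.
2) It simplifies client-side programming.
3) Back-end storage is no longer limited to an RDB; it can be any NoSQL store, file, cache, or even an online service such as Tuxedo.
4) The service can be stateless, which makes horizontal scaling easier.
5) The interface itself rules out misuse of keywords such as join, which avoids overloading the server.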
--------
#19 GET.user_cross = select user1.other,user2.other from user1,user2;
GET.user_union = select other from user1 union all select other from user2;
# GET http://localhost:8081/dal/psqlid2/testdb/user_cross.txt
[{"other":"刘胜"},{"other":"吴锋海"},{"other":"彭飞"},{"other":"赵军"},{"other":"刘胜"},{"other":"吴锋海"},{"other":"彭飞"},{"other":"赵军"},{"other":"刘胜"},{"other":"吴锋海"},{"other":"彭飞"},{"other":"赵军"},{"other":"刘胜"},{"other":"吴锋海"},{"other":"彭飞"},{"other":"赵军"}]
# GET http://localhost:8081/dal/psqlid2/testdb/user_union.txt
[{"other":"刘胜"},{"other":"吴锋海"},{"other":"彭飞"},{"other":"赵军"},{"other":"刘胜"},{"other":"吴锋海"},{"other":"彭飞"},{"other":"赵军"}]
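The row counts in the two outputs above (16 versus 8) follow from what the two statements compute. The sketch below models both operations over in-memory arrays; it assumes each table holds the four names seen in the output and is not how the DAL itself evaluates the queries.

```javascript
// Assumed table contents, inferred from the outputs shown above.
const user1 = [{ other: "刘胜" }, { other: "吴锋海" }, { other: "彭飞" }, { other: "赵军" }];
const user2 = [{ other: "刘胜" }, { other: "吴锋海" }, { other: "彭飞" }, { other: "赵军" }];

// "from user1, user2" with no join condition: every row of the left table is
// paired with every row of the right table (Cartesian product).
function crossJoin(left, right) {
  const rows = [];
  for (const l of left) {
    for (const r of right) {
      rows.push({ left: l, right: r });
    }
  }
  return rows;
}

// "union all": the rows of the second result are appended to the first,
// duplicates included.
function unionAll(left, right) {
  return [...left, ...right];
}

console.log(crossJoin(user1, user2).length); // 4 * 4 rows
console.log(unionAll(user1, user2).length);  // 4 + 4 rows
```

For merging rows from same-shaped shard tables, union all is the operation that matches the intent; the comma join multiplies rows instead of concatenating them.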
#24 JSR-223: Scripting for the Java Platform
JSR-241: Groovy – A New Standard Programming Language for the Java Platform
JSR-274: Standardizing BeanShell
JSR 292: Supporting Dynamically Typed Languages on the Java Platform
JSR-331: Java Constraint Programming API
https://code.google.com/p/red5/source/browse/java/scripting/branches/paulg_0.6/doc/engines.txt?spec=svn1276&r=1276
This is JSR-223 script engine for the Groovy language. Groovy is available for download at http://groovy.codehaus.org/. We have built and tested Groovy version 1.0 JSR-06.
This is JSR-223 script engine for JRuby - Java implementation of Ruby language. JRuby is available for download at http://jruby.sourceforge.net/. We have built and tested with JRuby version 0.9.0.
This is JSR-223 script engine for the Jacl language. Jacl is Java implementation of Tcl (Tool Command Language). This is available for download at http://tcljava.sourceforge.net/. We have built and tested with Jacl version 1.3.3.
This is JSR-223 script engine for the JudoScript language. JudoScript is available for download at http://www.judoscript.com/. We have built and tested with Judo version 0.9.
This is JSR-223 script engine for Jython - Java implementation of Python. Jython is available for download at http://www.jython.org/. We have built and tested with Jython version 2.1.
This is JSR-223 script engine for OGNL - Object Graph Navigation Language. OGNL is available for download at http://www.ognl.org/. We have built and tested with OGNL 2.6.9.
This is JSR-223 script engine for the Rhino / Javascript / ECMA language. Rhino is available for download at http://rhino.mozilla.org/. We have built and tested with Rhino version 1.6 R2.
#30 GET.users_GetById3= select username,other from {SHARDING_TABLES} where id='{SHARDING_KEY1}'
GET.users_GetById3.js= function getTables(id) { if (id<100) return "testdb:user"; else return "testdb:user2"; }
#32 GET.user_GetById3x= select username,create_at from {SHARDING_TABLES} where username like '%{name}%'
GET.user_GetById3x.js= function getTables(dt6) { if (dt6>"201300") return "testdb:user2"; else return "testdb:user1"; }
#######################################################
http://localhost:8081/dal/shard2/user_GetById3x.txt?SHARDING_KEY1=201001&name=liu
http://localhost:8081/dal/shard2/user_GetById3x.txt?SHARDING_KEY1=201301&name=liu
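To make the two test URLs above concrete, here is a rough sketch of how a configured statement and its getTables function might be combined. The renderShardedSql helper is hypothetical; only the template, the getTables body, and the parameter names come from the notes. Plain text substitution is used purely for illustration and would be open to SQL injection, so a real implementation would presumably bind values instead.

```javascript
// getTables as configured for GET.user_GetById3x.js in the notes above.
function getTables(dt6) { if (dt6 > "201300") return "testdb:user2"; else return "testdb:user1"; }

// Hypothetical renderer: asks getTables for a "db:table" target, then fills
// {SHARDING_TABLES} and any named {param} placeholders in the template.
function renderShardedSql(template, params, getTablesFn) {
  const target = getTablesFn(params.SHARDING_KEY1);
  const sep = target.indexOf(":");
  const db = target.slice(0, sep);
  const table = target.slice(sep + 1);
  const sql = template.replace(/\{(\w+)\}/g, (match, name) => {
    if (name === "SHARDING_TABLES") return table;
    // Unknown placeholders are left untouched rather than guessed.
    return Object.prototype.hasOwnProperty.call(params, name) ? String(params[name]) : match;
  });
  return { db, sql };
}

const sqlTemplate =
  "select username,create_at from {SHARDING_TABLES} where username like '%{name}%'";
console.log(renderShardedSql(sqlTemplate, { SHARDING_KEY1: "201001", name: "liu" }, getTables));
console.log(renderShardedSql(sqlTemplate, { SHARDING_KEY1: "201301", name: "liu" }, getTables));
```

Because dt6 is compared as a string, the six-digit yyyymm form matters: the comparison only orders correctly when all keys have the same length.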
#36 PUT.trans7_PutNoId3 = insert into {SHARDING_TABLES} (`id`, `dtime`, `content`) VALUES ('{id}', '{dtime}', '{content}')
PUT.trans7_PutNoId3.js= function getTables() { var day = new Date().getDay(); return "testdb:trans_"+day; }
#######################################################
# http://localhost:8081/dal/shard2/trans7_GetNoId3.txt
# http://localhost:8081/dal/shard2/trans7_PutNoId3.txt?id=99&dtime=20130606000099&content=NewTrans99
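The trans7 example shards by day of the week with no sharding key in the request. A small sketch of what that implies: Date.prototype.getDay() returns 0 (Sunday) through 6 (Saturday), so there are seven tables, a write lands on the table for the current day, and a read over all data has to visit all seven. The helper names below are assumptions; the date is passed in only to make the behaviour easy to check.

```javascript
// Same rule as PUT.trans7_PutNoId3.js, but with the date passed in.
function getTablesFor(date) {
  return "testdb:trans_" + date.getDay(); // 0 = Sunday ... 6 = Saturday
}

// Hypothetical helper: every table a full read would need to visit.
function allDailyTables() {
  return Array.from({ length: 7 }, (_, day) => "testdb:trans_" + day);
}

console.log(getTablesFor(new Date()));
console.log(allDailyTables().join(","));
```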
#38 CPU: Intel(R) Xeon(R) 1.86GHz *4
MEM: 4G
##############################################################################
Dalet4Sharding
Server Software:
Server Hostname: 10.10.38.135
Server Port: 8082
Document Path: /dal/shard2/users_GetById16.txt?SHARDING_KEY1=1
Document Length: 132 bytes
Concurrency Level: 100
Time taken for tests: 9.997571 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Keep-Alive requests: 10000
Total transferred: 2740548 bytes
HTML transferred: 1320264 bytes
Requests per second: 1000.24 [#/sec] (mean)
Time per request: 99.976 [ms] (mean)
Time per request: 1.000 [ms] (mean, across all concurrent requests)
Transfer rate: 267.67 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       3
Processing:     2   98  65.5     82     723
Waiting:        2   98  65.5     82     723
Total:          2   98  65.6     82     723
Percentage of the requests served within a certain time (ms)
50% 82
66% 97
75% 112
80% 124
90% 161
95% 203
98% 299
99% 398
100% 723 (longest request)
##############################################################################
##############################################################################
Dalet4PSqlid
Server Software:
Server Hostname: 10.10.38.135
Server Port: 8082
Document Path: /dal/psqlid/testdb-db2/sql_select.txt?tables=sharding.t_test
Document Length: 101 bytes
Concurrency Level: 100
Time taken for tests: 10.467854 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Keep-Alive requests: 10000
Total transferred: 2430729 bytes
HTML transferred: 1010303 bytes
Requests per second: 955.31 [#/sec] (mean)
Time per request: 104.679 [ms] (mean)
Time per request: 1.047 [ms] (mean, across all concurrent requests)
Transfer rate: 226.69 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       3
Processing:     2  103  60.5     87     593
Waiting:        2  103  60.5     87     593
Total:          2  103  60.6     87     596
Percentage of the requests served within a certain time (ms)
50% 87
66% 105
75% 119
80% 131
90% 171
95% 213
98% 296
99% 359
100% 596 (longest request)
##############################################################################
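The headline figures in the two ab reports above can be re-derived from the request count, the elapsed time, and the concurrency level. The sketch below shows the arithmetic; the function names are made up for illustration.

```javascript
// Throughput: completed requests divided by elapsed seconds.
function requestsPerSecond(requests, seconds) {
  return requests / seconds;
}

// Mean time per request as seen by one client slot:
// concurrency * elapsed time / completed requests, in milliseconds.
function meanTimePerRequestMs(requests, seconds, concurrency) {
  return (concurrency * seconds * 1000) / requests;
}

// Mean time per request across all concurrent requests, in milliseconds.
function meanTimeAcrossConcurrentMs(requests, seconds) {
  return (seconds * 1000) / requests;
}

// Dalet4Sharding run: 10000 requests, 9.997571 s, concurrency 100.
console.log(requestsPerSecond(10000, 9.997571).toFixed(2));
console.log(meanTimePerRequestMs(10000, 9.997571, 100).toFixed(3));
// Dalet4PSqlid run: 10000 requests, 10.467854 s, concurrency 100.
console.log(requestsPerSecond(10000, 10.467854).toFixed(2));
console.log(meanTimePerRequestMs(10000, 10.467854, 100).toFixed(3));
```

On these numbers the sharding dalet is roughly 4.7% faster in throughput than the plain Dalet4PSqlid run, though the two runs hit different document paths and back ends, so the comparison is only indicative.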
#43 select id from {SHARDING_TABLES} where 1=1 limit {LIMIT_OFFSET},{LIMIT_ROWS}
SELECT * FROM (Select id,rownumber() over(ORDER BY id ASC) AS rn from {SHARDING_TABLES}) AS a WHERE a.rn >{LIMIT_OFFSET} AND a.rn<={LIMIT_OFFSET2}
JS function return format: db1:s.tbl11,s.tbl12;db2:s2.tbl21,s2.tbl22;…
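A parser for the return format described above might look like the following sketch. It also shows one plausible way to derive the pagination placeholders: reading LIMIT_OFFSET2 as LIMIT_OFFSET + LIMIT_ROWS is an assumption inferred from the rn > {LIMIT_OFFSET} AND rn <= {LIMIT_OFFSET2} condition, and both function names are hypothetical.

```javascript
// Parse "db1:s.tbl11,s.tbl12;db2:s2.tbl21,s2.tbl22" into a list of
// { db, tables } targets. Empty segments and a trailing "…" are skipped.
function parseShardTargets(spec) {
  const targets = [];
  for (const part of spec.split(";")) {
    const trimmed = part.trim();
    if (trimmed === "" || trimmed === "…") continue;
    const sep = trimmed.indexOf(":");
    if (sep < 0) throw new Error("missing ':' in segment: " + trimmed);
    const db = trimmed.slice(0, sep);
    const tables = trimmed
      .slice(sep + 1)
      .split(",")
      .map((t) => t.trim())
      .filter((t) => t !== "");
    targets.push({ db, tables });
  }
  return targets;
}

// Assumed relationship between the three pagination placeholders.
function pageBounds(offset, rows) {
  return { LIMIT_OFFSET: offset, LIMIT_ROWS: rows, LIMIT_OFFSET2: offset + rows };
}

console.log(parseShardTargets("db1:s.tbl11,s.tbl12;db2:s2.tbl21,s2.tbl22"));
console.log(pageBounds(20, 10));
```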
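#49 Drawback: in prepared-statement, stored-procedure, and sharding modes, the SQL must follow specific format constraints.
Recommendation: eliminate cross-database joins entirely. (The way to optimize a join is not to join.)
Any query that relates multiple tables is essentially a join. (It can be supported, provided the tables share the same sharding column and the same sharding rule.)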
SELECT * FROM A,B WHERE A.ID = B.ID;
SELECT * FROM A JOIN B ON A.ID = B.ID;
SELECT * FROM A JOIN B USING(ID);
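What is the difference between these three statements for querying the two tables? Checking with MySQL's EXPLAIN shows no difference in performance; only the syntax differs.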