CUBRID Features Optimized for Social Networking Services

CUBRID has many optimizations for SNS. In this presentation, a CUBRID architect explains the characteristics of social networking services and how the CUBRID architecture is designed to meet their demands.

Transcript

  • 1. CUBRID Reference Architecture for Social Networking Service
    Kieun Park
    NHN Business Platform Corp.
    2011.8
  • 2. 46 CUBRID Reference Architecture for Social Networking Service
    2 /
  • 3. Abstract
    The top-ranked Facebook celebrity has 44 million fans, and the top-ranked Twitter user has 11 million followers. Facebook hosts over 900 million objects, and people send 140 million tweets per day. Needless to say, these numbers weigh heavily on the underlying databases, so best practices in database architecture matter.
    Online social networking (OSN) services have proliferated rapidly and changed the way data is stored and served. Social data is an enormous graph of small, tightly interconnected objects. An OSN service page is a view of those small objects customized for a specific viewer at a specific time. Typically, the view is an aggregation of events connected by a social graph that changes constantly with users' real-time interactions. Although Dunbar's number suggests that a person can maintain stable social relationships with only about 150 people, celebrities on OSN sites have so many followers that the social graph becomes huge. These properties of the data lead to new challenges and demand a new database architecture to handle them.
    The main considerations of database architecture for OSN are scale-out and performance, with high availability as a mandatory requirement. The main characteristics of an OSN service in terms of data are power-law scaling, data feeding frenzy, and Zipfian-distributed access. The volume of data delivered grows exponentially with the popularity of the service. A cost-effective database scale-out architecture is therefore important for business reasons as well as technical ones.
    In this presentation, the CUBRID Reference Architecture for social networking services is shown. The presented architectures are based on best practices developed from real business cases at NHN, the biggest portal service provider in Korea. The features that help meet the database architecture demands of OSN services are described. For example, an index scan with a top-k sorting technique was developed for fast feed aggregation. The HA, automatic sharding, and clustering features of CUBRID are also explained. Finally, nStore, a distributed database system based on CUBRID, is introduced. The concept of nStore is similar to Amazon Dynamo, but it differs in that it supports SQL.
  • 4. I Am
    Kieun Park (박기은)
    • Software/Database Architect
    • Service Platform Development Center
    • NHN Business Platform Corp.
    • iamyaw@nhn.com
    • CUBRID Open Source DBMS
    • nStore Distributed Database System
  • Contents
    Characteristics of online social networking service
    Challenges and demands on database architecture
    CUBRID features
    CUBRID reference architecture for social networking service
    How fast is the data growing in online social networking service?
    Characteristics of OSN service: Power-law scaling growth, data feeding frenzy, and Zipfian distribution access
    How does it access the database? Feed aggregation
  • 10. Contents
    Characteristics of online social networking service
    Challenges and demands on database architecture
    CUBRID features
    CUBRID reference architecture for social networking service
    Business demands and system requirements
    Main considerations of database architecture for OSN service
    Scale-out, performance, and high availability
  • 11. Contents
    Characteristics of online social networking service
    Challenges and demands on database architecture
    CUBRID unique features
    CUBRID reference architecture for social networking service
    Index scan with top-k sorting technique
    High availability feature
    Automatic sharding component
    CUBRID Cluster System
    nStore, a distributed database system based on the CUBRID
  • 12. Contents
    Characteristics of online social networking service
    Challenges and demands on database architecture
    CUBRID features
    CUBRID reference architecture for social networking service
    CUBRID Web Reference Architecture
    CUBRID SNS Reference Architecture
  • 13. Characteristics of online social networking service
  • 14. Some Infographics about Online Social Networking Service
    The history and evolution of OSN took place over the last 10 years.
    Source http://blog.skloog.com/history-social-media-history-social-media-bookmarking/
  • 15. Some Infographics about Online Social Networking Service
    500 million Facebook users, 106 million Twitter users
    Social networks with user bases larger than the population of most countries
    Source http://www.digitalsurgeons.com/facebook-vs-twitter-infographic/
  • 16. Some Infographics about Online Social Networking Service
    The top-ranked Twitter user, Lady Gaga, has 11 million followers. About 55 million tweets are sent per day.
    Twitter gets about 600 million queries every day.
    (http://twitaholic.com)
    Source http://www.digitalbuzzblog.com/infographic-twitter-statistics-facts-figures/
  • 17. Some Infographics about Online Social Networking Service
    The most followed person, Eminem, has more than 44 million fans.
    More than 5 billion pieces of content shared each week.
    2,716,000 messages, 1,587,000 wall posts, 10,208,000 comments in 20 minutes on Facebook.
    (http://www.independent.co.uk)
    Source http://www.digitalbuzzblog.com/facebook-statistics-facts-figures-for-2010/
    Source http://www.digitalbuzzblog.com/facebook-statistics-stats-facts-2011/
  • 18. Some Infographics about Online Social Networking Service
    Have we reached a world of infinite information?
    In a similar manner to our universe, the Internet is expanding at an incredibly rapid pace, reaching new levels of information storage and content creation every second.
    By 2020, roughly 25x10^18 (25 quintillion) information containers
    Every minute, 24 hours of video
    The growth gap between the digital content created and the available storage
    Source http://www.flowtown.com/blog/have-we-reached-a-world-of-infinite-information
  • 19. Statistics of Facebook and Twitter
    140 million: the average number of tweets people send per day.
    6,939: the current TPS (tweets per second) record.
    More than 750 million active users.
    There are over 900 million objects that people interact with (pages, groups, events and community pages)
    Source http://www.facebook.com/press/info.php?statistics
    Source http://blog.twitter.com/2011/03/numbers.html
  • 20. Statistics of Me2Day
    Postings per day: 278,461
    Total postings: 123,456,727
    Total photos: 10,638,089
  • 21. Online social networking service
    Social data is an enormous graph of small objects that are tightly interconnected.
    The service page of an OSN is an aggregation of events connected by a social graph that changes constantly with users' real-time interaction.
  • 22. Feed Following Works
    [Diagram: feed following. Application Layer: contents (comment, photo, tag, …), followers, and news feeds (personalized feeds). Content Management Layer: outbox, inbox, and the delivery & aggregation engine. Data Storage Layer: cache and databases.]
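The outbox, inbox, and delivery & aggregation engine named on this slide can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: the class `FeedStore`, its method names, and the choice to show both push delivery (copying a post into follower inboxes) and pull aggregation (merging the outboxes of followed users at read time). It is not how any CUBRID-based service is implemented, and it assumes each author posts in increasing timestamp order.

```python
import heapq
from collections import defaultdict
from itertools import islice


class FeedStore:
    """Toy outbox/inbox model; names are hypothetical, not from the slides."""

    def __init__(self):
        self.followers = defaultdict(set)  # author -> users following them
        self.following = defaultdict(set)  # user -> authors they follow
        self.outbox = defaultdict(list)    # author -> [(ts, text)], oldest first
        self.inbox = defaultdict(list)     # user -> [(ts, author, text)]

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def post(self, author, ts, text):
        """Store the post in the author's outbox and push it to follower inboxes."""
        self.outbox[author].append((ts, text))
        for follower in self.followers[author]:
            self.inbox[follower].append((ts, author, text))

    def feed_from_inbox(self, user, k):
        """Delivery model: the feed is the newest k entries already in the inbox."""
        return sorted(self.inbox[user], reverse=True)[:k]

    def _outbox_stream(self, author):
        # Newest first, so every stream is sorted in descending order.
        return [(ts, author, text) for ts, text in reversed(self.outbox[author])]

    def feed_by_aggregation(self, user, k):
        """Aggregation model: merge followed outboxes at read time, keep k rows."""
        streams = [self._outbox_stream(author) for author in self.following[user]]
        return list(islice(heapq.merge(*streams, reverse=True), k))
```

Push delivery makes reads cheap but multiplies writes by the number of followers, while pull aggregation keeps writes cheap and moves the cost to read time; which side a real service favors depends on its fan-out.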
  • 23. Characteristics of Online Social Networking Service
    Power-law scaling growth
    • Users follow the activity and news of other users and entities.
    • Followers get personalized feeds that aggregate the streams produced by those they follow.
    • The highly variable and sometimes big fan-out of the follows graph makes data feeding difficult to implement and costly to operate.
    Online social networks have properties of significant clustering, small diameter, and power-law degrees.
    Zipfian distribution access
    Data feeding frenzy
    Twitter Activity
    5% of users account for 75% of all activity, 10% account for 86% of activity, and the top 30% account for 97.4%.
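To get a feel for how concentrated Zipfian access is, the following sketch computes the share of total activity that falls on the most active fraction of users under an idealized Zipf law. The function name `zipf_top_share`, the exponent `s`, and the user count are assumptions chosen for illustration; the sketch does not attempt to reproduce the Twitter percentages quoted on the slide.

```python
def zipf_top_share(num_users, top_fraction, s=1.0):
    """Share of activity from the top `top_fraction` of users.

    Assumes the user at rank r contributes activity proportional to 1 / r**s.
    """
    weights = [1.0 / rank ** s for rank in range(1, num_users + 1)]
    top_n = max(1, int(num_users * top_fraction))
    return sum(weights[:top_n]) / sum(weights)


if __name__ == "__main__":
    for fraction in (0.05, 0.10, 0.30):
        share = zipf_top_share(1000, fraction)
        print(f"top {fraction:.0%} of users -> {share:.1%} of activity")
```

A larger exponent `s` makes the head heavier, which is why caches and hot-key handling matter so much for this kind of workload.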
  • 26. Challenges and demands on database architecture
  • 27. Challenge and Demands on Database Architecture
    From business demands to technology implementation.
    • Online social networking services have rapidly proliferated and changed the way data is stored and served.
    • Today social media generates more information in a short period of time than was previously available in the entire world a few generations ago.
    • Big data is being driven not only by the exponential growth of Facebook, Google+, and Twitter, but also by the increasing use of rich media, such as user-generated video from smartphones.
    Source http://www.itu.int/net/itunews/issues/2010/06/35.aspx
  • 30. Social media now produces massive amounts of data. Facebook’s network, for instance, consists of 100 million entities generating tens of millions of events per second. Twitter, meanwhile, funnels 140 million public tweets a day. [GigaOM research notes]
    With enterprise data volumes moving past terabytes to tens of petabytes and more, business and IT leaders face significant opportunities and challenges from big data. For a large enterprise, big data may be in the petabytes or more; for a small or mid-size enterprise, data volumes that grow into tens of terabytes may become challenging to analyze and manage.
    When an application is being designed, software architects need to plan for much greater application load to avoid major redesigns in the future. While scaling out web servers can be done quite easily, properly scaling out database servers is far more challenging.
    Challenge and Demands on Database Architecture
    Managing user-generated social interaction data!
    Coping with explosion in data volume!
    Cost-effective scale-out to meet rapidly growing demands!
  • 31. CUBRID unique features
  • 32. CUBRID
    Free: open source is the choice of the modern world
    Powerful: clean architecture with rich functionality for competitive performance
    Enterprise: unique features for stability and reliability
  • 33. CUBRID
    Release timeline (axis: 2006 to 2012), newest first:
    • July 2011: CUBRID 4.0 stable released.
    • October 2010: CUBRID 3.0 stable released.
    • October 2009: Official open source community, www.cubrid.org, opened.
    • September 2009: CUBRID Cluster Project started.
    • August 2009: CUBRID 2008 R2.0 stable released.
    • November 2008: CUBRID became an open source project; CUBRID 2008 R1.1 stable released.
    • October 2008: First internal release, CUBRID 2008 R1.0.
    • The development of CUBRID DBMS started.
    Features called out along the timeline:
    • HA feature; reclaim deleted space; fast serial data (cached); LFS (large file support) for database volume
    • INSERT performance enhancement; database volume size reduced; multi-range scan and key limit function; covered index
    • FBO (file-based object); HA monitoring; full SQL function support
  • 42. CUBRID Index Scan with Top-k Sorting Technique
    CUBRID does multi-range index scan.
    My friends' newest twenty comments:
    SELECT post_no FROM posts WHERE id IN (4, 15, 36, …) AND registered_date < 20000
    ORDER BY registered_date DESC LIMIT 20
    Single range scan with key filter: # of leaf pages accessed > # of keys of scan result (disk I/O?!). Non-matching keys are filtered out, and the sort happens after the scan.
    Multi-range scan: # of leaf pages accessed = # of keys of scan result. Sorting happens on the fly during the scan.
    Index key ranges, as (id, registered_date):
    (4,10001) (4,9999) (4,875) …
    (15,10000) (15,9999) (15,7467) …
    (36,947) (36,120) (36,3) …
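The idea of sorting on the fly during a multi-range scan can be sketched as a k-way merge over per-id key ranges that are each already in descending date order. This is only an illustration of the technique under stated assumptions, not CUBRID's implementation: the dictionary `INDEX` stands in for a B-tree index on (id, registered_date), and the function names are hypothetical. The sample keys are the ones shown on the slide.

```python
import heapq
from itertools import islice

# Hypothetical stand-in for an index on (id, registered_date): per id,
# dates are stored newest first, as a descending range scan would read them.
INDEX = {
    4: [10001, 9999, 875],
    15: [10000, 9999, 7467],
    36: [947, 120, 3],
}


def range_scan(index, key, upper_bound):
    """Lazily yield (id, registered_date) rows for one key range, newest first."""
    for date in index.get(key, []):
        if date < upper_bound:
            yield (key, date)


def top_k_multi_range(index, keys, upper_bound, k):
    """Merge the ranges while scanning and stop as soon as k rows are produced."""
    scans = [range_scan(index, key, upper_bound) for key in keys]
    merged = heapq.merge(*scans, key=lambda row: row[1], reverse=True)
    return list(islice(merged, k))


def top_k_sort_after_scan(index, keys, upper_bound, k):
    """Baseline: read every matching key first, then sort and cut to k rows."""
    rows = [row for key in keys for row in range_scan(index, key, upper_bound)]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:k]
```

Because the merge is lazy, `top_k_multi_range` pulls only about k rows plus one look-ahead row per range, whereas the baseline reads every matching key before it can sort.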
  • 43. CUBRID Index Scan with Top-k Sorting Technique
    SELECT * FROM tbl WHERE a IN (2, 4, 5) AND b < 'K'
    ORDER BY b LIMIT 3;
    SELECT * FROM tbl WHERE a = 2 AND b < 'K'
    ORDER BY b LIMIT 3;
  • 44. CUBRID Test Results
    Refer to http://www.cubrid.org/cubrid_mysql_sns_benchmark_test
    Test case 1: user group 1 only
    Test case 2: user group 2 only
    Test case 3: 40% of user group 1, 50% of user group 2, 10% of user group 3
    Test case 4: 10% of user group 1, 50% of user group 2, 40% of user group 3
    User group 1: users with 50 or fewer friends
    User group 2: users with 51 to 2,000 friends
    User group 3: users with up to tens of thousands of friends
  • 45. CUBRID High Availability Feature
    CUBRID HA, a highly fault-resistant DBMS, enables
    • Non-stop 24x7 service
    • System maintenance without shutdown
    • Automatic fail-over (less than 20 sec)
    • Various access modes (read-write, read-only)
    [Diagram: the application connects through CUBRID drivers to brokers. An active broker and a backup broker switch over automatically. Brokers in read-write mode send UPDATE to the active server (master DB); brokers in read-only mode send SELECT to the standby servers (slave DBs). Fail-over and fail-back between the active server and Standby-1 are automatic; Standby-2 sits in a remote IDC.]
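The read-write and read-only access modes with automatic fail-over and fail-back can be pictured with a toy router. This is a sketch under assumptions, not CUBRID's broker logic: the class `HaCluster`, its methods, and the rule "the first healthy node in the list acts as active" are all hypothetical, and real fail-over relies on heartbeats rather than explicit `fail` calls.

```python
class HaCluster:
    """Toy model of one active node and standbys; names are hypothetical."""

    def __init__(self, nodes):
        self.nodes = list(nodes)  # priority order; first healthy node is active
        self.down = set()

    def active(self):
        """Return the node currently acting as the active (master) server."""
        for node in self.nodes:
            if node not in self.down:
                return node
        raise RuntimeError("no database node available")

    def route(self, sql):
        """Send SELECT to a standby when one is up, everything else to the active."""
        current = self.active()
        if sql.lstrip().upper().startswith("SELECT"):
            for node in self.nodes:
                if node not in self.down and node != current:
                    return node
        return current

    def fail(self, node):
        """Mark a node as failed; the next healthy node takes over (fail-over)."""
        self.down.add(node)

    def recover(self, node):
        """Bring a node back; it regains its place in the order (fail-back)."""
        self.down.discard(node)
```

For example, after `cluster.fail("active")` writes are routed to the first standby, and after `cluster.recover("active")` they return to the original node.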
  • 49. CUBRID High Availability Feature
    The HA feature is based on database replication using a transaction log multiplication technique.
    Statement-based replication could cause data inconsistency.
    [Diagram: the active server node (A-Node, master DB) receives UPDATE and SELECT and writes its transaction log. The log is shipped to the standby server nodes S1-Node and S2-Node; one shipping link is labeled synchronous and the other asynchronous. On each standby, a log writer stores the replication log and a log applier applies it to the slave DB, which serves SELECT. The nodes exchange heartbeats.]
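Why statement-based replication can diverge while log-based replication does not is easy to show with a non-deterministic statement. The sketch below is a deliberately simplified model and not CUBRID code: the "statement" is an insert of a random value, the "log" is just the list of values the master actually wrote, and both function names are hypothetical.

```python
import random


def replicate_by_statement(num_statements, master_seed, slave_seed):
    """Re-execute each non-deterministic statement on the slave."""
    master_rng = random.Random(master_seed)
    slave_rng = random.Random(slave_seed)
    master, slave = [], []
    for _ in range(num_statements):
        master.append(master_rng.random())  # like INSERT ... VALUES (RAND())
        slave.append(slave_rng.random())    # same statement, different result
    return master, slave


def replicate_by_log(num_statements, master_seed):
    """Ship the values the master wrote and apply them on the slave as-is."""
    master_rng = random.Random(master_seed)
    master, shipped_log = [], []
    for _ in range(num_statements):
        value = master_rng.random()
        master.append(value)
        shipped_log.append(value)  # the log records the resulting data
    slave = list(shipped_log)      # the log applier replays recorded values
    return master, slave
```

With statement re-execution the two copies end up different, whereas applying the shipped log reproduces the master's data exactly.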
  • 50. CUBRID Automatic Sharding Component
    The automatic sharding feature enables
    • No more sharding logic in the application
    • Scale-out DB architecture
    Features
    • Multiple sharding strategies
    Shard by modulus, date/time range, extendible hash
    • User hint-aware
    SELECT * FROM tbl WHERE nonkey='abc' /* shard=1 */
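    [Diagram: the application sends queries such as SELECT … WHERE key=k0008 and UPDATE … WHERE key=k0002 to the broker, which uses sharding metadata to route them automatically to the database server shards. Shard #1 holds k0001, k0005, …; shard #2 holds k0002, k0006, …; shard #3 holds k0003, k0007, …; shard #4 holds k0004, k0008, …. A new shard can be added to expand capacity.]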
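The modulus strategy and the shard hint shown on this slide can be sketched as a small routing function. This is an illustration only: `shard_by_modulus`, `route`, and the hint-parsing regular expression are assumptions, not CUBRID's broker API, and the key-to-shard rule is simply the one that reproduces the placement drawn on the slide (k0001 and k0005 on shard #1, and so on).

```python
import re
from typing import Optional

NUM_SHARDS = 4  # the slide draws four shards

# Matches a hint written like /* shard=1 */, as in the slide's example query.
_SHARD_HINT = re.compile(r"/\*\s*shard\s*=\s*(\d+)\s*\*/")


def shard_by_modulus(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key such as 'k0008' to a shard number in 1..num_shards."""
    number = int(key.lstrip("kK"))
    return (number - 1) % num_shards + 1


def route(sql: str, shard_key: Optional[str] = None) -> int:
    """Pick a shard: an explicit hint wins, otherwise apply the modulus rule."""
    hint = _SHARD_HINT.search(sql)
    if hint:
        return int(hint.group(1))
    if shard_key is None:
        raise ValueError("no shard key and no shard hint: cannot route")
    return shard_by_modulus(shard_key)
```

A plain modulus forces most keys to move when a shard is added, which is one reason a range or extendible-hash strategy may be preferred when shards are expected to grow.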
  • 52. CUBRID Cluster System
    Main features of CUBRID Cluster are
    • Global schema
    • Distributed partition
    • Load balancing
    Users can get
    • Single big database view
    • Location transparency
    • Additionally, linear scalability
    [Diagram: the application sends SELECT * FROM gtable WHERE part_key=2 AND … and INSERT INTO gtable … to the broker, which load-balances across the cluster server nodes. Through the global schema and distributed partition, gtable is spread over the nodes: node #1 holds part_01 and part_05, node #2 holds part_02 and part_06, node #3 holds part_03 and part_07, and node #4 holds part_04 and part_08.]
  • 57. CUBRID Cluster System
    The global schema is a single representation, or global view, of all nodes, where each node has its own database and schema.
    Users can access any database through a single schema, regardless of and without knowing the location of the distributed data.
    Example queries issued by local schema users and global schema users:
    SELECT * FROM contents
    WHERE auth = (SELECT name FROM author WHERE …)
    UPDATE local …
    SELECT * FROM contents
    WHERE …
    SELECT * FROM info, code
    WHERE info.id = code.id
    INSERT INTO contents…
    [Diagram: the global schema (info, contents, author) spans local schemas #1 to #4, which reside in databases #1 to #4 and hold tables such as author, code, level, local, contents, and info.]
  • 58. CUBRID Cluster
    [Diagram: a global schema provides the logical view over the nodes; in the physical view, each node keeps its own schema, system catalog, data, and indexes.]
  • 59. CUBRID Cluster
    The distributed partition maps global schema onto table partitioning.
    Partitions are resident in different nodes but accessed through global schema.
    SELECT * FROM gtable, info
    WHERE gtable.part_key=02
    AND info.id = gtable.id
    gtable – PARTITION BY HASH (part_key)
    [Diagram: in the global schema, gtable consists of partitions part_01 to part_08 alongside the table info; the partition data and info are spread across databases #1 to #4.]
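How a hash-partitioned global table lets a query touch one node instead of all of them can be sketched as follows. The hash function (a plain modulo), the function names, and the partition-to-node placement are assumptions for illustration; the placement copies the one drawn on the earlier CUBRID Cluster System slide (part_01 and part_05 on node #1, and so on), and CUBRID's real hash function is not given in the slides.

```python
from typing import List, Optional

NUM_PARTITIONS = 8  # part_01 .. part_08
NUM_NODES = 4       # node #1 .. node #4


def partition_for(part_key: int) -> str:
    """Hypothetical PARTITION BY HASH rule: plain modulo over the partitions."""
    return f"part_{part_key % NUM_PARTITIONS + 1:02d}"


def node_for(partition: str) -> int:
    """Assumed placement: partitions are dealt round-robin across the nodes."""
    index = int(partition.split("_")[1])
    return (index - 1) % NUM_NODES + 1


def nodes_to_query(part_key: Optional[int]) -> List[int]:
    """Equality on part_key prunes to one node; otherwise ask every node."""
    if part_key is None:
        return list(range(1, NUM_NODES + 1))
    return [node_for(partition_for(part_key))]
```

A query such as the slide's `WHERE gtable.part_key=02` would be pruned to a single node under this model, while a query without a predicate on `part_key` has to scatter to all nodes and gather the results.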
  • 60. nStore, a distributed database system based on the CUBRID
    RDB-like tabular model
    • Schema, column, record
    • Index on columns (ordered search)
    Restricted data type
    • Integer (bigint), string, timestamp (msec), id (128-bit), bool
    Data partitioned by key
    • E.g., user-id could be a key
    SQL-like query language
    • SELECT a, b, c FROM post WHERE fid IN (?, ?, ?) AND b=? ORDER BY ts LIMIT 20, CK="iamyaw"
    • INSERT INTO post(no, id, date) VALUES (?, ?, ?), CK="iamyaw"
    Join supported
    • Between tables in one container
  • nStore, a distributed database system based on the CUBRID
    REST API
    http://server/keyspace/query?ckey=iamyaw&nsql='select a from tbl where k=100'&format=json
    [Diagram: applications call the nStore REST API. The nStore nodes handle data distribution, replication (3 copies), rebalancing, and query processing; each nStore node uses CUBRID as its storage system.]
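A client would normally build the query URL shown on this slide with proper escaping rather than by pasting SQL into the string. The sketch below assumes the URL shape from the slide (`/keyspace/query` with `ckey`, `nsql`, and `format` parameters); the helper name `build_query_url` is hypothetical, and whether nStore expects the `nsql` value to be wrapped in quotes is not stated in the slides, so the sketch sends it unquoted.

```python
from urllib.parse import urlencode


def build_query_url(server, keyspace, ckey, nsql, fmt="json"):
    """Build an nStore-style query URL; parameter names follow the slide."""
    params = {"ckey": ckey, "nsql": nsql, "format": fmt}
    # urlencode escapes spaces as '+' and '=' as '%3D' inside the SQL text.
    return f"http://{server}/{keyspace}/query?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url("server", "keyspace", "iamyaw",
                          "select a from tbl where k=100"))
```

The container key (`ckey`) travels with every request, which fits the slide's point that data is partitioned by key.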
  • 63. nStore, a distributed database system based on the CUBRID
    [Diagram: the application reaches nStore through the REST API. A management node and several container servers form the distribution layer on top of the RDBMS. Each container (for example ckey=iamyaw or ckey=kieun_park) holds tables A, B, and C with indexed columns, and equi-joins run between tables in the same container. A global table G is also shown.]
  • 64. nStore Test Results
    Tested using YCSB (http://research.yahoo.com/Web_Information_Management/YCSB)
    INSERT: 50,000,000 records (1K size)
    READ: Zipfian distribution
    READ w/ compaction: after SSTable compaction (Cassandra, HBase)
    READ/UPDATE: 50:50 (50,000,000 records DB)
    READ/INSERT: 50:50 (50,000,000 records DB)
  • 65. CUBRID reference architecture for social networking service
  • 66. CUBRID Web Reference Architecture
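    [Diagram: a small-size web service runs a web server on top of a CUBRID HA master/slave pair. A mid-size web service adds web servers (user interface), web application servers (business logic), and a cache server, with read-write (RW) and read-only (RO) access to sharded databases (DB sharding); each shard is a CUBRID HA master/slave pair. CUNITOR is shown alongside the databases.]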
  • 67. Social Networking Service Architecture
    Web Servers (User Interface)
    Cache Layer
    Web Application Servers (Business Logic)
    Social Query Engine
    Aggregation Engine
    Delivery Engine
    Search Engine
    Recommendation Engine
    User Profile DB
    Social Relation DB
    Analytics DB
    Feed Outbox DB
    Feed Inbox DB
    Search Index
  • 68. CUBRID SNS Reference Architecture
    [Diagram: application servers and a cache server farm sit in front of four stores.
    Analytic DB: partitioned for OLAP on CUBRID Cluster (node #1 to node #n), loaded via ETL.
    User profile DB: sharded by user-id.
    Social relation DB: sharded by user-id.
    Inbox/Outbox storage: distributed according to user-id on nStore w/ CUBRID (containers plus management).
    The sharded DBs are reached through brokers in RW and RO modes (DB sharding) and run as CUBRID HA master/slave pairs.
    OAM: a monitoring server with CUNITOR.]
  • 69. Best Practices
    A highly available database architecture is a basic business requirement and no longer a technical barrier.
    Automatic sharding is an effective way to scale out a DB system that stores relational data.
    nStore is a solution for petabyte-scale data, with the benefits of a highly available and scalable distributed store.