Social Networks and the Richness of Data
Getting distributed Webservices Done with NoSQL

Fabrizio Schmidt, Lars George
VZnet Netzwerke Ltd.




Content

                     • Unique Challenges
                     • System Evolution
                     • Architecture
                     • Activity Stream - NoSQL
                     • Lessons learned, Future
Unique Challenges

                                    • 16 Million Users
                                    • > 80% Active/Month
                                    • > 40% Active/Daily
                                    • > 30min Daily Time
                                      on Site



Unique Challenges

• 16 Million Users
• 1 Billion Relationships
• 3 Billion Photos
• 150 TB Data
• 13 Million Messages per Day
• 17 Million Logins per Day
• 15 Billion Requests per Month
• 120 Million Emails per Week
Old System - Phoenix

• LAMP
   • Apache + PHP + APC (50 req/s)
   • Sharded MySQL Multi-Master Setup
   • Memcache with 1 TB+

Monolithic single service, synchronous
Old System - Phoenix

                     • 500+ Apache Frontends
                     • 60+ Memcaches
                     • 150+ MySQL Servers


Old System - Phoenix
DON'T PANIC
Asynchronous Services
                     • Basic Services
                          •   Twitter

                          •   Mobile

                          •   CDN Purge

                          •   ...

• Java (e.g. Tomcat)
• RabbitMQ (publisher sketch below)
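A minimal sketch of the Java/RabbitMQ side of such a service, assuming the standard com.rabbitmq.client API; the broker host, exchange name and routing key are illustrative, not the production values:

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;

    public class EventPublisher {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("rabbitmq1");                 // illustrative broker host
            Connection connection = factory.newConnection();
            Channel channel = connection.createChannel();
            // durable topic exchange the basic services (Twitter, Mobile, ...) bind to
            channel.exchangeDeclare("services", "topic", true);
            channel.basicPublish("services", "twitter.status", null,
                    "user=4711;text=hello".getBytes("UTF-8"));
            channel.close();
            connection.close();
        }
    }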
First Services
Phoenix - RabbitMQ

1. PHP implementation of the AMQP client
   Too slow!
2. PHP C extension (php-amqp http://code.google.com/p/php-amqp/)
   Fast enough
3. IPC - AMQP dispatcher C daemon
   That's it! But not released so far
IPC - AMQP Dispatcher
Activity Stream



Old Activity Stream
We cheated!

• Memcache only - no persistence
• Status updates only
• #fail on users with >1000 friends
• #fail on memcache restart

source: internet
Social Network Problem = Twitter Problem???

• >15 different Events
• Timelines
• Aggregation
• Filters
• Privacy
Do the Math!

18M Events/day sent to ~150 friends
  => 2700M timeline inserts/day
20% during peak hour
  => 3.6M event inserts/hour - 1000/s
  => 540M timeline inserts/hour - 150,000/s
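For reference, the arithmetic behind those peak rates, spelled out (a plain check of the slide's numbers, nothing new):

\[
18\,\mathrm{M} \times 150 = 2700\,\mathrm{M}\ \text{timeline inserts/day}
\]
\[
0.2 \times 18\,\mathrm{M} = 3.6\,\mathrm{M}\ \text{event inserts/h} = \frac{3.6 \times 10^6}{3600\,\mathrm{s}} = 1000/\mathrm{s}
\]
\[
0.2 \times 2700\,\mathrm{M} = 540\,\mathrm{M}\ \text{timeline inserts/h} = \frac{540 \times 10^6}{3600\,\mathrm{s}} = 150\,000/\mathrm{s}
\]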
New Activity Stream
Do it right!

• Social Network Problem
• Architecture
• NoSQL Systems

source: internet
Architecture



FAS
                            Federated Autonomous Services




• Nginx + Janitor
• Embedded Jetty + RESTeasy (bootstrap sketch below)
• NoSQL Storage Backends


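A minimal bootstrap sketch for one such service node, using current embedded-Jetty and RESTeasy class names; net.vz.stream.StreamApplication is a hypothetical JAX-RS Application listing the REST resources:

    import org.eclipse.jetty.server.Server;
    import org.eclipse.jetty.servlet.ServletContextHandler;
    import org.eclipse.jetty.servlet.ServletHolder;
    import org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher;

    public class ServiceMain {
        public static void main(String[] args) throws Exception {
            Server server = new Server(8080);              // Nginx proxies to this port
            ServletContextHandler context = new ServletContextHandler();
            ServletHolder resteasy = new ServletHolder(new HttpServletDispatcher());
            resteasy.setInitParameter("javax.ws.rs.Application",
                    "net.vz.stream.StreamApplication");
            context.addServlet(resteasy, "/*");
            server.setHandler(context);
            server.start();
            server.join();
        }
    }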
Activity Stream
                                      as a service


                     Requirements:
                     • Endless scalability
                     • Storage & cloud independent
                     • Fast
                     • Flexible & extensible data model

Thinking in layers...
Activity Stream as a service
NoSQL Schema
NoSQL Schema

Event - the event is sent in by piggybacking the request
NoSQL Schema

Event → Generate ID - generate the itemID, the unique ID of the event
NoSQL Schema

Event → Generate ID → Save Item - itemID => stream_entry, save the event with meta information
NoSQL Schema

Event → Generate ID → Save Item → Update Indexes

Insert into the timeline of each recipient:
  recipient → [[itemId, time, type], …]
Insert into the timeline of the event originator:
  sender → [[itemId, time, type], …]

(A condensed code sketch of this write path follows below.)
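A condensed sketch of the write path against the stores introduced on the following slides (Voldemort message store, Redis indexes); the key layout and all identifiers are illustrative, not the production code:

    import java.util.List;
    import org.jredis.JRedis;
    import voldemort.client.StoreClient;

    class StreamWriter {
        private final StoreClient<String, String> messageStore; // Voldemort
        private final JRedis jredis;                             // MRI/ORI indexes

        StreamWriter(StoreClient<String, String> messageStore, JRedis jredis) {
            this.messageStore = messageStore;
            this.jredis = jredis;
        }

        void write(long itemId, long senderId, long time, String type,
                   String eventJson, List<Long> recipients) throws Exception {
            messageStore.put(Long.toString(itemId), eventJson);  // Save Item
            String entry = itemId + ":" + time + ":" + type;     // [itemId, time, type]
            for (long recipient : recipients) {
                jredis.zadd("mri:" + recipient, time, entry);    // recipient timelines (MRI)
            }
            jredis.zadd("ori:" + senderId, time, entry);         // originator timeline (ORI)
        }
    }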
MRI (Redis)
Architecture: Push
Message Recipient Index (MRI)

Push the message directly to all MRIs
➡ {number of recipients ~150} updates

Special profiles and some users have >500 recipients
➡ >500 pushes to recipient timelines => stresses the system!
ORI (Voldemort/Redis)
Architecture: Pull
Originator Index (ORI)

NO push to MRIs at all
➡ 1 message + 1 originator index entry

Special profiles and some users have >500 friends
➡ get >500 ORIs on read => stresses the system
Architecture: PushPull
ORI + MRI

• Identify users with recipient lists >{limit}
• Only push updates with recipients <{limit} to the MRI
• Pull special profiles and users with >{limit} from the ORI
• Identify active users with a bloom/bit filter for the pull

(A minimal routing sketch follows below.)
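A hypothetical sketch of that routing decision; the index interfaces, the limit and the key handling are illustrative stand-ins for the real components:

    import java.util.List;

    class PushPullRouter {
        interface RecipientIndex  { void push(long userId, String entry); }    // MRI
        interface OriginatorIndex { void append(long senderId, String entry); } // ORI

        private final RecipientIndex mri;
        private final OriginatorIndex ori;
        private final int limit; // e.g. 500, see the slide above

        PushPullRouter(RecipientIndex mri, OriginatorIndex ori, int limit) {
            this.mri = mri;
            this.ori = ori;
            this.limit = limit;
        }

        void deliver(long senderId, String entry, List<Long> recipients) {
            if (recipients.size() < limit) {
                for (long recipient : recipients) {
                    mri.push(recipient, entry);  // normal users: fan out on write
                }
            } else {
                ori.append(senderId, entry);     // special profiles: one entry, pulled on read
            }
        }
    }

On read, a user's timeline is then the merge of their own MRI with the ORIs of their few over-the-limit friends; the activity filter on the next slides prunes inactive friends before those ORI reads.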
Activity Filter

• Reduce read operations on storage
• Distinguish user activity levels
• In memory and shared across keys and types
• Scan a full day of updates for 16M users at per-minute granularity for 1000 friends in <100 ms

(A minimal filter sketch follows below.)
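A hypothetical sketch of such a filter: one bit per minute and user, held in memory; the production version uses bloom/bit filters shared across keys and types, so treat this only as the idea in code:

    import java.util.BitSet;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class ActivityFilter {
        private static final int MINUTES_PER_DAY = 24 * 60;
        private final Map<Long, BitSet> activityByUser = new ConcurrentHashMap<Long, BitSet>();

        void markActive(long userId, int minuteOfDay) {
            activityByUser.computeIfAbsent(userId, id -> new BitSet(MINUTES_PER_DAY))
                          .set(minuteOfDay);
        }

        // True if the user posted anything in [fromMinute, toMinute); friends
        // returning false are skipped entirely, saving reads on the storage layer.
        boolean wasActive(long userId, int fromMinute, int toMinute) {
            BitSet bits = activityByUser.get(userId);
            if (bits == null) return false;
            int next = bits.nextSetBit(fromMinute);
            return next >= 0 && next < toMinute;
        }
    }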
NoSQL




NoSQL: Redis
                                      ORI + MRI on Steroids

                     •    Fast in memory Data-Structure Server

                     •    Easy protocol

                     •    Asynchronous Persistence

                     •    Master-Slave Replication

                     •    Virtual-Memory

                     •    JRedis - The Java client


NoSQL: Redis
ORI + MRI on Steroids

Data-Structure Server
• Datatypes: String, List, Set, ZSet
• We use ZSets (sorted sets) for the push recipient indexes

Insert

    // JRedis zadd takes a score - here the event time, so each
    // timeline stays sorted chronologically
    for (Recipient recipient : recipients) {
        jredis.zadd(recipient.id, eventTime, streamEntryIndex);
    }

Get

    jredis.zrange(streamOwnerId, from, to)
    jredis.zrangebyscore(streamOwnerId, someScoreBegin, someScoreEnd)
NoSQL: Redis
ORI + MRI on Steroids

Persistence - AOF and bgsave

AOF - append-only file
  - appends on every operation

Bgsave - asynchronous snapshot
  - configurable (time period or every n operations)
  - can be triggered directly

We use AOF as it's less memory hungry,
combined with bgsave for additional backups
(example redis.conf directives below)
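As a rough sketch, the corresponding redis.conf directives for such a setup (the values are illustrative, not the production settings):

    appendonly yes          # AOF: log every write operation
    appendfsync everysec    # fsync the AOF once per second
    save 900 1              # bgsave a snapshot if >=1 change in 900s (extra backup)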
NoSQL: Redis
ORI + MRI on Steroids

Virtual Memory

Storing recipient indexes for 16 million users at ~500 entries each would need >250 GB of RAM

With virtual memory activated, Redis swaps less frequently used values to disk
➡ Only your hot dataset is in memory
➡ 40% logins per day / only 20% of those in peak
   ~ 20 GB needed for the hot dataset
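The >250 GB figure is consistent with roughly 32 bytes per index entry (an assumption for illustration; the per-entry size is not stated on the slide):

\[
16 \times 10^6\ \text{users} \times 500\ \text{entries} = 8 \times 10^9\ \text{entries},\qquad
8 \times 10^9 \times {\sim}32\,\mathrm{B} \approx 256\,\mathrm{GB}
\]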
NoSQL: Redis
ORI + MRI on Steroids

JRedis - the Redis Java client

• Pipelining support (sync and async semantics)
• Redis 1.2.3 compliant

The missing parts:
• No consistent hashing
• No rebalancing
Message Store (Voldemort)
NoSQL: Voldemort
No #fail Messagestore (MS)

• Key-Value Store
• Replication
• Versioning
• Eventual Consistency
• Pluggable Routing / Hashing Strategy
• Rebalancing
• Pluggable Storage-Engine
NoSQL: Voldemort
No #fail Messagestore (MS)

Configuring replication, reads and writes:

<store>
    <name>stream-ms</name>
    <persistence>bdb</persistence>
    <routing>client</routing>
    <replication-factor>3</replication-factor>
    <required-reads>2</required-reads>
    <required-writes>2</required-writes>
    <preferred-reads>3</preferred-reads>
    <preferred-writes>3</preferred-writes>
    <key-serializer><type>string</type></key-serializer>
    <value-serializer><type>string</type></value-serializer>
    <retention-days>8</retention-days>
</store>

(A client-bootstrap sketch follows below.)
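A minimal client-bootstrap sketch for the store defined above; the bootstrap URL and sample key/value are illustrative:

    import voldemort.client.ClientConfig;
    import voldemort.client.SocketStoreClientFactory;
    import voldemort.client.StoreClient;
    import voldemort.client.StoreClientFactory;

    public class StreamMsClient {
        public static void main(String[] args) {
            StoreClientFactory factory = new SocketStoreClientFactory(
                    new ClientConfig().setBootstrapUrls("tcp://voldemort1:6666"));
            StoreClient<String, String> client = factory.getStoreClient("stream-ms");
            client.put("4711", "{\"type\":\"status\"}"); // versioning handled by the client
            System.out.println(client.get("4711").getValue());
        }
    }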
NoSQL: Voldemort
No #fail Messagestore (MS)

Write:
client.put(key, myVersionedValue);

Update (read-modify-write):
public class MriUpdateAction extends UpdateAction<String, String> {

    private final String key;
    private final ItemIndex index;

    public MriUpdateAction(String key, ItemIndex index) {
        this.key = key;
        this.index = index;
    }

    @Override
    public void update(StoreClient<String, String> client) {
        // get, modify and put back under the read version
        Versioned<String> versionedJson = client.get(this.key);
        versionedJson.setObject("my value");
        client.put(this.key, versionedJson);
    }
}

(usage sketch below)
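A hedged usage sketch for the action above: applyUpdate() runs the read-modify-write and retries it when a concurrent writer makes the read version obsolete (the key and the index variable are illustrative):

    boolean applied = client.applyUpdate(new MriUpdateAction("mri:4711", index));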
NoSQL: Voldemort
No #fail Messagestore (MS)

Eventual Consistency - Read

public class MriInconsistencyResolver implements InconsistencyResolver<Versioned<String>> {

    public List<Versioned<String>> resolveConflicts(List<Versioned<String>> items) {
        Versioned<String> vers0 = items.get(0);
        Versioned<String> vers1 = items.get(1);
        if (vers0 == null && vers1 == null) {
            return null;
        }
        List<Versioned<String>> li = new ArrayList<Versioned<String>>(1);
        if (vers0 == null) {
            li.add(vers1);
            return li;
        }
        if (vers1 == null) {
            li.add(vers0);
            return li;
        }
        // resolve your inconsistency here, e.g. merge the two lists
        return items; // placeholder: return the merged result instead
    }
}

The default inconsistency resolver automatically takes the newer version.
NoSQL: Voldemort
No #fail Messagestore (MS)

Configuration
• Choose a big number of partitions
• Reduce the size of the BDB append log
• Balance client and server threadpools
Concurrent ID Generator
NoSQL: Hazelcast
                               Concurrent ID Generator (CIG)

                     • In Memory Data Grid
                     • Dynamically Scales
                     • Distributed java.util.{Queue|Set|List|Map}
                          and more
                     • Dynamic Partitioning with Backups
                     • Configurable Eviction
                     • Persistence
NoSQL: Hazelcast
                               Concurrent ID Generator (CIG)

                     Cluster-wide ID Generation:
                     • No UUID because of architecture constraint
                     • IDs of Stream Entries generated via
                          Hazelcast
                     • Replication to avoid loss of count
                     • Background persistence used for disaster
                          recovery

NoSQL: Hazelcast
Concurrent ID Generator (CIG)

Generate unique sequential numbers (distributed autoincrement):
• Nodes get ranges assigned (node1: IDs 10000-19999, node2: IDs 20000-29999)
• IDs per range are incremented locally on the node (thread-safe/atomic)
• Distributed locks secure the range assignment for nodes (see the sketch below)
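A hypothetical sketch of that range scheme, assuming the Hazelcast 1.x-era static API (Hazelcast.getMap / Hazelcast.getLock); the map keys, names and range size are illustrative:

    import com.hazelcast.core.Hazelcast;
    import java.util.Map;
    import java.util.concurrent.locks.Lock;

    public class RangeIdGenerator {
        private static final long RANGE_SIZE = 10000;
        private final Map<String, Long> state = Hazelcast.getMap("cig-state"); // replicated, persisted
        private final Lock lock = Hazelcast.getLock("cig-range-lock");         // distributed lock

        private long next = 0;     // next ID to hand out locally
        private long rangeEnd = 0; // exclusive end of the range this node owns

        public synchronized long nextId() {
            if (next >= rangeEnd) {
                claimRange();
            }
            return next++;
        }

        private void claimRange() {
            lock.lock(); // secures the range assignment across nodes
            try {
                Long highest = state.get("highestRange");
                long start = (highest == null) ? RANGE_SIZE : highest + RANGE_SIZE;
                state.put("highestRange", start); // the map-store persists this for recovery
                next = start;
                rangeEnd = start + RANGE_SIZE;
            } finally {
                lock.unlock();
            }
        }
    }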
NoSQL: Hazelcast
Concurrent ID Generator (CIG)

Example Configuration

<map name=".vz.stream.CigHazelcast">
    <map-store enabled="true">
        <class-name>net.vz.storage.CigPersister</class-name>
        <write-delay-seconds>0</write-delay-seconds>
    </map-store>
    <backup-count>3</backup-count>
</map>
NoSQL: Hazelcast
                           Concurrent ID Generator (CIG)




                Future use-cases:
                • Advanced preaggregated cache
                • Distributed Executions


Lessons Learned

• Start benchmarking and profiling your app early!
• A fast and easy deployment keeps motivation up
• Configure Voldemort carefully (especially on large-heap machines)
• Read the mailing lists of the #nosql systems you use
• No solution in the docs? Read the sources!
• At some point stop discussing and just do it!
In Progress

                     • Network Stream
                     • Global Public Stream
                     • Stream per Location
                     • Hashtags

Future


                     • Geo Location Stream
                     • Third Party API


Questions?



