SHC Israel: GigaSpaces Case Study

Introducing Social Networking Into an e-commerce Platform
Tomer Gabel | SHC Israel
03.02.2011

Social Commerce: An Introduction
- The last few years have seen tremendous growth in social networks
  - Some estimates place Facebook above Google
  - Even if not, we’re talking millions of daily unique visitors
- So the obvious question is… where’s the money?

Social Commerce: Business Case
- What’s wrong with traditional e-commerce?
- Discovery/recommendation features are extremely hard to get right
  - Overly broad market targeting means lost sales and disgruntled, ad-weary customers
- The trust model is inherently broken
  - Impossible to gauge truth and accuracy in customer reviews
  - “Wisdom of the masses” does not always apply
- Not fun!
  - Shopping is a social experience (going to the mall, holiday shopping sprees)
  - This does not translate to existing e-commerce sites!

Social Commerce: Business Case
- “Social commerce” aims to address these deficiencies
- Correlating interests and products is more accurate and significantly easier when based on social context
  - Social circles are inherently constructed on shared interests and perspectives
  - Working from a customer’s social network is much smaller in scope than generating a global, statistical recommendation model
- More accurate personalized data exposes new opportunities
  - Personalized discovery allows more opportunity to tap the long tail
  - Social interaction makes it easy to identify domain experts
  - A single opinion provided by a friend, family member or acquaintance is more trustworthy than dozens of unrelated product reviews/ratings

Social Commerce: Business Case
- Most crucially, social commerce is all about user engagement and collaboration:
  - Should I buy an iPhone, BlackBerry or Android phone?
  - Which wedding dress looks best?
  - Which video games are suitable for a preschooler?
- Ask your friends!

Social Commerce: The Axiom
- Social features increase user engagement
- Increased conversion
- Profit!

Enter: Delver
- The Delver team has two products on the market
- Two sides of the same coin, really:
  - sears.com is a traditional e-commerce website with a social twist
  - delver.com is a traditional social website with an e-commerce twist

The Technical Challenge
- sears.com is a full-blown commercial retail site
  - Over 1 million page views daily
  - Over 270,000 visitors daily
  - Traffic can easily spike to ten times that in the holiday season!

The Technical Challenge
- Processing social networks is not an easy proposition
  - Massive amounts of branching data
  - No data locality
  - Very few assumptions can be made about the data
- Let’s address each of these in turn
Image source: NetworkWeaver

The Technical Challenge
- Massive amounts of branching data:
  - Imagine every Facebook user (500 million)
  - Imagine each person is only connected to 100 others (conservative estimate)
- How is user X connected with Y?
  - X has 100 friends
  - Each of them has 100 friends
  - 10,001 nodes visited!
  - 101 reads from the underlying storage system!
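The branching estimate above can be reproduced with a small sketch. The toy graph, the `build_tree_graph` and `expand` names, and the "one storage read per adjacency list" cost model are assumptions made for illustration, not part of the original deck. With a fan-out of 100 and no overlap between friend lists, a two-hop expansion from X needs 101 adjacency-list reads and touches 10,101 distinct nodes when X and the first-level friends are counted, which is the same order of magnitude as the slide's figure.

```python
def build_tree_graph(fanout=100):
    """Hypothetical toy graph: X has `fanout` friends, each with `fanout` distinct friends."""
    graph = {"X": [f"f{i}" for i in range(fanout)]}
    for i in range(fanout):
        graph[f"f{i}"] = [f"f{i}_{j}" for j in range(fanout)]
    return graph


def expand(graph, start, depth):
    """Breadth-first expansion to `depth` hops.

    Returns (distinct nodes seen, adjacency-list reads), assuming one
    storage read per adjacency list fetched.
    """
    seen = {start}
    frontier = [start]
    reads = 0
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            reads += 1  # fetch this node's adjacency list from storage
            for friend in graph.get(node, []):
                if friend not in seen:
                    seen.add(friend)
                    next_frontier.append(friend)
        frontier = next_frontier
    return len(seen), reads


nodes, reads = expand(build_tree_graph(100), "X", depth=2)
print(nodes, reads)
```

Real friend lists overlap, so the distinct-node count would be lower in practice, but the number of reads still grows with the size of each frontier.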

The Technical Challenge
- No data locality:
  - Any object may be connected to any other object in no particular order
  - How to split the data?
  - Some research is being done in the area (SPAR)

The Technical Challenge
- No easy assumptions:
  - No “typical user”
  - Not enough data to draw archetypes
  - Significant, unavoidable long tail
  - Difficult to pre-tune data structures

The Technical Challenge
- The crux of the problem:
  - High branch factor necessitates many loads to serve even a simple request
  - No data locality + high branch factor means very high random I/O
  - Traditional storage models (RDBMS, flat files etc.) are a poor fit
- Serious research into graph storage, social network composition etc. only dates back a few years
  - No best practices or “accepted truths” to build on

Use Case for GigaSpaces
- To solve the graph storage and traversal problem, we arrived at the following requirements:
- Completely in-memory storage
  - No data locality means caching is inefficient
  - Massive amounts of random I/O cannot scale vertically, and hardware (basically, spindle count) cost quickly becomes prohibitive
  - If data access is sufficiently fast, data can be randomly partitioned
- Horizontal scaling with a well-known scale-up strategy
  - Add more memory or more nodes to handle data growth
  - Add more CPUs or additional nodes to handle load growth
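Random partitioning can be sketched as hashing each node ID to a partition. This is a minimal illustration under stated assumptions: the `partition_for` helper, the `user-N` IDs and the choice of MD5 as a stable hash are hypothetical, and the sketch does not describe how GigaSpaces itself routes data.

```python
import hashlib
from collections import Counter


def partition_for(node_id: str, partitions: int) -> int:
    """Map a node ID to a partition using a stable hash.

    Python's built-in hash() is salted per process for strings, so a
    digest is used to keep the assignment identical across machines.
    """
    digest = hashlib.md5(node_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


# Spread 100,000 hypothetical user IDs over 4 in-memory partitions.
counts = Counter(partition_for(f"user-{i}", 4) for i in range(100_000))
print(sorted(counts.items()))
```

Because no partition is "closer" to any particular user's friends, the scheme only works when every lookup is cheap, which is why it is paired with in-memory storage above.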

Use Case for GigaSpaces
- Additional requirements include:
- Map/Reduce execution framework
  - Graph traversal and data analysis requirements lend themselves well to the map/reduce paradigm
- Code execution on the data nodes
  - Because of the massive amounts of data involved, the network interface will be quickly saturated by retrievals
  - Memory retrieval is at least two orders of magnitude faster than network throughput (DDR2-800 on a dual-channel memory controller has a theoretical maximum throughput of 800 MT/s × 8 bytes × 2 channels = 12.8 GB/s, i.e. 102.4 Gb/s)
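The combination of map/reduce and code execution on the data nodes can be sketched as follows. Everything here is an assumption made for illustration: the partitions are plain in-process dictionaries, and `load`, `map_task`, `reduce_task` and `friends_linking` are hypothetical names rather than GigaSpaces APIs. The point is that each partition scans its own adjacency lists locally and returns only a small result, instead of shipping whole friend lists across the network.

```python
import hashlib

PARTITIONS = 4


def partition_for(node_id, partitions=PARTITIONS):
    """Stable hash partitioning (illustrative, not GigaSpaces' routing)."""
    digest = hashlib.md5(node_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def load(graph):
    """Distribute adjacency lists across in-memory partitions (plain dicts here)."""
    parts = [dict() for _ in range(PARTITIONS)]
    for node, friends in graph.items():
        parts[partition_for(node)][node] = set(friends)
    return parts


def map_task(local_data, candidates, target):
    """Runs against one partition: which locally stored candidates list `target` as a friend?"""
    return [n for n in candidates if n in local_data and target in local_data[n]]


def reduce_task(partial_results):
    """Merge the small per-partition results into one sorted list."""
    return sorted(n for part in partial_results for n in part)


def friends_linking(parts, source, target):
    """Find friends of `source` who are directly connected to `target`."""
    friends = parts[partition_for(source)].get(source, set())  # single read
    partials = [map_task(p, friends, target) for p in parts]   # one task per partition
    return reduce_task(partials)


graph = {
    "X": ["a", "b", "c"],
    "a": ["X", "Y"],
    "b": ["X"],
    "c": ["X", "Y"],
    "Y": ["a", "c"],
}
parts = load(graph)
print(friends_linking(parts, "X", "Y"))  # ['a', 'c']
```

In a real grid the map tasks would run in parallel on separate machines; the sequential loop here only shows the data flow.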

Use Case for GigaSpaces
- As an operations tech I had a few things to add to the list, namely…
- Nonfunctional requirements:
  - Built-in fault tolerance and high availability
  - Zero-configuration (or as close to it as it gets) setup; in particular, component discovery and assignment must be automated
  - Well-documented deployment, configuration and tuning process
  - Monitoring API
  - Administrative client for diagnosis, trouble resolution and manual intervention

Use Case for GigaSpaces
- GigaSpaces features map well to our requirements
  - Data grid
  - Compute grid
  - High availability
  - Horizontal data and load scaling
  - Management API
- Very few viable alternatives:
  - Hadoop and Neo4j are disk-based
  - Terracotta is overly simplistic and has no execution framework
  - Oracle Coherence is expensive and has a limited feature set

Delver Architecture
- We ended up with a hybrid platform:
  - GigaSpaces for graph storage, traversal and analysis
  - MySQL for traditional, “simple” data as well as a backing store for GigaSpaces
  - .NET-based front-end, Java-based back-end
- We had to factor our organization accordingly
  - Data access team provides abstracted interfaces on top of GigaSpaces and MySQL
  - Back-end “heavy lifting” services (e.g. recommendation engine) work directly against GigaSpaces
  - Most other components either use the abstracted DAL or are simple enough to work directly against MySQL using (N)Hibernate

Key Benefits
- Significantly reduced integration costs
  - GigaSpaces does a lot of what we need out of the box
  - An alternative solution would require integrating several products, incurring significant integration and development overhead
- Broad feature set
  - Social commerce is an emerging, dynamic market requiring rapid experimentation and adaptation
  - The large feature set allows us to introduce new features into the system at a furious pace
  - While we use GigaSpaces primarily for graph storage, we also use it as a message queue, distributed lock server and distributed scheduler
