SHC Israel: GigaSpaces Case Study

Tomer Gabel, Consulting Engineer at Substrate Software Services

Introducing Social Networking Into an e-commerce Platform
Tomer Gabel | SHC Israel
03.02.2011

Social Commerce: An Introduction
  • The last few years have seen tremendous growth in social networks
      • Some estimates place Facebook above Google
      • Even if not, we’re talking millions of daily unique visitors
  • So the obvious question is… where’s the money?

Social Commerce: An Introduction
  [no text captured for this slide]

Social Commerce: Business Case
  • What’s wrong with traditional e-commerce?
      • Discovery/recommendation features are extremely hard to get right
      • Overly broad market targeting means lost sales and disgruntled, ad-weary customers
  • The trust model is inherently broken
      • Impossible to gauge truth and accuracy in customer reviews
      • “Wisdom of the masses” does not always apply
  • Not fun!
      • Shopping is a social experience (going to the mall, holiday shopping sprees)
      • This does not translate to existing e-commerce sites!

Social Commerce: Business Case
  • “Social commerce” aims to address these deficiencies
  • Correlating interests and products is more accurate and significantly easier when based on social context
      • Social circles are inherently constructed on shared interests and perspectives
      • A customer’s social network is much smaller in scope than a global, statistical recommendation model
  • More accurate personalized data exposes new opportunities
      • Personalized discovery allows more opportunity to tap the long tail
      • Social interaction makes it easy to identify domain experts
      • A single opinion provided by a friend, family member or acquaintance is more trustworthy than dozens of unrelated product reviews/ratings

Social Commerce: Business Case
  • Most crucially, social commerce is all about user engagement and collaboration:
      • Should I buy an iPhone, Blackberry or Android phone?
      • Which wedding dress looks best?
      • Which video games are suitable for a preschooler?
  • Ask your friends!

Social Commerce: The Axiom
  • Social features increase user engagement
  • Increased conversion
  • Profit!

Enter: Delver
  • The Delver team has two products on the market
  • Two sides of the same coin, really:
      • sears.com is a traditional e-commerce website with a social twist
      • delver.com is a traditional social website with an e-commerce twist

The Technical Challenge
  • sears.com is a fully blown commercial retail site
      • Over 1 million page-views daily
      • Over 270,000 visitors daily
      • Traffic can easily spike up to ten times in the holiday season!

The Technical Challenge
  • Processing social networks is not an easy proposition
      • Massive amounts of branching data
      • No data locality
      • Very few assumptions can be made about the data
  • Let’s address each of these in turn
  Image source: NetworkWeaver

The Technical Challenge
  • Massive amounts of branching data:
      • Imagine every Facebook user (500 million)
      • Imagine each person is only connected to 100 others (conservative estimate)
  • How is user X connected with Y?
      • X has 100 friends
      • Each of them has 100 friends
      • 10,001 nodes visited!
      • 101 reads from the underlying storage system!

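The fan-out arithmetic on this slide can be reproduced with a small breadth-first expansion. This is a minimal sketch in Python, not the Delver implementation: `CountingStore`, `expand` and `synthetic_tree` are hypothetical names, and the synthetic graph assumes the worst case in which no two users share a friend. Under those assumptions a two-hop expansion costs 101 record reads (X plus each of X's 100 friends). The slide's figure of 10,001 nodes appears to count X plus the 10,000 second-degree contacts; counting the 100 direct friends as well gives 10,101.

```python
from collections import deque


class CountingStore:
    """Hypothetical adjacency store that counts how many records are loaded."""

    def __init__(self, adjacency):
        self._adjacency = adjacency
        self.reads = 0

    def friends_of(self, user):
        # Every call stands in for one read from the underlying storage system.
        self.reads += 1
        return self._adjacency.get(user, [])


def expand(store, start, max_depth):
    """Breadth-first expansion up to max_depth hops; returns every user seen."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        user, depth = frontier.popleft()
        if depth == max_depth:
            continue  # users on the last level are recorded but not loaded
        for friend in store.friends_of(user):
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, depth + 1))
    return seen


def synthetic_tree(degree, depth):
    """Assumed worst case: every user has `degree` friends nobody else has."""
    adjacency = {}
    next_id = 1
    level = [0]  # user 0 plays the role of X
    for _ in range(depth):
        next_level = []
        for user in level:
            friends = list(range(next_id, next_id + degree))
            next_id += degree
            adjacency[user] = friends
            next_level.extend(friends)
        level = next_level
    return adjacency
```

Calling `expand(CountingStore(synthetic_tree(100, 2)), 0, 2)` walks two hops out from X. Each additional hop multiplies both the reads and the nodes by the branch factor, which is why deeper traversals become expensive so quickly.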
The Technical Challenge
  • No data locality:
      • Any object may be connected to any other object in no particular order
      • How to split the data?
      • Some research is being done in the area (SPAR)

The Technical Challenge
  • No easy assumptions:
      • No “typical user”
      • Not enough data to draw archetypes
      • Significant, unavoidable long tail
      • Difficult to pre-tune data structures

The Technical Challenge
  • The crux of the problem:
      • High branch factor necessitates many loads to serve even a simple request
      • No data locality + high branch factor means very high random I/O
      • Traditional storage models (RDBMS, flat files etc.) are a poor fit
  • Serious research into graph storage, social network composition etc. only dates back a few years
      • No best practices or “accepted truths” to build on

Use Case for GigaSpaces
  • To solve the graph storage and traversal problem, we arrived at the following requirements:
  • Completely in-memory storage
      • No data locality means caching is inefficient
      • Massive amounts of random I/O cannot scale vertically, and hardware (basically, spindle count) cost quickly becomes prohibitive
      • If data access is sufficiently fast, data can be randomly partitioned
  • Horizontal scaling with a well-known scale-up strategy
      • Add more memory or more nodes to handle data growth
      • Add more CPUs or additional nodes to handle load growth

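The "randomly partitioned" requirement can be illustrated with hash-based placement. The sketch below is an assumption-laden illustration, not the GigaSpaces routing API: `partition_for` and `group_by_partition` are hypothetical helpers, and the SHA-1 digest is just one convenient stable hash. Because placement ignores the shape of the graph, a user's friends end up scattered over most partitions, which is only tolerable when each access is fast (here, in memory).

```python
import hashlib


def partition_for(user_id: str, partition_count: int) -> int:
    """Map a user ID to a partition using a stable hash (same answer on every run)."""
    digest = hashlib.sha1(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def group_by_partition(user_ids, partition_count):
    """Group IDs by owning partition, e.g. to batch one request per data node."""
    groups = {}
    for user_id in user_ids:
        groups.setdefault(partition_for(user_id, partition_count), []).append(user_id)
    return groups
```

Grouping a friend list with `group_by_partition(friend_ids, 8)` shows how many data nodes a single traversal step touches. Note that with plain modulo placement, changing the partition count moves most IDs, so adding nodes implies a data migration step.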
Use Case for GigaSpaces
  • Additional requirements include:
  • Map/Reduce execution framework
      • Graph traversal and data analysis requirements lend themselves well to the map/reduce paradigm
  • Code execution on the data nodes
      • Because of the massive amounts of data involved, the network interface will be quickly saturated by retrievals
      • Memory retrieval is at least two orders of magnitude faster than network throughput (DDR2-800 on a dual-channel memory controller has a theoretical maximum throughput of 102.4 Gb/s)

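The two requirements on this slide combine naturally: ship a small function to each data node, let it scan its local data in memory, and send back only a small partial result. The following is a minimal local simulation of that idea, assuming each partition is a plain dictionary of user to friend set; the function names and the thread pool standing in for remote nodes are illustrative assumptions, not the GigaSpaces executor API.

```python
from concurrent.futures import ThreadPoolExecutor


def count_local_followers(partition, target):
    """Map step: runs "on" one data node and scans only that node's records."""
    return sum(1 for friends in partition.values() if target in friends)


def count_followers(partitions, target):
    """Reduce step: sums one small integer per partition."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partial_counts = list(
            pool.map(lambda part: count_local_followers(part, target), partitions)
        )
    return sum(partial_counts)


# Hypothetical example data: two partitions of a tiny friendship graph.
EXAMPLE_PARTITIONS = [
    {"alice": {"bob", "carol"}, "dave": {"carol"}},
    {"bob": {"alice"}, "erin": {"carol", "alice"}},
]
```

In this arrangement only the target ID travels to each node and only a single count travels back, so the network carries a few bytes per partition instead of every adjacency record.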
Use Case for GigaSpaces
  • As an operations tech I had a few things to add to the list, namely…
  • Nonfunctional requirements:
      • Built-in fault tolerance and high availability
      • Zero-configuration (or as close to it as it gets) setup; in particular, component discovery and assignment must be automated
      • Well-documented deployment, configuration and tuning process
      • Monitoring API
      • Administrative client for diagnosis, trouble resolution and manual intervention

Use Case for GigaSpaces
  • GigaSpaces features map well to our requirements
      • Data grid
      • Compute grid
      • High availability
      • Horizontal data and load scaling
      • Management API
  • Very few viable alternatives:
      • Hadoop, neo4j are disk-based
      • Terracotta is overly simplistic and has no execution framework
      • Oracle Coherence is expensive and has a limited feature set

Delver Architecture
  • We ended up with a hybrid platform:
      • GigaSpaces for graph storage, traversal and analysis
      • MySQL for traditional, “simple” data as well as a backing store for GigaSpaces
      • .NET-based front-end, Java-based back-end
  • We had to factor our organization accordingly
      • Data access team provides abstracted interfaces on top of GigaSpaces and MySQL
      • Back-end “heavy lifting” services (e.g. recommendation engine) work directly against GigaSpaces
      • Most other components either use the abstracted DAL or are simple enough to work directly against MySQL using (N)Hibernate

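The "abstracted interfaces" idea can be sketched as a small data-access contract that hides the storage engine from most components. Everything in this example is hypothetical: the slides do not show the actual DAL, the real back-end is Java rather than Python, and `SocialGraphStore`, `InMemoryGraphStore` and `mutual_friends` are invented names used only for illustration.

```python
from abc import ABC, abstractmethod


class SocialGraphStore(ABC):
    """Hypothetical DAL contract: callers never see which engine holds the data."""

    @abstractmethod
    def friends_of(self, user_id):
        """Return the set of user IDs directly connected to user_id."""


class InMemoryGraphStore(SocialGraphStore):
    """Stand-in implementation; a real one would delegate to the data grid."""

    def __init__(self, adjacency):
        self._adjacency = {user: set(friends) for user, friends in adjacency.items()}

    def friends_of(self, user_id):
        # Return a copy so callers cannot mutate the stored record.
        return set(self._adjacency.get(user_id, ()))


def mutual_friends(store: SocialGraphStore, first, second):
    """Example consumer that depends only on the abstract interface."""
    return store.friends_of(first) & store.friends_of(second)
```

A component written against `SocialGraphStore` keeps working if the implementation behind it changes, which matches the organizational split described on the slide: a core team owns the storage-specific code and most other teams consume the interface.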
Delver Architecture
  [no text captured for this slide]

Key Benefits
  • Significantly reduced integration costs
      • GigaSpaces does a lot of what we need out of the box
      • An alternative solution would require integrating several products, incurring significant integration and development overhead
  • Broad feature set
      • Social commerce is an emerging, dynamic market requiring rapid experimentation and adaptation
      • The large feature set allows us to introduce new features into the system at a furious pace
      • While we primarily use GigaSpaces for graph storage, we also use it as a message queue, distributed lock server and distributed scheduler

Editor's Notes

  1. Image sources (also linked): Facebook US traffic estimates (http://www.insidefacebook.com/2011/01/03/november-2010-facebook-traffic/); top 20 visited websites (http://www.hitwise.com/us/datacenter/main/)
  2. Sources: Sumeet Jain (http://vator.tv/news/2010-12-27-2011-location-and-social-will-rule-commerce); Andy Leaver (http://www.currybet.net/cbet_blog/2009/11/notes-and-quotes-from-ecommerc-2.php); Gordon Gould (http://socialcommercetoday.com/will-2010-be-the-year-of-social-commerce/)
  3. Case in point: Discovery and recommendation features: when was the last time YOU “might be interested in this product”? How accurate are the typical recommendation systems for you? Just how relevant is the typical ad or marketing campaign? When was the last time you went into Amazon and got a coupon for a truly relevant occasion or product? How tired are you of flashing banners?
  4. Public data source: http://bizinformation.ca/www.sears.com#visitors
  5. Massive amounts of data: Imagine modeling every person on the planet (say, 6 billion). Now say each person is connected to just 100 others (a conservative estimate). Image source: http://networkweaver.blogspot.com/2010/03/overlapping-boards.html
  6. SPAR presentation: http://www.slideshare.net/jmpujol/the-little-engines-that-could-scaling-online-social-networks ; image source: http://paysa09.wikispaces.com/Networks
  7. Time distribution image source: http://blog.nielsen.com/nielsenwire/global/social-norms-twitter-users-follow-the-797-rule-in-the-u-k/ ; Twitter following distribution image source: http://www.personalizemedia.com/twitter-long-tail-broadcastization-pre-twitter-reputation/
  8. Inevitably, someone will ask: what are the problems you encountered?
     Barrier to entry:
       • Ops: setting up a GigaSpaces cluster is not a hassle-free affair. Lots of work went into a robust, efficient bootstrapping procedure and we had to contend with quite a few unexpected snags. I believe things are a lot better with the current version than they were a while ago. Furthermore, the overall cost of setting up and deploying GigaSpaces is significantly less than the total overhead of using specific products to tackle our various needs (with a traditional system, that would be the cost of setting up e.g. MySQL + RHCS + client configuration; more likely we’d have had to use some sort of 3rd-party graph storage, clustering and persistence solution).
       • Devs: working against GigaSpaces is considerably harder than working against a vanilla, commonplace RDBMS. To counter the barrier to entry we modeled our organization so that a core team of developers handles graph storage and data analysis, with most other teams either integrating with this subsystem or handling their own requirements with regular Hibernate/NHibernate over MySQL.
     Hard to handle migration paths, zero-time deployment and schema evolution. Features in 8.0 should help remedy the situation (cue Nati Shalom).