From Content Storage to Scaling Smart Data

Published in: Technology, Education

  1. Lily: smart data, at scale, made easy. From content storage to scaling smart data. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org, Monday 6 June 2011
  2. (no text on this slide)
  3. the pain (chart labels: data, Moore, need for distributed processing)
  4. the pain » growth of data sets » smart businesses need to apply analytics to activities » doing business online means real-time » talent shortage
  5. LILY: The Real-time Platform built for the Age of Data. We manage, track and measure your data and users, and do the mat(c)hmaking in-between: » provide you with business intelligence and analytics » harvest user profiles and learn their interests » dynamically engage your users using quality recommendations
  6. where would you use lily? » large collections of data » large groups of users » content repositories » e-commerce / retail » library catalogs » news / media » (media) asset management » product catalogs » ‘live’ archives » ... if you want to use big data, but you need easy.
  7. this is where the magic happens
  8. beyond content management (diagram labels: marketing, broadcast, revenue, product / service)
  9. beyond content management: data + analytics (diagram labels: recommendations, call to action, personalised, revenue, product / service, audience data)
  10. LILY 2.0: smart data, SMARTER DATA (diagram labels: data processing, relations, recommendations, semantic augmentation, analytics, usage metrics, domain knowledge, patterns, rules, keywords, lists, ...)
  11. roadmap » now: highly scalable data repository: store, index and search » next: real-time usage stats gathering and analytics » later: built-in context- and user-sensitive recommendations » built on top of HBase (modeled on Google's BigTable) and Solr » the same robust technology in use at Facebook, Twitter, StumbleUpon, Yahoo! » scales out over distributed (cloud) infrastructure
  12. Lily Repository Model
  13. Sample Lily Schema (excerpt)

      {
        namespaces: {
          /* Declaration of namespace prefixes. */
          "org.lilyproject.bookssample": "b",
          "org.lilyproject.vtag": "vtag"
        },
        fieldTypes: [
          {
            name: "b$title",
            valueType: { primitive: "STRING" },
            scope: "versioned"
          },
          {
            name: "b$pages",
            valueType: { primitive: "INTEGER" },
            scope: "versioned"
          },
          {
            name: "b$language",
            valueType: { primitive: "STRING" },
            scope: "versioned"
          },
          {
            name: "b$authors",
            valueType: { primitive: "LINK", multiValue: true },
            scope: "versioned"
          },
          {
            name: "b$name",
            valueType: { primitive: "STRING" },
            scope: "versioned"
          },
          {
            name: "b$bio",
            valueType: { primitive: "STRING" },
            scope: "versioned"
          },
          {
            name: "vtag$last",
            valueType: { primitive: "LONG" },
            scope: "non_versioned"
          }
        ],
        recordTypes: [
          {
            name: "b$Book",
            fields: [
              { name: "b$title", mandatory: true },
              { name: "b$pages", mandatory: false },
              { name: "b$language", mandatory: false },
              { name: "b$authors", mandatory: false },
              { name: "vtag$last", mandatory: false }
            ]
          },
          ...
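To illustrate what the `mandatory` flags in the `b$Book` record type imply, here is a minimal Python sketch that checks a candidate record against that record type. This is not Lily's API: `BOOK_TYPE`, `missing_mandatory_fields` and `unknown_fields` are hypothetical names introduced only for this example, and the dictionary is a simplified mirror of the schema excerpt.

```python
# Hypothetical, simplified mirror of the b$Book record type from the schema excerpt.
BOOK_TYPE = {
    "name": "b$Book",
    "fields": [
        {"name": "b$title", "mandatory": True},
        {"name": "b$pages", "mandatory": False},
        {"name": "b$language", "mandatory": False},
        {"name": "b$authors", "mandatory": False},
        {"name": "vtag$last", "mandatory": False},
    ],
}


def missing_mandatory_fields(record_type, record):
    """Return the names of mandatory fields that the record does not set."""
    return [
        field["name"]
        for field in record_type["fields"]
        if field["mandatory"] and field["name"] not in record
    ]


def unknown_fields(record_type, record):
    """Return record keys that the record type does not declare."""
    declared = {field["name"] for field in record_type["fields"]}
    return sorted(set(record) - declared)
```

A record that sets only `b$pages` would be reported as missing `b$title`, the one mandatory field in the excerpt.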
  14. Lily Architecture (deployment)
  15. Lily Architecture (components)
  16. HBase indexing & RowLog Library. HBase indexing: » building and querying indexes, GAE-style » example: content table (rowkey, col, col) with rows A, val3, foo6 and B, val2, foo7; index table (rowkey, col) with ordered row keys val2-B, val3-A. RowLog: » need for sync/async operations » updating of secondary indexes (e.g. link tables) » feeding of Indexer (= indexes Lily-content into Solr) » not: transactions » need for distribution and durability
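The index-table example on this slide (row keys `val2-B` and `val3-A` pointing back at content rows `B` and `A`) can be sketched in a few lines of Python. This is an in-memory toy, assuming the index row key is simply the indexed value followed by the content row key; `SecondaryIndex` is a hypothetical name and none of this is the Lily or HBase API.

```python
import bisect


class SecondaryIndex:
    """Toy in-memory stand-in for an index table with sorted row keys."""

    def __init__(self):
        # Sorted list of (indexed value, content row key) pairs.
        self._keys = []

    def add(self, value, rowkey):
        """Insert an index entry, keeping the entries in sorted order."""
        bisect.insort(self._keys, (value, rowkey))

    def remove(self, value, rowkey):
        """Delete an index entry if it exists."""
        i = bisect.bisect_left(self._keys, (value, rowkey))
        if i < len(self._keys) and self._keys[i] == (value, rowkey):
            del self._keys[i]

    def lookup(self, value):
        """Return content row keys whose indexed column equals value."""
        i = bisect.bisect_left(self._keys, (value,))
        result = []
        while i < len(self._keys) and self._keys[i][0] == value:
            result.append(self._keys[i][1])
            i += 1
        return result

    def row_keys(self):
        """Render the entries the way the slide writes them."""
        return [f"{value}-{rowkey}" for value, rowkey in self._keys]


# Content table from the slide: row A has val3, row B has val2.
index = SecondaryIndex()
index.add("val3", "A")
index.add("val2", "B")
```

Because the entries stay sorted by value, an equality or range query becomes a scan over a contiguous slice of the index, which is the property a sorted row-key store offers.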
  17. The Lily Indexer » sharding towards multiple SOLR instances » indexing of multiple versions of a record » incremental index updating » blob content extraction » denormalization » batch index building
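One common way to shard documents over several Solr instances is to hash the record id and take it modulo the number of shards. The sketch below shows that idea only; the slides do not say how Lily selects a shard, so the hashing scheme, the `shard_for` function and the `SHARDS` URLs are all assumptions made for illustration.

```python
import hashlib

# Hypothetical Solr instance URLs, for illustration only.
SHARDS = [
    "http://solr1.example.com/solr",
    "http://solr2.example.com/solr",
    "http://solr3.example.com/solr",
]


def shard_for(record_id, shards):
    """Pick a shard deterministically from a stable hash of the record id."""
    digest = hashlib.md5(record_id.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]
```

A stable hash (rather than Python's per-process `hash()`) keeps the mapping identical across indexer processes, so updates to a record land on the shard that already holds it.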
  18. status june 2011 » Lily 1.0.1 released - developing since Q4/09 » some customers - DIY retail / media / news » e-commerce platform project » Lily as the data (integration) tier » first contrib: FrogPond (annotated Java <> Lily mapper) https://bitbucket.org/calmera/frogpond
  19. Next up: usage stats » sits in CRUD-path » tracks user ops against records » from both perspectives (record, user) » arbitrary K/V properties: time, location, ... » automatically builds user profiles (as records) » tied to records » indexed access » time dimension: trending » diagram labels: interactions, ops, indexes, time, recommendations
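As a rough sketch of the usage-stats idea on this slide, the Python below records each operation with arbitrary key/value properties, keeps counts from both the user and the record perspective, and derives a simple "trending" list from the time dimension. `UsageTracker` and its methods are hypothetical; the slides describe a planned feature, not an existing API.

```python
from collections import Counter, defaultdict


class UsageTracker:
    """Toy tracker for user operations against records."""

    def __init__(self):
        self.events = []
        self.by_user = defaultdict(Counter)    # user -> record -> op count
        self.by_record = defaultdict(Counter)  # record -> user -> op count

    def track(self, user, op, record, **props):
        """Store one operation plus arbitrary properties (time, location, ...)."""
        self.events.append({"user": user, "op": op, "record": record, **props})
        self.by_user[user][record] += 1
        self.by_record[record][user] += 1

    def trending(self, since):
        """Rank records by number of ops whose 'time' property is >= since."""
        counts = Counter(
            event["record"]
            for event in self.events
            if event.get("time", 0) >= since
        )
        return [record for record, _ in counts.most_common()]
```

Calling `track` from the create/read/update/delete path is what "sits in CRUD-path" would amount to in this toy version.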
  20. from usage stats to recommendations ‘light’ » grouping of users based on: shared properties, shared record access » grouping of records based on: shared properties, shared user operations » (diagram: record, user; the groupings yield connections, which feed recommendations)
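Grouping users by shared record access can be sketched as a small co-occurrence score: records touched by users who overlap with the current user are ranked by the size of that overlap. This is a generic illustration, not the algorithm planned for Lily; `recommend` and the shape of `accesses` are assumptions.

```python
from collections import Counter


def recommend(accesses, user, limit=3):
    """Suggest unseen records, weighted by overlap with other users.

    accesses maps each user to the set of record ids that user accessed.
    """
    seen = accesses.get(user, set())
    scores = Counter()
    for other, records in accesses.items():
        if other == user:
            continue
        overlap = len(seen & records)
        if overlap == 0:
            continue  # no shared record access, so no connection
        for record in records - seen:
            scores[record] += overlap
    # Highest score first; ties broken by record id for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [record for record, _ in ranked[:limit]]
```

The same scoring could be run with records and users swapped to group records by shared user operations.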
  21. full-on recommendations » look at real-time-capable Mahout algorithms » pre-index or pre-calculate as much as possible » save as secondary indexes » present recommendations as part of record API » allow user to contribute ‘domain knowledge’ to record processing pipeline » pattern detection, keywords, ontologies, ...
  22. timeline » Lily + usage stats 10/2011 » Lily + usage stats + light-weight analytics 12/2011 » Lily + recommendations ‘light’ 3/2012 » Lily 2.0 : full-on recommendations 6/2012
  23. lily enterprise » adds tools: » yum/deb package repo » cluster deploy scripts (also EC2) » Admin UI » + enterprise support
  24. demo (if time permits) » message: ‣to ‣from ‣parts ‣listId ‣subject ‣sender » part: ‣content ‣mediaType ‣message
  25. WHERE? www.lilyproject.org
  26. Thank you for your attention! For your questions: » stevenn@outerthought.org » @stevenn
