AdaptiveBlue Java NYC Meetup - Presentation Transcript
AdaptiveBlue @
Java NYC Meetup
April 20, 2009
Alex Iskold, Founder/CEO
http://getglue.com
Agenda
About AdaptiveBlue
Glue: The Network of People and Things
Glue: Building on Amazon Web Services
Glue: Semantic Technology Stack
About AdaptiveBlue
Founded in 2006, based in New York
Funded by USV and RRE
Focuses on enhancing browsing experience
Launched BlueOrganizer and Glue add-ons for
Firefox and SmartLinks Widgets for blogs
Get Glue. The Network That
Sticks With You.
http://getglue.com
What is Glue?
Glue is a contextual network that uses
semantic technology to automatically
connect people around everyday things -
books, music, movies, stars, artists,
stocks, wine, restaurants and more.
1. Contextual: Glue is distributed and appears
when it makes sense on popular sites.
2. Automatic: Users participate in Glue just by
browsing their favorite sites.
3. Simple: Glue removes the friction involved
in networking - the network comes to you.
Glue Demo
Glue: Building on Amazon Web Services
AWS-based Architecture
Client Layer: Browser Add-Ons, Widgets, iPhones, Facebook Apps, API Clients
Load Balancer Layer: Round Robin DNS, Load Balancer 1, Load Balancer 2
Web Service Layer: Host 1 (EC2) ... Host N (EC2), each running the Glue Web Service and Batch Services
Database Layer:
Amazon S3: Object Database, People Profiles
Rackspace MySQL: User accounts, Analytics
Amazon SimpleDB: Interactions between People and Things
AdaptiveBlue AWS Stack
Relating People and Things (SimpleDB)
Records of people’s interactions around things are stored
in SimpleDB Domains using duplication for fast access.
Storing Object Metadata (S3)
XML representations of millions of books, music,
movies, etc. are stored in Amazon S3.
Transactional and Batch Support (EC2)
Web Service requests and batches are
distributed across EC2 instances.
Amazon SimpleDB in a Nutshell
Idea:
Create flat database with auto-indexed tables.
Main Features:
Each attribute is indexed.
Record structure is flexible.
Basic operators in queries.
Supports sorting.
SimpleDB Domain: Record 1 (Key1, Attributes: A1, A2…) … Record N (Key2, Attributes: A1, A2…)
Client operations: Get record, Put record, Query records
How Glue uses SimpleDB
Interaction Record (Key1, Attributes: A1, A2…)
Object Domains: OD1, OD2, … ODN
People Domains: PD1, PD2, … PDN
Each record is duplicated into Object and Person Domain
The Key is a combination of USER_ID and OBJECT_KEY
A djb2 hash is used to calculate the domain for each record
All records for a given USER, and all records for a given OBJECT, are stored inside the same domain.
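The domain selection described above can be sketched roughly as follows. This is a minimal illustration, not AdaptiveBlue's actual code: the class name, the ":" separator in the record key, the "OD"/"PD" prefixes used as domain names, and the domain count are all assumptions. It also assumes the person domain is chosen by hashing USER_ID and the object domain by hashing OBJECT_KEY, which is one way to keep all records for a user (or an object) in a single domain.

```java
class DomainPicker {
    // Classic djb2 string hash: start at 5381, then hash = hash * 33 + c.
    // int overflow simply wraps around, which is acceptable for hashing.
    static int djb2(String s) {
        int hash = 5381;
        for (int i = 0; i < s.length(); i++) {
            hash = hash * 33 + s.charAt(i);
        }
        return hash;
    }

    // Maps a partition key (USER_ID or OBJECT_KEY) to one of domainCount domains.
    // floorMod keeps the index non-negative even when the hash overflowed.
    // The prefix ("OD" or "PD") is a hypothetical naming scheme.
    static String domainFor(String prefix, String partitionKey, int domainCount) {
        int index = Math.floorMod(djb2(partitionKey), domainCount);
        return prefix + (index + 1);
    }

    // Record key combining USER_ID and OBJECT_KEY; the separator is an assumption.
    static String recordKey(String userId, String objectKey) {
        return userId + ":" + objectKey;
    }
}
```

With this scheme, one interaction record would be written twice under the same record key: once to `domainFor("PD", userId, n)` and once to `domainFor("OD", objectKey, n)`, so that "everything this user touched" and "everyone who touched this object" are each a single-domain query.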
Amazon S3 in a Nutshell
Idea:
Put/Get objects into buckets based on unique keys.
Main Features:
Public/Private access.
Support for large objects.
Amazon S3: Bucket 1 … Bucket N
Client operations: Put object, Get object
How Glue Uses S3
Object Bucket: XML files with object information
People Bucket: XML files with user and friends info
XML is serialized as a string and written to S3
Each file has a unique key: OBJECT_ID or USER_ID/profile, etc.
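As a rough illustration of "serialize XML as a string and store it under a unique key", the sketch below uses only the JDK's XML APIs. The in-memory map is a stand-in for an S3 bucket (no AWS client is used here), and the element names `book`, `title` and `author` are hypothetical, since the slides do not show the actual schema.

```java
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class XmlStoreSketch {
    // Stand-in for the Object Bucket: key -> serialized XML string.
    static final Map<String, String> objectBucket = new HashMap<>();

    // Builds a small DOM for a book; element names are assumptions.
    static Document bookDocument(String title, String author) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element book = doc.createElement("book");
            doc.appendChild(book);
            Element t = doc.createElement("title");
            t.setTextContent(title);
            book.appendChild(t);
            Element a = doc.createElement("author");
            a.setTextContent(author);
            book.appendChild(a);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes the DOM to a string, without the XML declaration.
    static String toXmlString(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // "Put object": store the serialized XML under its unique key.
    static void put(String key, Document doc) {
        objectBucket.put(key, toXmlString(doc));
    }
}
```

In a real deployment the `put` call would be replaced by an S3 client write to the appropriate bucket, with the key being the OBJECT_ID or a path such as USER_ID/profile.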
Amazon EC2 in a Nutshell
Usage:
Create Machine Image (OS + Apps)
Deploy the image to S3
Start 1 or more instances
Use it as regular machine(s)
Main Options:
Dynamic/Static IPs
Choose cores
Choose locations
Persistence via EBS
How Glue uses EC2
Round Robin DNS
Load Balancer 1, Load Balancer 2
Host 1 (EC2/Rackspace) ... Host N (EC2/Rackspace), each running the Glue Web Service and Batch Services
Web Service processes transactional requests
Batch Services are time-based & run on sets of USERS and OBJECTS
The system scales by equally partitioning Data and Requests
Glue: Semantic Technology Stack
Semantic Technology Stack
Concept Definition
Server-based XML schemas for things (nouns):
books, music, movies, stocks, wines, recipes, etc.
Identity Algorithms
Correlation of the same thing from different pages across the web.
Recognition Algorithms
Recognition of things in Pages, Links and Text
Semantic Technology Stack:
Concept Definitions
1. XML-based: A schema file resides on the
server for each type.
2. Data Composition: Each type has attributes
(e.g. a book has an author).
3. Extensible: New types can be plugged into
the engine dynamically.
Semantic Technology Stack:
Identity Algorithms
1. Key-based: Each object in the system has a
unique key, depending on its type:
books/kite_runner/khaled_hosseini
2. Attribute-based: Keys are based on the
combination of attributes (e.g. title/author)
3. Normalized: Multiple transformations and
validations are applied to raw text to
generate the keys.
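A key such as books/kite_runner/khaled_hosseini could be derived from raw title and author text along these lines. The specific transformations here (stripping accents, lowercasing, dropping a leading article, collapsing punctuation and spaces to underscores) are assumptions chosen to reproduce the example key; the slides do not list the actual normalization steps, and the class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

class ObjectKeys {
    // Normalizes one raw attribute value into a key segment.
    static String normalize(String raw) {
        // Decompose accented characters, then drop the combining marks.
        String s = Normalizer.normalize(raw, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        s = s.toLowerCase(Locale.ROOT).trim();
        // Assumed rule: ignore a leading English article ("The Kite Runner" -> "kite runner").
        s = s.replaceFirst("^(the|a|an)\\s+", "");
        // Collapse every run of non-alphanumeric characters into one underscore.
        s = s.replaceAll("[^a-z0-9]+", "_");
        // Trim underscores left over at either end.
        return s.replaceAll("^_+|_+$", "");
    }

    // Attribute-based key for a book: type prefix, then title, then author.
    static String bookKey(String title, String author) {
        return "books/" + normalize(title) + "/" + normalize(author);
    }
}
```

Because the same normalization is applied wherever a book is seen, pages that spell the title slightly differently (extra spaces, punctuation, capitalization) would map to the same key, which is what lets the same thing be correlated across sites.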
Semantic Technology Stack:
Recognition Algorithms
1. Extraction: The first phase of recognition
processes elements of the page, using an
XML-based framework for parsing the DOM that is
shared by the Java backend and the JavaScript client.
2. Cleaning: The second phase of recognition is an
asynchronous query of multiple web services/APIs.
For books we query Amazon, for movies Netflix,
etc., and then normalize and merge the results.
3. Caching: Clean objects are cached. Misses/false-positives
are patched manually.
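The cleaning and caching phases might look roughly like the sketch below: several sources are queried in parallel, their results are merged, and the merged object is cached. Everything named here is hypothetical. Each source is modeled as a plain function from raw text to a map of attributes, standing in for a real web service call, and the merge rule (first source to supply an attribute wins) is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class CleaningSketch {
    // Each source stands in for one external web service/API lookup.
    private final List<Function<String, Map<String, String>>> sources;
    // Cache of already-cleaned objects, keyed by the raw extracted text.
    private final Map<String, Map<String, String>> cache = new ConcurrentHashMap<>();

    CleaningSketch(List<Function<String, Map<String, String>>> sources) {
        this.sources = sources;
    }

    Map<String, String> clean(String rawText) {
        Map<String, String> cached = cache.get(rawText);
        if (cached != null) {
            return cached; // cache hit: no external calls
        }
        // Fire all source queries asynchronously.
        List<CompletableFuture<Map<String, String>>> pending = new ArrayList<>();
        for (Function<String, Map<String, String>> source : sources) {
            pending.add(CompletableFuture.supplyAsync(() -> source.apply(rawText)));
        }
        // Merge in source order; earlier sources take precedence per attribute.
        Map<String, String> merged = new LinkedHashMap<>();
        for (CompletableFuture<Map<String, String>> result : pending) {
            result.join().forEach(merged::putIfAbsent);
        }
        cache.put(rawText, merged);
        return merged;
    }
}
```

A manual patch for a miss or false positive could then be as simple as overwriting the cached entry for the affected key, though the slides do not say how patching is actually done.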
Presentation of Glue, http://getglue.com, a browser add-on made by AdaptiveBlue. In-depth discussion of how we use Amazon Web Services and Semantic Algorithms.