6. Arthur Gimpell
NoSQL Tel Aviv: Meetup Agenda
OBJECTIVE COMPARISONS
NETWORKING
KNOWLEDGE SHARING
7. Arthur Gimpell
About Me
•Working with databases for 8 years
•5 years, SQL Server & .NET
•3 years with NoSQL & Python & Node.js
•2015 - Founded DataZone
8. Arthur Gimpell
DataZone | Data is our business! What’s yours?
•Consultancy & projects
•Private & public training
•Multi vendor, multi tier support with SLA
•Child unit of CloudZone, public cloud leaders
10. Arthur Gimpell
uBar: Toolbar Company
•uBar’s toolbar provides a search engine and various utilities on the toolbar itself
•uBar’s revenue streams:
•Ads, provided on uBar’s search engine
•Bundled downloads with partners
•Selling user data & statistics gathered by analysing toolbar usage
11. Arthur Gimpell
uBar: Architecture
[Diagram: Sessions, Toolbar Usage, and Analytics services, all backed by MSSQL]
•uBar’s solution is built on SOA:
•Sessions: session & user management service
•Toolbar Usage: user statistics gathering
•Analytics: near-real-time BI
12. Arthur Gimpell
uBar: Sessions Service - Features
•Sessions are created when a client opens a browser
•Sessions end when the client closes the browser, or when no activity occurs for a set period of time
•Users are mainly marketing staff: campaign managers, media buyers and more. These users consume data from the Analytics service
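The session lifecycle above can be sketched roughly as follows. This is a minimal in-memory illustration, not uBar's implementation: `SessionTracker`, its method names, and the 30-minute idle timeout are all assumptions (the slide only says "some specific time").

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict

# Assumed value; the slide does not state the real timeout.
IDLE_TIMEOUT_SECONDS = 30 * 60


@dataclass
class SessionTracker:
    """Hypothetical stand-in for the Sessions service: SessionId -> last activity time."""
    last_seen: Dict[str, float] = field(default_factory=dict)

    def open_browser(self, now: float) -> str:
        """A session is created when the client opens a browser."""
        session_id = str(uuid.uuid4())
        self.last_seen[session_id] = now
        return session_id

    def is_active(self, session_id: str, now: float) -> bool:
        """A session is active if it exists and has not been idle too long."""
        seen = self.last_seen.get(session_id)
        return seen is not None and now - seen < IDLE_TIMEOUT_SECONDS

    def activity(self, session_id: str, now: float) -> bool:
        """Record activity; returns False if the session has already ended."""
        if not self.is_active(session_id, now):
            self.last_seen.pop(session_id, None)
            return False
        self.last_seen[session_id] = now
        return True

    def close_browser(self, session_id: str) -> None:
        """A session ends when the client closes the browser."""
        self.last_seen.pop(session_id, None)
```

Timestamps are passed in explicitly (as seconds) so the idle rule can be exercised without waiting on a real clock.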
13. Arthur Gimpell
uBar: Sessions Service - Main Objects
•Session: SessionId, ToolbarClientId, UserId, UserAgent, StartTime
•User: UserId, UserPermissions, Username, PasswordHash
•UserPermissions: UserId, PermissionId
•Permissions: PermissionId, Name
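As a rough illustration, the four objects listed above could be modelled like this. The field types are assumptions (the slide gives only column names), the UserPermissions link table is flattened into a list on `User`, and `permission_names` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class Permission:
    permission_id: int
    name: str


@dataclass
class User:
    user_id: int
    username: str
    password_hash: str
    # Stands in for the UserPermissions many-to-many table.
    permission_ids: List[int] = field(default_factory=list)


@dataclass
class Session:
    session_id: str          # UUID string
    toolbar_client_id: int
    user_id: int
    user_agent: str          # unstructured string
    start_time: datetime


def permission_names(user: User, permissions: Dict[int, Permission]) -> List[str]:
    """Resolve a user's permission ids to names, skipping unknown ids."""
    return [permissions[pid].name for pid in user.permission_ids if pid in permissions]
```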
14. Arthur Gimpell
uBar: Toolbar Usage Service - Features
•Every time an event occurs, such as a client opening a browser or browsing the internet, the usage service saves data about the event in the relevant table.
•ToolbarUsage writes roughly 50M events per day
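To put the 50M events/day figure in per-second terms, here is a small back-of-the-envelope calculation. It compares the daily average with the 10k writes/sec quoted for the Usage service on the Challenges slide; reading that 10k figure as a peak rate is an assumption, since the deck does not say.

```python
EVENTS_PER_DAY = 50_000_000      # from this slide
QUOTED_WRITES_PER_SEC = 10_000   # Usage-service figure from the Challenges slide
SECONDS_PER_DAY = 24 * 60 * 60

# Average write rate if events were spread evenly over the day.
average_per_sec = EVENTS_PER_DAY / SECONDS_PER_DAY

# How far the quoted rate sits above that average (assumed to be a peak).
peak_to_average = QUOTED_WRITES_PER_SEC / average_per_sec

print(f"average: {average_per_sec:.0f} events/sec")
print(f"quoted rate is {peak_to_average:.1f}x the daily average")
```

The daily average works out to about 579 events/sec, so a store sized for the quoted rate has considerable headroom over the average load.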
15. Arthur Gimpell
uBar: Toolbar Usage Service - Main Objects
•ToolbarStart: ToolbarClientId, StartTime, [User data columns]
•NewTab: ToolbarClientId, NewTabUrl, [User data columns]
•ToolbarClicks: ToolbarClientId, ToolbarFeatureId, [User data columns]
•WebsiteVisit: ToolbarClientId, WebsiteUrl, [User data columns]
•ToolbarClients: ToolbarClientId, ToolbarVersion, BundledVersion, BundleId
16. Arthur Gimpell
uBar: Analytics Service - Features
•The Analytics service provides Users with dashboards filled with data.
•The data is pre-aggregated every hour in the database and saved to separate tables
•The Analytics service provides the most important KPIs when releasing campaigns to millions of users, and operational decisions are made according to its data (stopping bad campaigns, detecting bugs, A/B testing, etc.)
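The hourly pre-aggregation described above might look something like the following sketch. The `(timestamp, event type)` event shape and the `aggregate_hourly` name are assumptions for illustration; the real job runs inside the database.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

# Hypothetical event shape: (timestamp, event type such as "NewTab").
Event = Tuple[datetime, str]


def aggregate_hourly(events: Iterable[Event]) -> Counter:
    """Count events per (hour bucket, event type)."""
    buckets: Counter = Counter()
    for ts, event_type in events:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[(hour, event_type)] += 1
    return buckets
```

Dashboards can then read the small per-hour counts instead of scanning raw events.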
17. Arthur Gimpell
uBar: Challenges
•Velocity: 10k writes/sec on the Usage service, 1k writes/sec on the Sessions service
•Volume: 1TB of operational data (1 month retention)
•New clients increase the velocity, and the IO subsystem is a bottleneck
•Campaign managers want more and more insights in real time, which requires writing complex aggregation jobs on the database that use the CPU intensively.
18. Arthur Gimpell
Issues with Relational Database Management Systems in the IoT Age
•Everything is persisted synchronously, so throughput is limited by IO performance.
•All data is bound to a tabular schema, which is hard to change in big databases.
•All data relies on a single data store, making it hard to scale horizontally.
•A complex schema drastically slows down aggregations and queries.
19. Arthur Gimpell
Polyglot Persistence: Overview
•Concept: Every data store serves a different component of the application, according to its access patterns and needs.
•Key Value: Suitable for key-value access patterns. Main benefits are concurrency control at the key level (optimistic & pessimistic) and extremely easy scaling.
•Document Store: Suited to data used from OOP languages; stores complex data (JSON) while allowing scaling and distribution.
•Search / Index stores: Suitable for cases where the main data store cannot handle complex querying. Allows scaling the querying layer separately from operational data access (CUD in CRUD).
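A toy sketch of the idea, assuming two in-memory stand-ins: writes go to a key-value store, and a separate index answers the field-based queries the key-value store cannot. `KeyValueStore`, `SearchIndex`, and `write` are hypothetical names, and the index is updated synchronously here only to keep the example short.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class KeyValueStore:
    """Operational store: create/update/delete and lookup by key only."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, str]] = {}

    def put(self, key: str, doc: Dict[str, str]) -> None:
        self._data[key] = doc

    def get(self, key: str) -> Dict[str, str]:
        return self._data[key]


class SearchIndex:
    """Query store: answers field=value lookups via an inverted index."""

    def __init__(self) -> None:
        self._postings: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def index(self, key: str, doc: Dict[str, str]) -> None:
        for name, value in doc.items():
            self._postings[(name, value)].add(key)

    def find(self, name: str, value: str) -> Set[str]:
        # .get avoids creating empty entries for unknown terms.
        return set(self._postings.get((name, value), ()))


def write(kv: KeyValueStore, index: SearchIndex, key: str, doc: Dict[str, str]) -> None:
    """Write path: the key-value store is the source of truth, the index follows."""
    kv.put(key, doc)
    # In a real deployment this step would likely be asynchronous replication.
    index.index(key, doc)
```

Because the two stores are separate, the query layer can be scaled or rebuilt without touching the operational write path.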
20. Arthur Gimpell
uBar: New Data Solution’s Targets
•Handle the traffic: Velocity and Volume should not limit the product
•Allow more realtime analytics, and more complex slice & dice for the product
•Use open source where possible, reduce costs
22. Arthur Gimpell
uBar: Analysing Sessions schema & access patterns
•Sessions are written with a UUID (SessionId) and are not sorted in any way in the table (heap).
•Values:
•ToolbarClientId (Foreign key to ToolbarClient)
•UserId (Foreign key to User)
•UserAgent (Unstructured string)
•StartTime (DateTime)
23. Arthur Gimpell
uBar: Analysing Sessions schema & access patterns
•The Users and Permissions tables are quite simple: each holds its own values, and they are linked by a many-to-many relation table (UserPermissions)
24. Arthur Gimpell
uBar: Analysing Sessions schema & access patterns
•Session write velocity is 1k/sec. IO is a bottleneck.
•Sessions are written in a key-value pattern
•Users and Permissions are not problematic, since they are cached in the application and rarely change.
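The key-value write pattern for sessions can be illustrated as below: the SessionId is the key and the remaining columns become a JSON value. The `session:` key prefix, the field types, and the plain dict standing in for the data store are all assumptions.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Dict, Tuple


def session_to_kv(toolbar_client_id: int, user_id: int,
                  user_agent: str, start_time: datetime) -> Tuple[str, str]:
    """Build a (key, JSON value) pair for one session."""
    session_id = str(uuid.uuid4())
    key = f"session:{session_id}"  # assumed naming convention
    value = json.dumps({
        "ToolbarClientId": toolbar_client_id,
        "UserId": user_id,
        "UserAgent": user_agent,
        "StartTime": start_time.isoformat(),
    })
    return key, value


# A dict stands in for the key-value store; a real store would expose set/get.
store: Dict[str, str] = {}
key, value = session_to_kv(42, 7, "Mozilla/5.0",
                           datetime(2015, 6, 1, tzinfo=timezone.utc))
store[key] = value
```

Each write touches exactly one key, so there are no joins or secondary indexes on the write path.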
25. Arthur Gimpell
uBar: Possible data stores for Sessions service
•Candidate technologies with the needed throughput, complex data support, and velocity: Redis, Couchbase, MarkLogic
26. Arthur Gimpell
uBar: Analysing Toolbar Usage schema & access patterns
•Toolbar Usage tables are not normalized in SQL Server, and are written as raw data.
•The Usage write pattern is key value, where the value is large (30 KB) and unstructured (user agent).
•Write velocity is 10k/sec.
•Toolbar Usage data is also time series data. The tables are clustered on a TimeStamp column (and partitioned by it), for easier analytics and aggregation.
[Diagram: Sessions: Redis? Couchbase? MarkLogic? | Toolbar Usage: ?]
27. Arthur Gimpell
uBar: Possible data stores for Toolbar Usage service
•Again, the needed write pattern is key value.
•Data sizing and the needed throughput fit Redis, Couchbase, and MarkLogic equally well.
•Sessions and ToolbarUsage can both (potentially) rely on the same data store.
[Diagram: Sessions: Redis? Couchbase? MarkLogic? | Toolbar Usage: Redis? Couchbase? MarkLogic?]
28. Arthur Gimpell
uBar: Analysing Analytics schema & access patterns
•The Analytics service’s schema is based on aggregated data from the ToolbarUsage & Sessions services.
•Development should be simple, to allow maximum flexibility for product and analysts.
•Analysts should be able to query the data ad hoc
•Data refresh interval should be under 15 minutes
[Diagram: Sessions: Redis? Couchbase? MarkLogic? | Toolbar Usage: Redis? Couchbase? MarkLogic? | Analytics: ?]
29. Arthur Gimpell
uBar: Possible data stores for Analytics service
•Possible solutions for analytics fall into several groups:
•Classic BI solutions: Tableau, QlikView, Pentaho
•Column-store DBMS: Redshift, Vertica, ...
•Pure search engines: Elasticsearch, Solr, ...
[Diagram: Sessions: Redis? Couchbase? MarkLogic? | Toolbar Usage: Redis? Couchbase? MarkLogic? | Analytics: BI Tools, ColumnStore, Search Engine]
30. Arthur Gimpell
uBar: Putting it all together - Operational Needs
            Velocity   Volume   Price
Couchbase   ✓          ✓        Low - Mid
Redis       ✓          ✓        Low - Mid
MarkLogic   ✓          ✓        High
[Diagram: Sessions: Redis? Couchbase? MarkLogic? | Toolbar Usage: Redis? Couchbase? MarkLogic? | Analytics: BI Tools, ColumnStore, Search Engine]
31. Arthur Gimpell
uBar: Putting it all together - Operational Needs
            Support                   Integration                                          Final Notes
Couchbase   Vendor support with SLA   Elasticsearch (XDCR); SQL compatible (JDBC, ODBC)    Rich integrations, high-quality support
Redis       Redis Labs (managed)      Plugin for Solr                                      Managed, no maintenance
[Diagram: Sessions: Redis? Couchbase? | Toolbar Usage: Redis? Couchbase? | Analytics: BI Tools, ColumnStore, Search Engine]
32. Arthur Gimpell
uBar: Putting it all together - Analytical Needs
                 Possibilities                Pros                                                   Cons
BI Solutions     Tableau, Pentaho, QlikView   Simple for business users, integrates with Couchbase   Might get expensive
Search Engines   Elasticsearch, Solr          Highly customizable                                    Querying is not straightforward
[Diagram: Sessions: Redis? Couchbase? | Toolbar Usage: Redis? Couchbase? | Analytics: BI Tools, Search Engine]
33. Arthur Gimpell
uBar: Final Architecture #1
[Diagram: Sessions: Managed Redis | Toolbar Usage: Managed Redis | Analytics: Elasticsearch]
•Redis is managed: no maintenance at all for an operational, scalable cluster.
•Using Elasticsearch with Kibana is great for time series data
•Data transformation will be done through ETL.
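The "transform" step of such an ETL could, for instance, turn usage events into the newline-delimited JSON body that Elasticsearch's bulk API accepts (one action line followed by one document line per event). This is only a sketch under assumptions: the `toolbar-usage` index name and the event fields are hypothetical, and extraction from Redis and the actual HTTP call are left out.

```python
import json
from typing import Dict, Iterable


def to_bulk_ndjson(events: Iterable[Dict], index_name: str = "toolbar-usage") -> str:
    """Render events as an Elasticsearch bulk request body (NDJSON)."""
    lines = []
    for event in events:
        # Action line: index this document into index_name (id auto-generated).
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself.
        lines.append(json.dumps(event))
    if not lines:
        return ""
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"
```

Keeping the transform a pure function makes it easy to test separately from the extract and load steps.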
34. Arthur Gimpell
uBar: Final Architecture #2
[Diagram: Sessions: Couchbase | Toolbar Usage: Couchbase | Analytics: BI Tools + Elasticsearch]
•Couchbase is easy to use.
•With Couchbase’s SQL for JSON (N1QL), it takes zero configuration to make it a data source for practically any BI solution
•Couchbase’s filtered replication to Elasticsearch means Elasticsearch is needed only where SQL is not enough.