The document discusses strategies for big data and high-performance systems. It covers in-memory databases, NoSQL databases such as MongoDB and Cassandra, Hadoop and MapReduce, column-oriented databases, and sharding. It gives examples of how these technologies address scalability, availability, and the analysis of large datasets. It also explains horizontal and vertical sharding techniques for distributing data across multiple servers.
7. Where did it Fail?
Get an Answer, Fast and Cheap
http://blogs.microsoft.co.il/blogs/vprnd
http://top-performance.blogspot.com
8. Where did it Fail?
I Just Want “Class Persistency Storage”
and Changing Schema on Demand
9. Where did it Fail?
Always Be Available, Even with an Old Answer
10. Where did it Fail?
Get Me a Fast, Good-Enough Answer
11. Where did it Fail?
Data Is Too Big, and Storage Is $$$,
But CPU and Network Cost Even More
http://www.powerbyte.com/Isilon.html
12. It is all great, but…
I Need to Meet Compliance
http://www.vision7.com/app_system/lib/image/content/PCI_compliance.jpg
13. It is all great, but…
I Need a Vendor
14. It is all great, but…
I Need Reporting
http://www.novell.com/communities/node/5851/get-ready-sentinel-61
15. It is all great, but…
I Need Transactions
http://www.novell.com/communities/node/5851/get-ready-sentinel-61
16. It is all great, but…
We Need Training for the Data Analysts
db.article.aggregate([
    { $group: {
        _id: "$author",                          // GROUP BY author
        docsPerAuthor: { $sum: 1 },              // SUM(1) = N documents per author
        viewsPerAuthor: { $sum: "$pageViews" }   // SUM(pageViews)
    }}
]);
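For analysts coming from SQL, it may help to see what that `$group` stage computes, written out by hand. The sketch below is a rough pure-Python equivalent over an in-memory list of documents. It does not use MongoDB or any driver, and the sample articles are hypothetical.

```python
from collections import defaultdict

def group_by_author(articles):
    """Mimic {$group: {_id: "$author", docsPerAuthor: {$sum: 1},
    viewsPerAuthor: {$sum: "$pageViews"}}} on a list of dicts."""
    groups = defaultdict(lambda: {"docsPerAuthor": 0, "viewsPerAuthor": 0})
    for doc in articles:
        totals = groups[doc["author"]]                       # GROUP BY author
        totals["docsPerAuthor"] += 1                         # SUM(1): count documents
        totals["viewsPerAuthor"] += doc.get("pageViews", 0)  # SUM(pageViews); missing counts as 0
    # One output document per author, shaped like the $group results
    return [{"_id": author, **totals} for author, totals in groups.items()]

# Hypothetical sample data
articles = [
    {"author": "dana", "pageViews": 120},
    {"author": "eli", "pageViews": 30},
    {"author": "dana", "pageViews": 80},
]
for row in group_by_author(articles):
    print(row)
```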
18. The VP R&D Open Seminar
CLIENT SIDE
19. It’s a World Made of Pixels
20. The VP R&D Open Seminar
SERVER SIDE
21. General Strategies
Online
In-Memory Databases and Queues
Log files processing
23. 700 Inserts/Sec
Amazon AWS Standard Large Instance
InnoDB Engine: 700 Inserts/Sec
In Memory Engine: 3000 Inserts/Sec
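One rough way to see this kind of gap on your own machine is to time single-row inserts against an in-memory store and a disk-backed one. The sketch below substitutes SQLite for MySQL: `:memory:` stands in for the in-memory engine, and a temporary file with a commit after every row stands in for a durable disk engine. Its absolute numbers will not match the figures above. Only the relative difference is meaningful.

```python
import os
import sqlite3
import tempfile
import time

def inserts_per_sec(path: str, n: int = 200) -> float:
    """Time n single-row INSERTs, committing after each one (autocommit-style)."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)"
        )
        start = time.perf_counter()
        for i in range(n):
            conn.execute("INSERT INTO events (payload) VALUES (?)", (f"event-{i}",))
            conn.commit()  # for a file database, each commit makes SQLite sync to disk
        elapsed = time.perf_counter() - start
    finally:
        conn.close()
    return n / elapsed

if __name__ == "__main__":
    mem_rate = inserts_per_sec(":memory:")
    with tempfile.TemporaryDirectory() as tmp:
        disk_rate = inserts_per_sec(os.path.join(tmp, "bench.db"))
    print(f"in-memory: {mem_rate:,.0f} inserts/sec")
    print(f"on disk:   {disk_rate:,.0f} inserts/sec")
```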
24. The VP R&D Open Seminar
General Strategies
DATA SIDE
25. Strategy A - Sharding
26. Strategy B – MapReduce
27. Strategy C - NoSQL
<Key, Value> operations: insert, get, multiget, remove, truncate
http://wiki.apache.org/cassandra/API
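Those few operations make up nearly the entire surface of a key-value store. The toy in-memory class below reuses the same operation names purely as an illustration of the <Key, Value> model. It is not the Cassandra API, whose real calls take extra arguments such as column paths and consistency levels.

```python
from typing import Dict, Hashable, Iterable, Optional

class KeyValueStore:
    """Toy in-memory <Key, Value> store using the operation names listed above."""

    def __init__(self) -> None:
        self._data: Dict[Hashable, object] = {}

    def insert(self, key: Hashable, value: object) -> None:
        self._data[key] = value          # upsert: overwrites any previous value

    def get(self, key: Hashable) -> Optional[object]:
        return self._data.get(key)       # None when the key is absent

    def multiget(self, keys: Iterable[Hashable]) -> Dict[Hashable, object]:
        # Batch lookup: return only the keys that exist
        return {k: self._data[k] for k in keys if k in self._data}

    def remove(self, key: Hashable) -> None:
        self._data.pop(key, None)        # removing a missing key is a no-op

    def truncate(self) -> None:
        self._data.clear()               # drop every key at once

if __name__ == "__main__":
    store = KeyValueStore()
    store.insert("user:1", {"name": "alice"})
    store.insert("user:2", {"name": "bob"})
    print(store.multiget(["user:1", "user:3"]))
```

This is also where the speed comes from: every access goes by key, so there is nothing to plan, join, or scan.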
28. The VP R&D Open Seminar
MongoDB
DOCUMENT DATABASES
29. When Should I Choose NoSQL?
• Eventually Consistent
• Document Store
• Key Value
http://guyharrison.squarespace.com/blog/tag/nosql
32. A Blog Case Study in RDBMS
http://www.slideshare.net/nateabele/building-apps-with-mongodb
33. And How a SW Engineer Would Like It to Be…
http://www.slideshare.net/nateabele/building-apps-with-mongodb
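The contrast between the relational blog and the one an engineer would prefer can be sketched with plain Python data. The field names (posts, comments, tags) are assumptions chosen for illustration, not the schema from the referenced deck. In the relational shape, one post is spread across several tables linked by ids. In the document shape, the post, its comments, and its tags are kept together in a single nested object, the way a document database such as MongoDB would store it.

```python
# Relational shape: one row per entity, linked by foreign keys (hypothetical schema)
posts = [{"id": 1, "title": "Hello", "author": "alice"}]
comments = [
    {"id": 10, "post_id": 1, "author": "bob", "text": "Nice post"},
    {"id": 11, "post_id": 1, "author": "carol", "text": "+1"},
]
tags = [{"post_id": 1, "tag": "intro"}, {"post_id": 1, "tag": "news"}]

def load_post_relational(post_id):
    """Reassemble one post the way an app would after several queries or joins."""
    post = next(p for p in posts if p["id"] == post_id)
    return {
        **post,
        "comments": [{"author": c["author"], "text": c["text"]}
                     for c in comments if c["post_id"] == post_id],
        "tags": [t["tag"] for t in tags if t["post_id"] == post_id],
    }

# Document shape: the same post as one nested object, read in a single lookup
post_document = {
    "id": 1,
    "title": "Hello",
    "author": "alice",
    "comments": [
        {"author": "bob", "text": "Nice post"},
        {"author": "carol", "text": "+1"},
    ],
    "tags": ["intro", "news"],
}
```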
39. Key Concepts
Fast Answer: Use the memory
Not Always Right: Multiple instances
Can Lose Data: Multiple instances
Autosync: Client timestamp
Bottom Line: Integrated Memcached + MySQL
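The slide lists "Autosync" and "Client timestamp". This sketch assumes they mean that replicas reconcile conflicting copies by keeping the value with the newest client-supplied timestamp (last write wins). That rule also explains how a write can be silently discarded. The replica contents below are made up, and ties keep the copy seen first, which is a choice of this sketch.

```python
from typing import Dict, Tuple

# Each replica maps key -> (client_timestamp, value). The timestamps come from
# the writing client rather than the server (an assumption of this sketch).
Replica = Dict[str, Tuple[int, str]]

def merge_lww(*replicas: Replica) -> Replica:
    """Last-write-wins reconciliation: keep the copy with the highest timestamp."""
    merged: Replica = {}
    for replica in replicas:
        for key, (ts, value) in replica.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return merged

node_a = {"cart:42": (1000, "1 item"), "cart:7": (900, "empty")}
node_b = {"cart:42": (1005, "2 items")}   # a later write that reached only node_b

merged = merge_lww(node_a, node_b)
print(merged["cart:42"])   # prints (1005, '2 items'): the newer write wins
```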
40. Azure Table Storage: Key Concepts
Very Large Tables: Partitioning
Get by Key: Partitioning Key
Sort: Single Sort Key
Simple Rows: Basic Types
No Joins, No Grouping, No Multiple Sorting
Bottom Line: Simple Very Large Tables, LDAP
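Azure Table Storage works on a small set of ideas. Each entity is addressed by a partition key plus a row key. Point reads go straight to that pair, and range reads come back in row-key order within a partition. The class below is an in-memory illustration of those ideas only. It is not the Azure SDK, and its method names are hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Tuple

class ToyTable:
    """In-memory model of a table addressed by (partition key, row key).

    Hypothetical API for illustration; not the Azure Storage SDK.
    """

    def __init__(self) -> None:
        self._keys: Dict[str, List[str]] = {}          # partition -> sorted row keys
        self._rows: Dict[Tuple[str, str], dict] = {}   # full key -> properties

    def upsert(self, partition_key: str, row_key: str, props: dict) -> None:
        keys = self._keys.setdefault(partition_key, [])
        if (partition_key, row_key) not in self._rows:
            bisect.insort(keys, row_key)   # the single sort key: row-key order
        self._rows[(partition_key, row_key)] = props

    def get(self, partition_key: str, row_key: str) -> Optional[dict]:
        # Point query by the full key: the efficient access path
        return self._rows.get((partition_key, row_key))

    def query_range(self, partition_key: str, start: str, end: str) -> List[Tuple[str, dict]]:
        # Rows of one partition with start <= row key < end, in row-key order
        keys = self._keys.get(partition_key, [])
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_left(keys, end)
        return [(rk, self._rows[(partition_key, rk)]) for rk in keys[lo:hi]]

    # Deliberately no join, group-by, or secondary sort: the model offers none.

if __name__ == "__main__":
    table = ToyTable()
    table.upsert("users-eu", "bob", {"age": 25})
    table.upsert("users-eu", "alice", {"age": 30})
    table.upsert("users-us", "carol", {"age": 41})
    print(table.query_range("users-eu", "a", "c"))
```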
42. The VP R&D Open Seminar
Hadoop
MAP REDUCE
43. Count Pageviews by Date
Map the challenge (count on every node)
Reduce the answers (get a single answer)
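Counting pageviews by date breaks down into two steps: each node counts the log records it holds, and the partial counts are then added into one answer. The sketch below simulates the nodes as Python lists of hypothetical `(date, url)` log records. A real Hadoop job would run the same two functions across machines.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, List, Tuple

LogRecord = Tuple[str, str]  # (date, url): a hypothetical log format

def map_node(records: Iterable[LogRecord]) -> Counter:
    """Map: count pageviews per date on one node's share of the logs."""
    return Counter(date for date, _url in records)

def reduce_counts(partials: List[Counter]) -> Counter:
    """Reduce: add the per-node counts into a single answer."""
    return reduce(lambda a, b: a + b, partials, Counter())

nodes = [
    [("2013-05-01", "/home"), ("2013-05-01", "/about"), ("2013-05-02", "/home")],
    [("2013-05-02", "/blog"), ("2013-05-02", "/home")],
    [("2013-05-01", "/blog")],
]
totals = reduce_counts([map_node(n) for n in nodes])
print(sorted(totals.items()))
```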
44. Word Count
function map(String name, String document):
    // name: document name
    // document: document contents
    for each word w in document:
        emit (w, 1)

function reduce(String word, Iterator partialCounts):
    // word: a word
    // partialCounts: a list of aggregated partial counts
    sum = 0
    for each pc in partialCounts:
        sum += ParseInt(pc)
    emit (word, sum)
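The pseudocode leaves out the step between the two functions. The framework groups every emitted `(word, 1)` pair by word (the shuffle) and then calls `reduce` once per word. Below is a single-process Python sketch of all three phases, for illustration only. Hadoop itself runs them in parallel across nodes, and splitting on whitespace is a simplifying assumption.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

def map_fn(name: str, document: str) -> Iterator[Tuple[str, int]]:
    # name is unused; kept to match the pseudocode's signature
    for word in document.split():
        yield word, 1

def reduce_fn(word: str, partial_counts: List[int]) -> Tuple[str, int]:
    return word, sum(partial_counts)

def word_count(documents: Dict[str, str]) -> Dict[str, int]:
    # Shuffle: collect every emitted value under its word
    grouped: Dict[str, List[int]] = defaultdict(list)
    for name, text in documents.items():
        for word, one in map_fn(name, text):
            grouped[word].append(one)
    # Reduce: one call per distinct word
    return dict(reduce_fn(w, counts) for w, counts in grouped.items())

print(word_count({"a.txt": "to be or not to be", "b.txt": "to do"}))
```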
46. Hadoop as a Service
http://www.windowsazure.com/en-us/manage/services/hdinsight/get-started-hdinsight/
62. Horizontal Sharding
Static Hashing
Simple
Complex growth
10 databases, one per key Mod 10 value (Mod 10 = 0 through Mod 10 = 9)
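Static hashing computes the shard as `key mod N`, so routing needs no lookup at all. The same arithmetic shows why growth is complex: changing N remaps almost every key. This small sketch uses integer keys. String keys would first need a stable hash function, which is an assumption not covered on the slide.

```python
def shard_for(key: int, num_shards: int = 10) -> int:
    """Static hashing: the shard number is simply key mod N."""
    return key % num_shards

def fraction_moved(keys, old_n: int, new_n: int) -> float:
    """Share of keys whose shard changes when N goes from old_n to new_n."""
    moved = sum(1 for k in keys if shard_for(k, old_n) != shard_for(k, new_n))
    return moved / len(keys)

if __name__ == "__main__":
    print(shard_for(1234))                         # 1234 mod 10 -> shard 4
    print(fraction_moved(range(11_000), 10, 11))   # 10 of every 11 keys move
```

Adding an eleventh database would force roughly 91% of the data to migrate.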
63. Horizontal Sharding
Key locations are defined in a directory
Simple growth
The directory is a SPOF
The directory can be very large
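With a directory, each key's location is looked up rather than computed. Adding a database only means registering it, but the directory needs one entry per key, and every request depends on it. In this minimal sketch, a plain dict stands in for what would really need to be a replicated lookup service. The least-loaded placement policy is a hypothetical choice.

```python
from collections import Counter
from typing import Dict, List

class DirectorySharding:
    """Key -> shard lookups through an explicit directory (one entry per key)."""

    def __init__(self, shards: List[str]) -> None:
        self.shards = list(shards)
        self.directory: Dict[str, str] = {}
        self.load: Counter = Counter()

    def add_shard(self, shard: str) -> None:
        # Simple growth: register the new DB; existing keys stay where they are
        self.shards.append(shard)

    def place(self, key: str) -> str:
        """Assign a new key to the least-loaded shard (hypothetical policy)."""
        if key not in self.directory:
            shard = min(self.shards, key=lambda s: self.load[s])
            self.directory[key] = shard
            self.load[shard] += 1
        return self.directory[key]

    def locate(self, key: str) -> str:
        # Every lookup goes through the directory, which is why it is a SPOF
        return self.directory[key]

if __name__ == "__main__":
    d = DirectorySharding(["db1", "db2"])
    for k in ["a", "b"]:
        d.place(k)
    d.add_shard("db3")
    print(d.place("c"))   # db3, since it holds no keys yet
```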
64. Horizontal Sharding
Static Hashing with Directory Mapping
Simple Growth
The Small Directory Can be Cached on Each App Server
Example bucket: key Mod 1000 = 4
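This hybrid hashes each key into a fixed number of buckets (1000 in the slide's example) and keeps only a small bucket-to-database map. That map is cheap enough to cache on every app server. To grow, you reassign some buckets to a new database and move only those buckets' keys. The database names, the round-robin initial layout, and the choice of buckets to move are all hypothetical in this sketch.

```python
from typing import Dict, List

NUM_BUCKETS = 1000  # fixed forever; only the bucket -> DB map changes

def bucket_for(key: int) -> int:
    return key % NUM_BUCKETS  # e.g. key 2004 -> bucket 4

def initial_map(dbs: List[str]) -> Dict[int, str]:
    # Spread buckets round-robin over the starting databases (hypothetical layout)
    return {b: dbs[b % len(dbs)] for b in range(NUM_BUCKETS)}

def db_for(key: int, bucket_map: Dict[int, str]) -> str:
    return bucket_map[bucket_for(key)]

def add_db(bucket_map: Dict[int, str], new_db: str,
           buckets_to_move: List[int]) -> Dict[int, str]:
    # Growth: reassign selected buckets; only keys in those buckets must be copied
    updated = dict(bucket_map)
    for b in buckets_to_move:
        updated[b] = new_db
    return updated

if __name__ == "__main__":
    m = initial_map(["db0", "db1"])
    print(db_for(2004, m))    # bucket 4 -> db0
    m2 = add_db(m, "db2", list(range(0, NUM_BUCKETS, 3)))
    print(db_for(2004, m2))   # bucket 4 was not moved, still db0
```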
65. Horizontal Sharding
Each key is signed by the DB# on creation
Simple growth
The Key Store Can be Cached on Each App Server
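When the database number is built into the key at creation time, any app server can route a request by parsing the key, with no shared directory on the hot path. Several details here are assumptions of this sketch: the `<db>-<id>` key format, the random choice of database for new records, and `uuid4` for the unique part. The cached key store that the slide mentions is not modeled.

```python
import random
import uuid
from typing import List, Optional

def new_key(db_numbers: List[int], rng: Optional[random.Random] = None) -> str:
    """Create a key that carries its home database: '<db>-<unique id>'."""
    rng = rng or random.Random()
    db = rng.choice(db_numbers)   # new records may go to any DB, including newly added ones
    return f"{db}-{uuid.uuid4().hex}"

def db_for_key(key: str) -> int:
    """Route by reading the DB number back out of the key."""
    return int(key.split("-", 1)[0])

if __name__ == "__main__":
    dbs = [1, 2, 3]
    k = new_key(dbs)
    print(k, "->", db_for_key(k))
    dbs.append(4)   # simple growth: existing keys keep pointing at their DB
```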
66. The Bottom Line: Grow ∞
Thank you!
and Keep Performing!
Moshe Kaplan