In 1945 mathematician and physicist John von Neumann proposed an architecture of a computer that became vastly successful. Today all commercially successful computer designs follow the same set of principles.
About 50 years later, it became clear that no single computer is sufficient to solve the biggest computational problems mankind has, such as indexing all the content published on the web. Today even small businesses run into computational problems for which single-server systems are a limiting factor.
Unlike individual computers, which are designed following the clear guiding principles of the von Neumann architecture, the design of computing clusters is still immature. The conventional view of cloud computing sees it as infrastructure for renting server instances or containers.
When it comes to distributing computation, and scaling it to thousands of CPUs in milliseconds, there is no widely accepted architecture blueprint that outlines how it can be done.
In this talk you will learn how distributed databases can serve as a solid foundation for massively parallel and instantly scalable distributed computing. We will also discuss leveraging NoSQL technologies and how they contribute to the architecture of a 'future computer.'
We will also cover practical matters of large-scale distributed system design that are relevant to Big data and Cloud computing fields.
2. 2
How to find relevant technology among the big data “hype”?
CLOUD
Complexity
PAAS
AAAS
IAAS
Private
Public
Hybrid
Virtualization
Containerization
Big Data
3. 3
Perhaps AI and supercomputers can solve the big data problem?
AI can help, yet today it requires large computing costs and great effort
Baidu’s AI Supercomputer Beats Google at Image Recognition
MIT Technology Review, May 2015
Google's DeepMind Builds Artificial Intelligence That Mimics ... Human Brain
International Business Times, Nov 2014
Facebook Fights Info Overload With AIs That Identify What’s In Videos
TechCrunch, Mar 2015
Microsoft Challenges Google’s Artificial Brain With ‘Project Adam’
Wired, July 2014
4. 4
CAT or DOG?
The method converts vision and voice data into text, using a neural network algorithm that labels images, video or audio with text
Deep learning in AI: the “Holy Grail” of big data computing?
If you believe authority, a cat detector for less than:
$1 M? $10 M? $1 BILLION?
5. 5
“Blind belief in authority is the greatest enemy of truth.”
Albert Einstein
6. 6
There are 2 main problems in ordinary databases and big data
Ordinary databases overload and overwhelm users with data
7. 7
We have always been dealing with information overload ...
Prof. Clay Shirky,
a new media writer on
the social and economic
effects of digital
technologies, US
8. 8
Relevance ranking is a method to address information overload
Weighting of all relevant human needs to determine the best action
Relevance
ranking
Human
needs
9. 9
Relevance ranking of needs for your business product
Needs of your product customers
Needs of your business owners
Needs of your product end-users
Community needs
Employees' needs
RELEVANCE
RANKING
(for example, scoring all needs from 0% to 100%)
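The scoring idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weights, the option names (`option_a`, `option_b`) and their fit scores are made up, and a weighted average is just one plausible way to combine the 0% to 100% scores.

```python
def rank_options(weights, options):
    """Return (name, score) pairs sorted best first.

    weights: need -> importance in percent (0-100)
    options: option name -> {need -> how well the option meets it, 0-100}
    """
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one need must have a non-zero weight")
    scored = []
    for name, fit in options.items():
        # Weighted average of the fit scores; a need missing from an option counts as 0.
        score = sum(w * fit.get(need, 0) for need, w in weights.items()) / total
        scored.append((name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical importance of each group's needs (illustrative values only).
weights = {"customers": 100, "owners": 50, "end_users": 100, "community": 0, "employees": 50}

# Hypothetical candidates and how well each one serves every group.
options = {
    "option_a": {"customers": 80, "owners": 40, "end_users": 60, "employees": 100},
    "option_b": {"customers": 50, "owners": 100, "end_users": 50, "employees": 50},
}

ranking = rank_options(weights, options)
```

With these made-up numbers `option_a` ranks first. Changing the weights changes the ranking, which is the point of scoring the needs explicitly.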
10. 10
How to select your cloud computing and big data technology?
Rank your most relevant human needs in big data computing!
11. 11
John von Neumann computing world
Small disk storage
Tiny RAM capacity
Slow CPU speed
Limited network bandwidth
Highly expensive hardware
Complex schemas, software, data
Most relevant need is an inexpensive computing infrastructure
Gordon Moore’s computing world
TBs of cheap HDD/SSD storage
GBs of RAM
Cheap multi-core CPUs
Gbps high-speed networks
Nearly expendable hardware
Web software (HTML, JSON, XML)
SQL
12. 12
Today, instantly scalable distributed computing is relevant
CPU time required to process a request
It takes X seconds on a single server
It takes 100 times less clock time to get the result in the cloud.
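The arithmetic behind this slide, and the scatter-gather pattern it implies, can be sketched as follows. This is a simplified illustration, not a description of any particular product: threads in one process stand in for servers, the work is assumed to split evenly, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def ideal_wall_clock(cpu_seconds, workers, overhead_seconds=0.0):
    """Wall-clock time if the CPU work splits evenly across the workers."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return cpu_seconds / workers + overhead_seconds


def scatter_gather(shards, query):
    """Run the same query on every shard in parallel and merge the partial results."""
    def search_shard(shard):
        return [doc for doc in shard if query in doc]

    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        # pool.map keeps the shard order, so the merged result is deterministic.
        partials = list(pool.map(search_shard, shards))
    return [doc for partial in partials for doc in partial]


# A request that needs 100 CPU-seconds, spread over 100 workers.
single_server = ideal_wall_clock(100.0, workers=1)
cluster = ideal_wall_clock(100.0, workers=100)

matches = scatter_gather([["cat a", "dog b"], ["cat c"]], "cat")
```

In practice coordination and network overhead (the `overhead_seconds` term) and uneven shards keep the speed-up below the ideal factor.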
14. 14
Cost-efficient sharing of computing infrastructure is relevant
Pay-per-use Model, $
Resources
Time
Conventional Provisioning Model, $
Save 3x-10x
BIG DATA
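The comparison in this chart can be worked through with a small calculation. The demand figures and the unit price below are invented for illustration, and the sketch assumes both models charge the same price per unit-hour, which real pricing may not do.

```python
def provisioning_cost(demand, price_per_unit_hour):
    """Conventional model: capacity for the peak is paid for during every hour."""
    if not demand:
        return 0.0
    return max(demand) * len(demand) * price_per_unit_hour


def pay_per_use_cost(demand, price_per_unit_hour):
    """Pay-per-use model: only the capacity actually used in each hour is paid for."""
    return sum(demand) * price_per_unit_hour


# Hypothetical hourly demand in capacity units, with one short peak.
hourly_demand = [2, 2, 2, 10, 2, 2]
price = 1.0

fixed = provisioning_cost(hourly_demand, price)
usage = pay_per_use_cost(hourly_demand, price)
savings_factor = fixed / usage
```

The spikier the demand, the larger the factor; with perfectly flat demand the two models cost the same under these assumptions.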
15. 15
Need to manage structured and unstructured data together
Easily mix / analyze all your data types
From structured data To unstructured data
XML JSON BLOB TEXT SQL NOSQL
16. 16
XML / JSON / BLOB
Flexible schema-free data is relevant for human productivity
Ordinary database Document database
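What "schema-free" means in practice can be illustrated with plain Python dictionaries standing in for JSON documents. This is a toy in-memory sketch, not a database API: the documents, field names and the `find` helper are all hypothetical.

```python
import json

# Documents in one collection do not have to share the same fields.
documents = [
    {"id": 1, "type": "invoice", "total": 120.5, "currency": "EUR"},
    {"id": 2, "type": "review", "text": "Fast delivery", "tags": ["shipping"]},
    {"id": 3, "type": "invoice", "total": 80.0,
     "customer": {"name": "Example Ltd", "country": "LV"}},
]


def find(docs, **criteria):
    """Return documents whose top-level fields match every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


invoices = find(documents, type="invoice")
invoice_total = sum(d["total"] for d in invoices)

# A nested document serializes to JSON without any table definition.
serialized = json.dumps(documents[2])
```

A new field, such as the nested `customer` object in the third document, can appear without migrating the existing documents.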
17. 17
Iron-clad security and consistency for big data are very relevant
Secure high-speed ACID transactions
From single-computer security
To safe online transaction processing in big data
SQL
XML
JSON
BLOB
SQL
NoSQL
18. 18
Humans need fast and relevant free text search in big data
Simple web-style search in big data as a norm
Natural language keyword (voice) search for ease of use
Ranking of search results to get rid of information overload
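Ranked keyword search can be sketched very simply. The scoring used here, counting how often the query terms occur in each document, is a deliberately naive stand-in chosen for illustration; it is not the ranking method of any specific search engine, and the sample documents are made up.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(documents, query, limit=10):
    """Return (doc_id, score) for documents matching any query term, best first."""
    terms = tokenize(query)
    results = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in terms)
        if score > 0:
            results.append((doc_id, score))
    # Highest score first; ties are broken by document id for stable output.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results[:limit]


docs = {
    "a": "Cloud database with built-in search",
    "b": "Search results ranking: search less, find more",
    "c": "Hardware provisioning guide",
}

hits = search(docs, "search ranking")
```

Returning only the top `limit` results is what addresses information overload: the user sees the most relevant matches rather than every matching document.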
19. 19
Instant responsiveness in big data search and analytics
PB
GB
TB
MB
Milliseconds for an instant search query
Minutes / hours for a SQL query
Low querying latency across billions of documents is relevant
XML
JSON
BLOB
NoSQL
20. 20
replica 1
replica 2
replica 3
Relevant mission-critical features for 24/7 computing services
LOAD BALANCING
FAULT-TOLERANCE
HIGH-AVAILABILITY
SCALE-OUT ABILITY
23. 23
No custom integration required
Custom “stitching” of all platforms
Big data management in one software platform is very relevant
Secure DB, ACID transactions
ONE API
Cut 80% off your TCO
Drive up to 10x faster
Instant Big data scale-out ability
Online analytics on rich data
Search software, full-text indexes
Your application “spaghetti” code
24. 24
Develop your applications to scale for big data from day one
OPEX, TCO
Life-cycle
Save > 80%
Write your web or mobile software only once
Test Year 1 Year 2 Year 3 Year 4 Year N
25. 25
What will happen in the computing industry?
Relational databases will die in pain
NoSQL will go extinct as well (!?)
26. 26
Rank relevant needs to select your future big data technology
Cost-efficiency
Low latency
Instant scalability
Iron-clad security
Schema-free simplicity
High availability
Relevant text search
Real-time analytics
100%
0%
50%
100%
0%
50%
0%
50%
0%
50%
0%
0%
100%
0%
50%
0%
50%
100%
100%
50%
50%
100%
100%
100%
Needs of: Enterprise, Community, Web Corp.
28. 28
Who are we?
Clusterpoint is a European tech company, founded in 2006. Our unique database software is used by commercial customers mainly in EU & Nordic markets.
Photo: April 2015, AngelHack, San Francisco, USA
29. 29
Founder, Visionary: Gints Ernestsons
CTO, Founder: Jurgis Orups
DB Software Architect: Janis Sermulins
CEO: Zigmars Rasscevskis
Business Dev Director: Peteris Janovskis
Management Team
15 years as CTO at Lursoft; 8 years as CEO at Clusterpoint; 25 years as a technology entrepreneur & bold innovator
8 years at Google; Technical Lead, Web search infrastructure core software engineering (Zurich, Switzerland)
9 years running the Clusterpoint core software engineering team; expert in C/C++, NoSQL, Big data search
5 years at Google (Zurich); MIT alumnus; internship at Intel Research (USA)
12 years at Oracle; Alliance & Channel Director, Central and East Europe
30. 30
Try instantly relevant computing with Clusterpoint database!
Cost-efficient distributed document database with built-in search and analytics
XML
JSON
BLOB
REST API
ALL IN ONE PLATFORM
31. 31
Email: support@clusterpoint.com
Phone USA: +1 (650) 681 9710
Phone Europe: +371 (2) 9243460
Scale your data when your need is the most relevant: instantly
Free 10 GB
• instant scalability
• no s/w deployment
• no h/w provisioning
• 365/24/7 managed
Free sign-up: www.clusterpoint.com
Cloud DBaaS
32. 32
“For the modern customer, big data is all about big relevance”
Source: Prof. Steven Van Belleghem, a writer, keynote speaker and inspirator, a thought leader in customer-centric marketing, from his book published in 2015.
Thank you for your attention!