1
NOSQL
BIGDATA
BLOCKCHAIN
THREE PILLARS OF MODERN DATA
SEPTEMBER 24, 2017
2
• 8+ years in love with Java
• Roles: SE, TL, RM, DM
• Blockchain
– Researcher
– Training my trading bot
– Working on Hyperledger Fabric solution
HELLO. MY NAME IS VLADIMIR
3
EPAM BLOCKCHAIN LAB
CURRENT STATE
• EPAM Blockchain Competency Center created in 2015
• Investigate frameworks and build a knowledge base
• Identify and qualify use cases and architecture solutions in a number of industries
• Monitor the market and communities: conferences, hackathons, discussion forums
• Develop solutions for our clients and for internal research needs
SOLUTIONS CREATED
• OTC Trade Capture and Matching (Ethereum)
• Digital Assets Management Platform (Ethereum + Amazon S3 + sMPC)
• Digital Assets Management Platform + Zero Knowledge
• Open Betting Platform (Ethereum)
• Digital Notary Service (NXT)
• Loyalty Network (Ethereum)
4
1. Is Blockchain a new paradigm?
2. Is Blockchain NoSQL?
3. Is Blockchain BigData?
4. How to compare Blockchain solutions?
5. How to apply Blockchain solutions?
MY QUESTIONS
5
BLOCKCHAIN IS
6
Not Only SQL MOTIVATION
DATA SIZE
• Web based applications with lots of data
• Vertical scalability constraints
• Price of storage
CLOUD
• SaaS, PaaS, IaaS
• Clustered environment
• Logical data decomposition
CHANGES IN
BUSINESS
• Fast adaptation for new business needs
• Flexible schema for PoC needs
• Iron Triangle of Time, Cost, Scope
7
• Not Only SQL
• Non-relational
• Key-Value: Redis, Riak, Berkeley DB, Aerospike, DynamoDB
• Document Based: MongoDB, CouchDB, MarkLogic, ArangoDB
• Column Based: Cassandra, HBase, Vertica, Accumulo, Hypertable
• Graph: Neo4j, IBM Graph, AllegroGraph
• Multi-Model: Couchbase, MarkLogic, ArangoDB
• Replication (Master-Slave, Peer-To-Peer, Combination)
• Sharding
WHAT IS NoSQL?
DEFINITION
ARCHITECTURE
DATA DISTRIBUTION
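A minimal sketch of how sharding and peer-to-peer replication can decide where a record lives, assuming simple hash-based partitioning. The function names and the ring-style choice of replicas are illustrative assumptions, not the behavior of any database listed above.

```python
import hashlib
from typing import List


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replicas_for(key: str, num_shards: int, replication_factor: int) -> List[int]:
    """Return the primary shard plus the next shards in ring order (assumed policy)."""
    first = shard_for(key, num_shards)
    return [(first + i) % num_shards for i in range(replication_factor)]


if __name__ == "__main__":
    # The same key always lands on the same shards, so any node can locate it.
    print(replicas_for("user:42", num_shards=5, replication_factor=3))
```

Adding shards spreads keys over more machines (horizontal scaling), while the replication factor controls how many copies survive a node failure.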
8
FEATURE / PROPERTY | SQL | NoSQL
Schema Design | Yes | Schema-less / Dynamic
Scalability | Vertical scaling | Horizontal scaling
Query | Good to retrieve related data | Good to retrieve big data
Flexibility | By change log | OOTB
Transactions | ACID | BASE
Objects | Tables, Rows, Columns | Collections, Documents, Fields
Storage | Columns with Rows | Key/Value, Document, Graph
Data Distribution | Tune Sharding and Replication | OOTB
Business Logic | Stored Procedures | Apps on top (e.g. MapReduce)
CAP | CA | AP/CP
SQL vs NoSQL
9
CAP THEOREM
10
• Volume – GBs, TBs, PBs
• Velocity – Fast Data processing with high performance and low latency
• Variety – concurrent processing of heterogeneous, possibly unstructured data sources
• Not about size but about how many useful insights can be extracted from the data
• Data Lake, Lambda, Kappa
• MapReduce, Data Warehousing, Stream Processing, In-Memory Computation, Disaster Recovery
• NoSQL DBs, Hadoop, Kafka, Storm, Spark, ZooKeeper
WHAT IS BigData?
MEASUREMENT
INSIGHTS
ARCHITECTURE
TECHNOLOGY
11
12
• A robust peer-to-peer network that allows transactions to be created by any party and propagated quickly to all connected nodes
• A way to identify conflicts between transactions and resolve them automatically
• A synchronization technology that ensures all peers converge on an identical copy of the database
• A method for tagging different pieces of information as belonging to different participants, and enforcing this form of data ownership without a central authority
• A paradigm for expressing restrictions on which operations are permitted, e.g. to prevent one company from inflating the directory with fictitious entries
NoSQL CHALLENGES
TRANSACTIONS
CONSISTENCY
ROBUSTNESS
13
• Who controls the infrastructure when there are multiple actors involved?
• If you have multiple copies, how do you know which one is the most up-to-date?
• How do you reconcile a different system administrator role at each regional office?
• How well can you trust the data?
• If you generate the data yourself, how do you prove you were the originator? If you get data from others, how do you know it was truly them?
• What about crashes and malicious behavior?
• How do you monetize the data?
• How do you transfer the rights of the data, or buy rights from others?
BigData CHALLENGES
CONTROL
TRUST
MONETIZATION
14
WHAT IS BLOCKCHAIN?
Blockchain is an approach to building decentralized applications that allows reliable peer-to-peer transactions and interactions without a trusted intermediary, using the following elements:
• Peer-to-peer communication
• Grouping of transactions into blocks
• Consensus protocol (resolves the double-spend problem and performs synchronization)
• Data structures/storage (resolve the double-spend problem; provide high availability, reliability and immutability)
• Smart contracts (execute business logic on an event/transaction)
• Crypto primitives
• Distributed Ledger Technology
Blockchain enables: trusted, efficient collaboration between several parties with no need for an intermediary
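The "transactions grouped into blocks" and "immutability" elements can be illustrated with a toy hash-linked chain. This is a sketch under simplifying assumptions (no network, no consensus, no signatures); the block layout and the names `append_block` and `is_valid` are hypothetical and not taken from any real framework.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_PREV_HASH = "0" * 64  # assumed placeholder for the first block


def block_hash(block: Dict) -> str:
    """Hash the block contents, including the pointer to the previous block."""
    payload = json.dumps(
        {"prev_hash": block["prev_hash"], "transactions": block["transactions"]},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict], transactions: List[str]) -> Dict:
    """Append-only operation: a new block commits to the hash of the last one."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    block = {"prev_hash": prev_hash, "transactions": list(transactions)}
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def is_valid(chain: List[Dict]) -> bool:
    """Recompute every hash and check each link back to the previous block."""
    prev_hash = GENESIS_PREV_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash or block["hash"] != block_hash(block):
            return False
        prev_hash = block["hash"]
    return True


if __name__ == "__main__":
    chain: List[Dict] = []
    append_block(chain, ["alice->bob:5"])
    append_block(chain, ["bob->carol:2"])
    print(is_valid(chain))
    chain[0]["transactions"][0] = "alice->bob:500"  # tamper with history
    print(is_valid(chain))
```

Changing any past transaction changes that block's hash, which breaks every later link; this is what makes silent edits detectable by other peers.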
15
NEW PARADIGM?
DATA STRUCTURES
Hash pointers (1960s)
Merkle tree (1979)
Distributed hash table
DIGITAL SIGNATURE
Symmetric and asymmetric cryptography (1970s)
CONSENSUS PROTOCOL
Proof-of-Work (1990s)
Proof-of-Stake
Others….
https://www.weusecoins.com/what-is-a-merkle-tree/
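A Merkle tree, one of the long-established data structures listed above, can be sketched in a few lines. This example assumes single SHA-256 hashing and duplicates the last node on odd-sized levels, which is one common convention; real systems differ in hashing and padding details.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Hash the leaves, then hash pairs level by level until one root remains."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed convention: duplicate the last node
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


if __name__ == "__main__":
    txs = [b"tx-a", b"tx-b", b"tx-c"]
    print(merkle_root(txs).hex())
```

The root is a single fixed-size value that changes if any leaf changes, so a block header only needs to store the root to commit to all of its transactions.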
17
FEATURE / PROPERTY | BLOCKCHAIN | DISTRIBUTED DATABASE
Operations | Append only | Insert, Update, Delete
Redundancy | Yes | Yes
High-availability | Yes | Yes
Data Sharding | Planned | Yes
Replication | Yes, peer-to-peer | Yes
Consensus | Block | Row, Replica Set
Signatures | Always | Manual
Data Validation | Always | Manual
Business Logic | Smart Contracts | Triggers, Stored Procedures, etc.
Primary usage | Ledger | Generic
BLOCKCHAIN vs DISTRIBUTED DATABASE
18
ARCHITECTURALLY SIGNIFICANT REQUIREMENTS
PERFORMANCE (e.g. Bitcoin: 3 tps, Ethereum: 25 tps)
LATENCY (e.g. Bitcoin: 10 mins, Ethereum: 12 sec)
SCALABILITY (Blockchain size, Smart Contracts executed on every node)
SECURITY (Smart contract security, Consensus protocol security)
PRIVACY (All participant nodes see all transactions)
WASTE OF RESOURCES (e.g. Bitcoin PoW: 6300 nodes, 24 hours, 100% CPU load)
MATURITY (e.g. Ethereum: a fork roughly every other month, with breaking changes)
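The "waste of resources" item follows from how Proof-of-Work operates: nodes repeatedly hash candidate values until one meets a difficulty target. The toy sketch below assumes a target expressed as a number of leading zero hex digits; real networks use a numeric target and a specific block header format, so treat the names and encoding here as illustrative only.

```python
import hashlib


def pow_hash(data: str, nonce: int) -> str:
    return hashlib.sha256(f"{data}{nonce}".encode("utf-8")).hexdigest()


def mine(data: str, difficulty: int) -> int:
    """Brute-force search for a nonce whose hash starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while not pow_hash(data, nonce).startswith(prefix):
        nonce += 1
    return nonce


def verify(data: str, nonce: int, difficulty: int) -> bool:
    """Verification is a single hash, however long the search took."""
    return pow_hash(data, nonce).startswith("0" * difficulty)


if __name__ == "__main__":
    nonce = mine("block-payload", difficulty=3)
    print(nonce, verify("block-payload", nonce, 3))
```

Each extra zero digit multiplies the expected search work by 16, while verification stays at one hash; that asymmetry is what secures the chain and also what consumes the CPU time.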
19
SIZE DOES MATTER
HOW BIG IS BITCOIN BLOCKCHAIN?
20
• 1.5 TB – 2021
• 3 TB – 2022
• 6 TB – 2023
• 1.5 PB - 2031
FORECAST
• No Sharding
• Consistency Time
• Storage
PROBLEM
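The forecast figures above are consistent with the chain size doubling every year from 1.5 TB in 2021. The slide does not state the model, so the yearly doubling is an inference from the listed numbers; the function name and defaults below are assumptions made for illustration.

```python
def forecast_tb(year: int, base_year: int = 2021, base_tb: float = 1.5) -> float:
    """Projected size in TB, assuming the size doubles every year after base_year."""
    return base_tb * 2 ** (year - base_year)


if __name__ == "__main__":
    for year in (2021, 2022, 2023, 2031):
        size_tb = forecast_tb(year)
        print(year, size_tb, "TB =", size_tb / 1024, "PB")
```

Under this assumption, ten doublings separate 2021 from 2031: 1.5 TB × 2^10 = 1536 TB, which is 1.5 PB when 1 PB is taken as 1024 TB.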
21
1 TB – beginning of 2018
FORECAST
• Sharding TBD
• Parallel Processing
• Byzantine
SOLUTION
22
THROUGHPUT. BITCOIN
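A back-of-the-envelope estimate shows where single-digit Bitcoin throughput comes from. The sketch assumes a 1 MB block size limit and a 10-minute target block interval; the average transaction sizes of 500 and 250 bytes are illustrative assumptions, not measurements.

```python
def max_tps(block_size_bytes: int, avg_tx_bytes: int, block_interval_s: int) -> float:
    """Upper bound on transactions per second for fixed-size blocks."""
    return block_size_bytes / avg_tx_bytes / block_interval_s


if __name__ == "__main__":
    # Assumed parameters: 1 MB blocks, one block every 600 seconds.
    print(max_tps(1_000_000, 500, 600))  # larger transactions
    print(max_tps(1_000_000, 250, 600))  # smaller transactions
```

With these assumed sizes the estimate comes out at roughly 3 and 7 transactions per second, the same order of magnitude as the figures quoted for Bitcoin in this deck.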
23
THROUGHPUT. ETHEREUM
24
THROUGHPUT. HYPERLEDGER
25
LATENCY. BITCOIN
26
LATENCY. ETHEREUM
27
PRIVACY. NETWORK TYPES
Permissioned: a closed system that checks all details and controls access via invitation
Permissionless: a fully open network that anybody can access, like the Bitcoin model
28
PRIVACY. ENCRYPTION
RING SIGNATURES
HOMOMORPHIC ENCRYPTION (HE)
SECURE MULTI-PARTY COMPUTATION (MPC)
ZERO KNOWLEDGE PROOFS (ZK)
29
PRIVACY. BY PERMISSIONED NATURE
30
FRAMEWORK NETWORK CONSENSUS PERFORMANCE LATENCY SECURITY PRIVACY AVAILABILITY SCALABILITY MATURITY
Bitcoin: Public; PoW; avg 3 tps, max 7 tps; avg 50 min; manual; new addresses; medium; Blocksize change; 2009
Ethereum: Public; PoW, PoS, PoA; 25 tps PoW, 300 tps PoA; 14s PoW, 2s PoA; manual; ZK TBD; lot of monitoring; Sharding TBD; 2015
Graphene: Public; delegated PoS; 1000 tps; 2s; manual; no; 2014
Zcash: Public; ZK + PoW; avg 15 tps; avg 2.5 min; ZK; Blocksize change; 2016
Monero: Public; CryptoNight PoW; dynamic; Ring Sign; 2014
Hyperledger: Private; Kafka, pluggable; avg 300 tps on channel; 1s; Membership Service Provider; Channels; HA by ZooKeeper; Dynamic and Program; 2017
R3 Corda: Private; pluggable, RAFT; exp 1200 tps; 1s; customize; customize; 2016
BigchainDB: Private; underlying DB, Raft; 1000 blocks * 1000 txn; 630ms in one region; Roles and Permissions; Hidden values; n/a; Clustered env; 2016
PLAYERS AND FRAMEWORKS
31
BigData HELPS
BaaS
• IBM Bluemix
• Microsoft Azure
• Rubix
NoSQL
• CouchDB for Hyperledger
• DAG for IOTA, Byteball
• BigchainDB
Elasticsearch
• Full-text search
• REST API
Apache Kafka
• Streaming and Messaging system
• Consensus in Fabric
• Distributed and Scalable
ZooKeeper
• Maintain nodes
• Provide HA
• Quorum
MapReduce
• Plasma.io
• Scalable Computation
• Blockchain Indexing
32
BIG USE CASES
1. GOVERNMENT
• Trusted Voting
• Public and private assets notary records
• Identity Management
• Intellectual property management
• Digital Assets distribution
2. ENTERTAINMENT
• Trusted Casino/Lottery
• Gaming
• Media Distribution Network
• Crowdfunding
• Prediction Markets
3. ECOMMERCE
• Supply Chain
• Goods Tracking (guns, devices, chemicals)
• Loyalty Programs
• KYC/AML
• Gray Markets
4. IT
• Distributed Supercomputer
• Decentralized Storage/Hosting
• IoT
• Cybersecurity
• Data Verification
33
BIG USE CASES
5. LIFESCIENCE
• Donors Chain
• Secure Patient Data Platform
• Medication lifecycle
6. EDUCATION
• Education Records
• Certifications
• Research Works and Papers
7. B2C
• Car Sharing
• Lending
• Wine/Food Tracking
• Money Transfers
8. FUTURE
• AR/VR Internal Assets Management
• Clone Prevention
• Distributed Energy
• Time Tracking and Employee as a Service (HR, TA, SE etc.)
• Insurance (Anti-Fraud and Fast Payments)
34
1. Is Blockchain a new paradigm? – No
2. Is Blockchain NoSQL? – Mostly, by nature
3. Is Blockchain BigData? – It depends on the use case
4. How to compare Blockchain solutions? – Use ASR (architecturally significant requirements)
5. How to apply Blockchain solutions? – Wisely and with imagination
SUMMARY
35
CONCLUSION
1. If you are an engineer – blockchain is interesting and modern
2. If you are an investor – trust the engineers
3. If you are an observer – choose the right side
36
THANK YOU
vladimir_bichev@epam.com
vladimir.bichev
@volod_i_mir
37
Appendix: NoSQL Family
38
Appendix: NoSQL Cheatsheet
39
Appendix: BLOCKTECH LANDSCAPE
40
Appendix: INVESTORS AND STARTUPS MAP
41
Appendix: Financial Services Firms Into Blockchain

Talk by Vladimir Bichev at the third meetup of the St. Petersburg blockchain developers community, #spblockchain
