Roman Nikitchenko, 04.12.2014
Any real big data is 
just about 
DIGITAL LIFE 
FOOTPRINT 
www.vitech.com.ua 2
NOT ALL 
THINGS IN 
OUR LIFE 
ARE NICE 
THE 
SAME IS 
ABOUT 
... 
BIG DATA is not about the
data. It is about OUR ABILITY
TO HANDLE IT.
YARN 
Basics 
Most dangerous 
things in Big Data 
Beware! 
Don't shoot your 
own foot with BIG 
GUN! 
Couple of 
specific notes 
Some aspects are 
more special. 
MOST SERIOUS BIG DATA failure IS ... 
NO DATA 
NO DATA 
The biggest mistake in a BIG
DATA strategy is to limit the
amount of data you collect.
NO MONEY 
WHERE 
ARE 
YOU? 
DATA LAKE 
Take as much data 
about your business 
processes as you can 
take. The more data 
you have the more 
value you could get 
from it. 
YOU 
ALWAYS 
HAVE 
OPTION 
● We have developed
our own online
storage which reduces
maintenance and
stores anything.
The most serious errors in Big
Data are about operations
and infrastructure, not about
algorithms or code.
LIVE WITH IT 
YOU 
ALWAYS 
HAVE 
OPTION 
● We have a dedicated
engineering roadmap for
big data infrastructure
development.
Use robust solutions 
Why Hadoop?
[Figure: "BIG DATA = BIG DATA + BIG DATA + ... x MAX", built from repeated BIG DATA labels]
What is 
HADOOP? 
● Hadoop is an open source
framework for big
data, covering both
distributed storage and
processing.
● Hadoop is reliable and
fault tolerant without
relying on hardware for
these properties.
● Hadoop has unique
horizontal scalability.
Currently: from a
single computer up to
thousands of cluster
nodes.
Hadoop: don't do it yourself 
Option? Our experience is: 
● HortonWorks are 'barely open source'. Innovative, but
'running too fast'. Most of their key technologies are not
mature yet. Some people LOVE them.
● Cloudera is stable enough but not stale. Hadoop 2.5 with
YARN, HBase 0.98.x, Spark 1.x. A good balance as of late 2014.
● MapR focuses on performance per node, but they are
slightly outdated in terms of functionality and their
distribution comes at a cost. For cases where node
performance is a high priority.
HBase motivation 
Hadoop is... 
● Designed for throughput,
not for latency.
● HDFS blocks are expected
to be large. There is an issue
with lots of small files.
● Write once, read many
times ideology.
● MapReduce is not very
flexible, and neither is any
database built on top of it.
● How about realtime?
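The small files issue above can be made concrete with a bit of arithmetic. The sketch below counts how many namespace objects (files plus blocks) the NameNode has to track for the same volume of data stored as one file or as many small files. The 128 MiB block size is the Hadoop 2.x default; the 150 bytes per object figure is only an often-quoted rule of thumb and is an assumption here, as are the function names.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024   # Hadoop 2.x default dfs.blocksize
BYTES_PER_OBJECT = 150           # assumed rule of thumb, not a measured value


def namenode_objects(file_sizes):
    """Count namespace objects: one per file plus one per HDFS block."""
    blocks = sum(math.ceil(size / BLOCK_SIZE) for size in file_sizes)
    return len(file_sizes) + blocks


def namenode_bytes(file_sizes):
    """Rough NameNode heap estimate under the assumed per-object cost."""
    return namenode_objects(file_sizes) * BYTES_PER_OBJECT


as_one_file = [1024 ** 3]                 # 1 GiB stored as a single file
as_small_files = [1024 * 1024] * 1024     # the same 1 GiB as 1024 files of 1 MiB

print(namenode_objects(as_one_file), namenode_objects(as_small_files))
```

The data volume is identical in both cases, yet the small-file layout needs far more NameNode bookkeeping, which is the reason large blocks and large files are preferred.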
Uses commodity 
hardware... 
The meaning of 'commodity'
keeps growing
● 64 GB of RAM is considered a pretty small
amount. 128 GB is an increasingly common
configuration.
● 2xCPU with 6 cores each is considered
commodity.
● 4xHDD is a minimum. SSDs are used
more and more often.
VIRTUALIZATION 
Virtualization 
NOT 
SO 
REAL 
ELEPHANT 
CONCERNS 
● It is possible for key nodes, but not for
workers unless you are really big.
● Several nodes on a single physical
host: what happens if this host fails?
● Loaded services on a VM: is it
meaningful? Double duties?
REAL EXAMPLE 
Virtualization: practical case 
● Apache ZooKeeper is a
QUORUM-based service.
● If a host running 2 ZK
instances fails, everything
fails, which breaks the
tolerance to 1 failure.
● Can you guarantee equal
performance for ZK
service instances?
● DON'T PUT QUORUM
SERVICES IN A VIRTUAL
ENVIRONMENT!
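The quorum argument above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a 3-instance ensemble and a majority quorum; the host names and the `placement` mapping are hypothetical and only illustrate the failure case from the slide.

```python
def quorum_size(ensemble_size):
    """Majority quorum: more than half of the ensemble must be alive."""
    return ensemble_size // 2 + 1


def survives_host_failure(placement, failed_host):
    """placement maps a physical host name to the number of ZK instances on it."""
    total = sum(placement.values())
    alive = total - placement.get(failed_host, 0)
    return alive >= quorum_size(total)


# Two VMs with ZK instances share one physical host (the case from the slide).
virtualized = {"host-a": 2, "host-b": 1}
# One ZK instance per physical host.
physical = {"host-a": 1, "host-b": 1, "host-c": 1}

print(survives_host_failure(virtualized, "host-a"))  # False
print(survives_host_failure(physical, "host-a"))     # True
```

With one instance per physical host any single host can fail; with two instances on one host, losing that host takes the whole ensemble below quorum.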
YOU 
ALWAYS 
HAVE 
OPTION 
● Indeed, there are lots of
options with
virtualization. The only
concern is the ability
to use your own brain.
HBase motivation
Need online
storage for
big data?
LATENCY, SPEED and all
Hadoop properties.
NO
SECONDARY
INDEXES OUT OF
THE BOX.
YOU 
ALWAYS 
HAVE 
LOT OF 
OPTIONS 
● We have built our
search indexing
technology.
INDEX ALTERNATIVE: SOLR 
INDEX UPDATE 
INDEX QUERY 
Search responses 
Index update request is 
analyzed, tokenized, 
transformed... and the 
same is for queries. 
● SOLR indexes documents. What is stored in the
SOLR index is not what you index. SOLR is NOT A
STORAGE, ONLY AN INDEX
● But it can index ANYTHING. The search result is a
document ID
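The two points above (the same analysis is applied to updates and to queries, and a search returns document IDs rather than the documents) can be illustrated with a toy inverted index. This is not SOLR code: `analyze` and `TinyIndex` are hypothetical names, and the analyzer is a deliberately trivial stand-in for a real analysis chain.

```python
import re
from collections import defaultdict


def analyze(text):
    """Lowercase and tokenize; used for both index updates and queries."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Stores tokens -> document IDs only; the original text is not kept."""

    def __init__(self):
        self._postings = defaultdict(set)

    def update(self, doc_id, text):
        for token in analyze(text):
            self._postings[token].add(doc_id)

    def query(self, text):
        tokens = analyze(text)
        if not tokens:
            return set()
        # Documents must contain every query token.
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result


index = TinyIndex()
index.update("row-1", "Big Data, BIG pitfalls")
index.update("row-2", "Small data")
print(index.query("DATA"))      # IDs of both documents
print(index.query("big data"))  # only row-1
```

To show the matching documents, a caller still has to fetch them by ID from the real storage.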
● HBase handles online requests that change
user data.
● The NGData Lily indexer handles the stream of changes
and transforms them into SOLR index change
requests.
● Indexes are built on SOLR so HBase data are
searchable.
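The flow described above, where storage changes are turned into index change requests, can be sketched as a small translation step. This is not the Lily indexer API: the event layout (`op`, `row`, `cells`), the request layout and the function names are all assumptions made for illustration.

```python
def to_index_request(change, indexed_columns):
    """Turn one storage change event into an index change request (or None)."""
    if change["op"] == "delete":
        return {"action": "delete", "id": change["row"]}
    # Only columns configured for indexing are sent to the index.
    fields = {c: v for c, v in change["cells"].items() if c in indexed_columns}
    if not fields:
        return None  # the change touched nothing that is indexed
    return {"action": "add", "id": change["row"], "fields": fields}


def translate(changes, indexed_columns):
    """Translate a stream of changes, skipping those with no index impact."""
    requests = (to_index_request(c, indexed_columns) for c in changes)
    return [r for r in requests if r is not None]


changes = [
    {"op": "put", "row": "user-1", "cells": {"info:name": "Ann", "raw:blob": "..."}},
    {"op": "put", "row": "user-2", "cells": {"raw:blob": "..."}},
    {"op": "delete", "row": "user-1"},
]
requests = translate(changes, indexed_columns={"info:name"})
print(requests)
```

The client only writes to the storage; the index is kept in step by whatever consumes the change stream.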
HBase: Data and search integration 
● Client: the user just puts (or deletes) data (data update).
● HBase cluster (HBase regions): replication can be set up down to column family level.
● HDFS: serves the low level file system.
● REPLICATION feeds the Lily HBase NRT indexer, which translates data changes into SOLR index updates.
● SOLR cloud: finally provides search. Search requests (HTTP) come in, search responses go back.
● Apache Zookeeper does all coordination.
ETL 
LOAD 
YOUR 
DATA 
WITH CARE 
ENTERPRISE DATA HUB 
Don't ruin your existing data warehouse.
Just extend it with new, centralized big
data storage through a data migration
solution.
ETL & BD: main stages 
[Diagram: SQL server (Table1 to Table4) -> EXTRACT -> TRANSFORM (JOIN, transform, partition) -> LOAD -> BIG DATA shards]
● SQL solutions are usually not as
distributed as Big Data ones. How do you
partition your data?
● Big data storages are mostly non-relational.
You have to map table
relations into objects. Where do you put this
complexity?
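One common answer to the partitioning question above is to hash a key of each row and use the hash to pick a shard. The sketch below is only one possible scheme, not the one used by the author; `shard_for`, `partition` and the `customer_id` field are hypothetical.

```python
import zlib


def shard_for(key, shard_count):
    """Stable hash partitioning: the same key always lands on the same shard."""
    return zlib.crc32(str(key).encode("utf-8")) % shard_count


def partition(rows, key_field, shard_count):
    """Distribute rows over shard_count lists by hashing the chosen key field."""
    shards = [[] for _ in range(shard_count)]
    for row in rows:
        shards[shard_for(row[key_field], shard_count)].append(row)
    return shards


rows = [{"customer_id": i % 10, "amount": i} for i in range(100)]
shards = partition(rows, "customer_id", shard_count=3)
print([len(s) for s in shards])
```

Hashing on the key that later lookups and joins use keeps related rows on one shard, at the price of possible skew when a few keys dominate.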
ETL & BD: complexity on SQL 
[Diagram: the SQL server JOINs Table1 to Table4 itself and sends one ETL stream to the BIG DATA shards]
● It's hard to transform SQL relationships
into NoSQL objects: complex joins.
● Simple stream on the big data side, lower
network traffic. HUGE load on SQL.
● What if you have several SQL servers
and you need a 2 times faster import?
SQL dies on this
ETL & BD: complexity on BD side 
[Diagram: the SQL server sends each of Table1 to Table4 as its own ETL stream; the JOIN happens on the BIG DATA shards]
● Simple streaming from SQL. Things
like joins happen on the Big Data side.
● Even if you have 100 SQL servers,
you only have to scale a single cluster.
● Network load is more intensive.
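Moving the join to the Big Data side means the SQL server only streams raw table rows, and the cluster assembles them into nested objects. A minimal sketch of that assembly step follows; the `customers` and `orders` tables and their fields are hypothetical, and a real job would run this per partition rather than in one process.

```python
from collections import defaultdict


def join_to_objects(customers, orders):
    """Build one nested object per customer from two flat row streams."""
    orders_by_customer = defaultdict(list)
    for order in orders:
        orders_by_customer[order["customer_id"]].append(
            {"order_id": order["order_id"], "total": order["total"]}
        )
    return [
        {
            "id": c["id"],
            "name": c["name"],
            "orders": orders_by_customer.get(c["id"], []),
        }
        for c in customers
    ]


customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 5},
    {"order_id": 11, "customer_id": 1, "total": 7},
]
objects = join_to_objects(customers, orders)
print(objects)
```

The SQL side does plain table scans only, so the expensive part of the work scales with the cluster instead of with the database server.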
Much more scalable
YARN: future of Hadoop 
● YARN forms the resource management layer and completes
a real distributed data OS, so heterogeneous clusters and
multi-tenancy become real.
● New distributed processing approaches: MapReduce is
now just one among other YARN applications.
World's first ever
DATA OS
10,000-node computer...
Recent technology changes are focused on 
higher scale. Better resource usage and 
control, lower MTTR, higher security, 
redundancy, fault tolerance. 
YARN 
This is how retail 
agents often 
work. 
YARN: what can be reality
This is how it
often works.
[Diagram: a grid of CPU cores]
What YARN presents is
about reservation. Indeed
you could have no
resources because of a
service not aware
of YARN.
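The gap between reserved and really available resources can be shown with a toy accounting model. This is not YARN scheduler code; the two functions and the numbers are assumptions that only illustrate the point that YARN counts what containers reserved, not what the machine is actually doing.

```python
def yarn_free_vcores(node_vcores, reserved_by_containers):
    """What YARN believes is free: configured capacity minus reservations."""
    return node_vcores - sum(reserved_by_containers)


def actually_idle_cores(node_vcores, busy_yarn_cores, busy_other_cores):
    """What is really idle once processes outside YARN are counted as well."""
    return max(0, node_vcores - busy_yarn_cores - busy_other_cores)


node_vcores = 8
containers = [2, 2]        # two containers, 2 vcores reserved each
unmanaged_service = 6      # cores used by a service YARN knows nothing about

print(yarn_free_vcores(node_vcores, containers))                              # 4
print(actually_idle_cores(node_vcores, sum(containers), unmanaged_service))   # 0
```

YARN would still schedule new containers onto this node, although no core is really idle.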
YOU 
ALWAYS 
HAVE 
OPTION 
Apache 
Spark 
● A better MapReduce, with at least some
MapReduce elements that can be reused.
● New job models. Not only Map and Reduce.
● Scala and Python APIs in addition to Java.
Functional model support.
● Results can be passed through memory,
including the final one.
● Works much better if it knows the size of the job to
do. Streaming is just a sequence of small jobs.
● Requires proper YARN tuning to use resources
properly. No dynamic allocation of executors.
● Persistence: int limitation of 2G. A HUGE
amount of memory as of today.
● You cannot partition data 'on the fly'. You have to
guess the right way.
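The 2G limitation and the need to guess partitioning up front fit together: if a single block is addressed with a Java `int`, no partition may grow past `Integer.MAX_VALUE` bytes, so the partition count has to be chosen from the expected data size. A minimal sizing sketch follows; the function name and the safety divisor are assumptions, not a Spark API.

```python
INT_MAX = 2**31 - 1  # Java Integer.MAX_VALUE, roughly 2 GiB when counted in bytes


def min_partitions(dataset_bytes, safety_divisor=2):
    """Smallest partition count that keeps each partition under the budget.

    The budget is INT_MAX divided by safety_divisor, leaving headroom for skew.
    """
    budget = INT_MAX // safety_divisor
    # Integer ceiling division, at least one partition.
    return max(1, -(-dataset_bytes // budget))


one_tib = 2**40
print(min_partitions(one_tib))
```

Because the number has to be picked before the job runs, an estimate of the input size is needed; with real data the sizes are uneven, which is why a divisor larger than 1 is used here.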
Map-reduce + Spark on YARN
Your cluster is ready for
next tasks
● Dynamic, faster to start up,
resource reuse.
● Unified management
infrastructure such as
logging.
It is simply too 
good to wait... 
TRUST ME ;-) 
Share your knowledge! 
DO NOT 
HIDE YOUR 
EXPERIENCE 
Questions and discussion 
www.vitech.com.ua 46

More Related Content

What's hot

DataStax C*ollege Credit: What and Why NoSQL?
DataStax C*ollege Credit: What and Why NoSQL?DataStax C*ollege Credit: What and Why NoSQL?
DataStax C*ollege Credit: What and Why NoSQL?
DataStax
 
Avoiding big data antipatterns
Avoiding big data antipatternsAvoiding big data antipatterns
Avoiding big data antipatterns
grepalex
 
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra MigrationInfosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
DataStax Academy
 
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...
Mark Rittman
 
What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications
What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web ApplicationsWhat Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications
What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications
Todd Hoff
 
Data Engineering Quick Guide
Data Engineering Quick GuideData Engineering Quick Guide
Data Engineering Quick Guide
Asim Jalis
 
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDiscoSD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
Big Data Joe™ Rossi
 
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreAzure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
DataStax Academy
 
Cassandra NoSQL Tutorial
Cassandra NoSQL TutorialCassandra NoSQL Tutorial
Cassandra NoSQL Tutorial
Michelle Darling
 
ODI12c as your Big Data Integration Hub
ODI12c as your Big Data Integration HubODI12c as your Big Data Integration Hub
ODI12c as your Big Data Integration Hub
Mark Rittman
 
DIscover Spark and Spark streaming
DIscover Spark and Spark streamingDIscover Spark and Spark streaming
DIscover Spark and Spark streaming
Maturin BADO
 
Scalable Web Architectures: Common Patterns and Approaches
Scalable Web Architectures: Common Patterns and ApproachesScalable Web Architectures: Common Patterns and Approaches
Scalable Web Architectures: Common Patterns and Approaches
adunne
 
Life After Sharding: Monitoring and Management of a Complex Data Cloud
Life After Sharding: Monitoring and Management of a Complex Data CloudLife After Sharding: Monitoring and Management of a Complex Data Cloud
Life After Sharding: Monitoring and Management of a Complex Data Cloud
OSCON Byrum
 
Data Engineer's Lunch #55: Get Started in Data Engineering
Data Engineer's Lunch #55: Get Started in Data EngineeringData Engineer's Lunch #55: Get Started in Data Engineering
Data Engineer's Lunch #55: Get Started in Data Engineering
Anant Corporation
 
The Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsThe Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data Platforms
Mark Rittman
 
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...
DataStax Academy
 
GPS Insight on Using Presto with Scylla for Data Analytics and Data Archival
GPS Insight on Using Presto with Scylla for Data Analytics and Data ArchivalGPS Insight on Using Presto with Scylla for Data Analytics and Data Archival
GPS Insight on Using Presto with Scylla for Data Analytics and Data Archival
ScyllaDB
 
Exponea - Kafka and Hadoop as components of architecture
Exponea  - Kafka and Hadoop as components of architectureExponea  - Kafka and Hadoop as components of architecture
Exponea - Kafka and Hadoop as components of architecture
MartinStrycek
 
Deep Dive Into How To Monitor MySQL or MariaDB Galera Cluster / Percona XtraD...
Deep Dive Into How To Monitor MySQL or MariaDB Galera Cluster / Percona XtraD...Deep Dive Into How To Monitor MySQL or MariaDB Galera Cluster / Percona XtraD...
Deep Dive Into How To Monitor MySQL or MariaDB Galera Cluster / Percona XtraD...
Severalnines
 
Cassandra Lunch #88: Cadence
Cassandra Lunch #88: CadenceCassandra Lunch #88: Cadence
Cassandra Lunch #88: Cadence
Anant Corporation
 

What's hot (20)

DataStax C*ollege Credit: What and Why NoSQL?
DataStax C*ollege Credit: What and Why NoSQL?DataStax C*ollege Credit: What and Why NoSQL?
DataStax C*ollege Credit: What and Why NoSQL?
 
Avoiding big data antipatterns
Avoiding big data antipatternsAvoiding big data antipatterns
Avoiding big data antipatterns
 
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra MigrationInfosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
 
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...
 
What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications
What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web ApplicationsWhat Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications
What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications
 
Data Engineering Quick Guide
Data Engineering Quick GuideData Engineering Quick Guide
Data Engineering Quick Guide
 
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDiscoSD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
 
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreAzure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
 
Cassandra NoSQL Tutorial
Cassandra NoSQL TutorialCassandra NoSQL Tutorial
Cassandra NoSQL Tutorial
 
ODI12c as your Big Data Integration Hub
ODI12c as your Big Data Integration HubODI12c as your Big Data Integration Hub
ODI12c as your Big Data Integration Hub
 
DIscover Spark and Spark streaming
DIscover Spark and Spark streamingDIscover Spark and Spark streaming
DIscover Spark and Spark streaming
 
Scalable Web Architectures: Common Patterns and Approaches
Scalable Web Architectures: Common Patterns and ApproachesScalable Web Architectures: Common Patterns and Approaches
Scalable Web Architectures: Common Patterns and Approaches
 
Life After Sharding: Monitoring and Management of a Complex Data Cloud
Life After Sharding: Monitoring and Management of a Complex Data CloudLife After Sharding: Monitoring and Management of a Complex Data Cloud
Life After Sharding: Monitoring and Management of a Complex Data Cloud
 
Data Engineer's Lunch #55: Get Started in Data Engineering
Data Engineer's Lunch #55: Get Started in Data EngineeringData Engineer's Lunch #55: Get Started in Data Engineering
Data Engineer's Lunch #55: Get Started in Data Engineering
 
The Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsThe Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data Platforms
 
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...
 
GPS Insight on Using Presto with Scylla for Data Analytics and Data Archival
GPS Insight on Using Presto with Scylla for Data Analytics and Data ArchivalGPS Insight on Using Presto with Scylla for Data Analytics and Data Archival
GPS Insight on Using Presto with Scylla for Data Analytics and Data Archival
 
Exponea - Kafka and Hadoop as components of architecture
Exponea  - Kafka and Hadoop as components of architectureExponea  - Kafka and Hadoop as components of architecture
Exponea - Kafka and Hadoop as components of architecture
 
Deep Dive Into How To Monitor MySQL or MariaDB Galera Cluster / Percona XtraD...
Deep Dive Into How To Monitor MySQL or MariaDB Galera Cluster / Percona XtraD...Deep Dive Into How To Monitor MySQL or MariaDB Galera Cluster / Percona XtraD...
Deep Dive Into How To Monitor MySQL or MariaDB Galera Cluster / Percona XtraD...
 
Cassandra Lunch #88: Cadence
Cassandra Lunch #88: CadenceCassandra Lunch #88: Cadence
Cassandra Lunch #88: Cadence
 

Similar to Java/Scala Lab: Роман Никитченко - Big Data - Big Pitfalls.

Big Data: fall seven times, stand up eight!
Big Data: fall seven times, stand up eight!Big Data: fall seven times, stand up eight!
Big Data: fall seven times, stand up eight!
Roman Nikitchenko
 
Big data: current technology scope.
Big data: current technology scope.Big data: current technology scope.
Big data: current technology scope.
Roman Nikitchenko
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructure
Roman Nikitchenko
 
Big data & frameworks: no book for you anymore
Big data & frameworks: no book for you anymoreBig data & frameworks: no book for you anymore
Big data & frameworks: no book for you anymore
Stfalcon Meetups
 
Big data & frameworks: no book for you anymore.
Big data & frameworks: no book for you anymore.Big data & frameworks: no book for you anymore.
Big data & frameworks: no book for you anymore.
Roman Nikitchenko
 
Is your cloud ready for Big Data? Strata NY 2013
Is your cloud ready for Big Data? Strata NY 2013Is your cloud ready for Big Data? Strata NY 2013
Is your cloud ready for Big Data? Strata NY 2013
Richard McDougall
 
Running Spark and MapReduce together in Production
Running Spark and MapReduce together in ProductionRunning Spark and MapReduce together in Production
Running Spark and MapReduce together in Production
DataWorks Summit
 
Oracle big data appliance and solutions
Oracle big data appliance and solutionsOracle big data appliance and solutions
Oracle big data appliance and solutions
solarisyougood
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
Inside Analysis
 
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
Mark Rittman
 
Big data nyu
Big data nyuBig data nyu
Big data nyu
Edward Capriolo
 
HBase, crazy dances on the elephant back.
HBase, crazy dances on the elephant back.HBase, crazy dances on the elephant back.
HBase, crazy dances on the elephant back.
Roman Nikitchenko
 
BIG DATA: From mammoth to elephant
BIG DATA: From mammoth to elephantBIG DATA: From mammoth to elephant
BIG DATA: From mammoth to elephant
Roman Nikitchenko
 
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
MLconf
 
Empowering the AWS DynamoDB™ application developer with Alternator
Empowering the AWS DynamoDB™ application developer with AlternatorEmpowering the AWS DynamoDB™ application developer with Alternator
Empowering the AWS DynamoDB™ application developer with Alternator
ScyllaDB
 
Idi2017 - Cloud DB: strengths and weaknesses
Idi2017 - Cloud DB: strengths and weaknessesIdi2017 - Cloud DB: strengths and weaknesses
Idi2017 - Cloud DB: strengths and weaknesses
Linuxaria.com
 
Database Virtualization: The Next Wave of Big Data
Database Virtualization: The Next Wave of Big DataDatabase Virtualization: The Next Wave of Big Data
Database Virtualization: The Next Wave of Big Data
exponential-inc
 
Building a Database for the End of the World
Building a Database for the End of the WorldBuilding a Database for the End of the World
Building a Database for the End of the World
jhugg
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
markgrover
 
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Agile Testing Alliance
 

Similar to Java/Scala Lab: Роман Никитченко - Big Data - Big Pitfalls. (20)

Big Data: fall seven times, stand up eight!
Big Data: fall seven times, stand up eight!Big Data: fall seven times, stand up eight!
Big Data: fall seven times, stand up eight!
 
Big data: current technology scope.
Big data: current technology scope.Big data: current technology scope.
Big data: current technology scope.
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructure
 
Big data & frameworks: no book for you anymore
Big data & frameworks: no book for you anymoreBig data & frameworks: no book for you anymore
Big data & frameworks: no book for you anymore
 
Big data & frameworks: no book for you anymore.
Big data & frameworks: no book for you anymore.Big data & frameworks: no book for you anymore.
Big data & frameworks: no book for you anymore.
 
Is your cloud ready for Big Data? Strata NY 2013
Is your cloud ready for Big Data? Strata NY 2013Is your cloud ready for Big Data? Strata NY 2013
Is your cloud ready for Big Data? Strata NY 2013
 
Running Spark and MapReduce together in Production
Running Spark and MapReduce together in ProductionRunning Spark and MapReduce together in Production
Running Spark and MapReduce together in Production
 
Oracle big data appliance and solutions
Oracle big data appliance and solutionsOracle big data appliance and solutions
Oracle big data appliance and solutions
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
 
Big data nyu
Big data nyuBig data nyu
Big data nyu
 
HBase, crazy dances on the elephant back.
HBase, crazy dances on the elephant back.HBase, crazy dances on the elephant back.
HBase, crazy dances on the elephant back.
 
BIG DATA: From mammoth to elephant
BIG DATA: From mammoth to elephantBIG DATA: From mammoth to elephant
BIG DATA: From mammoth to elephant
 
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16
 
Empowering the AWS DynamoDB™ application developer with Alternator
Empowering the AWS DynamoDB™ application developer with AlternatorEmpowering the AWS DynamoDB™ application developer with Alternator
Empowering the AWS DynamoDB™ application developer with Alternator
 
Idi2017 - Cloud DB: strengths and weaknesses
Idi2017 - Cloud DB: strengths and weaknessesIdi2017 - Cloud DB: strengths and weaknesses
Idi2017 - Cloud DB: strengths and weaknesses
 
Database Virtualization: The Next Wave of Big Data
Database Virtualization: The Next Wave of Big DataDatabase Virtualization: The Next Wave of Big Data
Database Virtualization: The Next Wave of Big Data
 
Building a Database for the End of the World
Building a Database for the End of the WorldBuilding a Database for the End of the World
Building a Database for the End of the World
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
 

More from GeeksLab Odessa

DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...
DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...
DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...
GeeksLab Odessa
 
DataScience Lab 2017_Kappa Architecture: How to implement a real-time streami...
DataScience Lab 2017_Kappa Architecture: How to implement a real-time streami...DataScience Lab 2017_Kappa Architecture: How to implement a real-time streami...
DataScience Lab 2017_Kappa Architecture: How to implement a real-time streami...
GeeksLab Odessa
 
DataScience Lab 2017_Блиц-доклад_Турский Виктор
DataScience Lab 2017_Блиц-доклад_Турский ВикторDataScience Lab 2017_Блиц-доклад_Турский Виктор
DataScience Lab 2017_Блиц-доклад_Турский Виктор
GeeksLab Odessa
 
DataScience Lab 2017_Обзор методов детекции лиц на изображение
DataScience Lab 2017_Обзор методов детекции лиц на изображениеDataScience Lab 2017_Обзор методов детекции лиц на изображение
DataScience Lab 2017_Обзор методов детекции лиц на изображение
GeeksLab Odessa
 
DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...
DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...
DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...
GeeksLab Odessa
 
DataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладDataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-доклад
GeeksLab Odessa
 
DataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладDataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-доклад
GeeksLab Odessa
 
DataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладDataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-доклад
GeeksLab Odessa
 
DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...
DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...
DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...
GeeksLab Odessa
 
DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...
DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...
DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...
GeeksLab Odessa
 
DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко
DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко
DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко
GeeksLab Odessa
 
DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...
DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...
DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...
GeeksLab Odessa
 
DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...
DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...
DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...
GeeksLab Odessa
 
Java/Scala Lab: Roman Nikitchenko - Big Data - Big Pitfalls.

  • 2. Any real big data is just about DIGITAL LIFE FOOTPRINT www.vitech.com.ua 2
  • 3. NOT ALL THINGS IN OUR LIFE ARE NICE. THE SAME IS TRUE OF ...
  • 4. BIG DATA is not about the data. It is about OUR ABILITY TO HANDLE IT.
  • 6. Basics: the most dangerous things in Big Data. Beware! Don't shoot yourself in the foot with a BIG GUN! A couple of specific notes: some aspects are more special.
  • 7. THE MOST SERIOUS BIG DATA failure IS ... NO DATA
  • 8. NO DATA: The biggest mistake in a BIG DATA strategy is to limit the amount of data you collect. NO MONEY
  • 9. WHERE ARE YOU?
  • 10. DATA LAKE: Take as much data about your business processes as you can. The more data you have, the more value you can get from it.
  • 11. YOU ALWAYS HAVE AN OPTION ● We have developed our own online storage, which lowers maintenance and stores anything.
  • 12. The most serious errors in Big Data are about operations and infrastructure, not about algorithms or code. LIVE WITH IT
  • 13. YOU ALWAYS HAVE AN OPTION ● We have a special engineering roadmap for big data infrastructure development.
  • 14. Use robust solutions. Why Hadoop? [Diagram: repeated "BIG DATA" blocks]
  • 15. What is HADOOP? ● Hadoop is an open source framework for big data, covering both distributed storage and processing. ● Hadoop is reliable and fault tolerant without relying on hardware for these properties. ● Hadoop has unique horizontal scalability: currently from a single computer up to thousands of cluster nodes.
  • 16. Hadoop: don't do it yourself
  • 17. Option? Our experience is: ● HortonWorks are 'barely open source'. Innovative, but 'running too fast'. Most of their key technologies are not so mature yet. Some people LOVE them. ● Cloudera is stable enough but not stale. Hadoop 2.5 with YARN, HBase 0.98.x, Spark 1.x. A balance as of late 2014. ● MapR focuses on performance per node, but they are slightly outdated in terms of functionality, and their distribution costs money. For cases where node performance is a high priority.
  • 18. HBase motivation. Hadoop is... ● Designed for throughput, not for latency. ● HDFS blocks are expected to be large. There is an issue with lots of small files. ● Write once, read many times ideology. ● MapReduce is not so flexible, and neither is any database built on top of it. ● How about realtime?
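
The small files issue from slide 18 can be illustrated with a little arithmetic. The NameNode tracks every file and every block as metadata, so the same volume of data costs far more metadata when it is split into many small files. The sketch below is only an illustration: the 128 MiB block size is an assumed (configurable) value, and `hdfs_objects` is a hypothetical helper, not a Hadoop API.

```python
import math

# Assumed HDFS block size (128 MiB); the real value is a cluster setting.
BLOCK_SIZE = 128 * 1024 * 1024


def hdfs_objects(file_sizes, block_size=BLOCK_SIZE):
    """Count the file and block objects the NameNode would have to track."""
    files = len(file_sizes)
    # Each file occupies ceil(size / block_size) blocks, however small it is.
    blocks = sum(math.ceil(size / block_size) for size in file_sizes)
    return files + blocks


one_gib = 1024 ** 3
one_mib = 1024 ** 2

# The same 1 GiB of data, stored two different ways.
as_one_file = hdfs_objects([one_gib])            # 1 file + 8 blocks
as_small_files = hdfs_objects([one_mib] * 1024)  # 1024 files + 1024 blocks

print(as_one_file, as_small_files)
```

Under these assumptions the single large file needs 9 metadata objects and the 1024 small files need 2048, which is why HDFS prefers large blocks.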
  • 19. Uses commodity hardware... The understanding of the word 'commodity' is growing ● 64G RAM is considered a pretty small amount. 128G is a more and more common configuration. ● 2xCPU with 6 cores each is considered commodity. ● 4xHDD is a minimum. SSDs are used more and more often.
  • 20. VIRTUALIZATION: NOT SO REAL ELEPHANT
  • 21. CONCERNS ● It is possible for key nodes. Not for workers unless you are really big. ● Several nodes on a single physical host: what happens if this host fails? ● Loaded services on a VM: is it meaningful? Double duties?
  • 22. REAL EXAMPLE. Virtualization: practical case ● Apache ZooKeeper is a QUORUM-based service. ● If a host with 2 ZK instances fails, everything fails, which breaks tolerance to 1 failure. ● Can you guarantee equal performance for ZK service instances? ● DON'T PUT QUORUM SERVICES IN A VIRTUAL ENVIRONMENT! [Diagram: HOST, HOST]
  • 23. YOU ALWAYS HAVE AN OPTION ● Indeed there are lots of options with virtualization. The only concern is the ability to use your own brains.
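
The ZooKeeper case from slide 22 can be checked with a few lines of code. A quorum service needs a strict majority of its ensemble alive, so the question is whether a majority survives the loss of any single physical host. This is a minimal sketch: the host names and the `placement` mapping are hypothetical, and nothing here talks to a real ZooKeeper.

```python
def has_quorum(alive, ensemble_size):
    """A quorum is a strict majority of the whole ensemble."""
    return alive > ensemble_size // 2


def survives_any_host_failure(placement):
    """placement maps a physical host name to the number of ZK instances on it."""
    total = sum(placement.values())
    # Lose each host in turn and check that the rest still form a majority.
    return all(has_quorum(total - lost, total) for lost in placement.values())


# Three instances on three separate physical hosts.
separate = {"host-a": 1, "host-b": 1, "host-c": 1}
# Three instances, two of them virtualized onto the same physical host.
stacked = {"host-a": 2, "host-b": 1}

print(survives_any_host_failure(separate))  # True
print(survives_any_host_failure(stacked))   # False
```

With two of three instances on one host, losing that host leaves 1 of 3 alive, which is not a majority, so the ensemble no longer tolerates a single failure.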
  • 24. HBase motivation: Need online storage for big data? LATENCY, SPEED and all Hadoop properties.
  • 25. NO SECONDARY INDEXES OUT OF THE BOX.
  • 26. YOU ALWAYS HAVE LOTS OF OPTIONS ● We have built our own search indexing technology.
  • 27. INDEX ALTERNATIVE: SOLR. [Diagram: INDEX UPDATE, INDEX QUERY, Search responses] An index update request is analyzed, tokenized, transformed... and the same goes for queries. ● SOLR indexes documents. What is stored in the SOLR index is not what you index. SOLR is NOT A STORAGE, ONLY AN INDEX ● But it can index ANYTHING. The search result is a document ID.
  • 28. ● HBase handles online user data change requests. ● NGData Lily indexer handles the stream of changes and transforms them into SOLR index change requests. ● Indexes are built on SOLR so HBase data are searchable.
  • 29. HBase: Data and search integration [Diagram] ● Client: user just puts (or deletes) data (data update); sends search requests (HTTP) and receives search responses. ● HBase cluster (HBase regions): REPLICATION can be set up to column family level. ● Lily HBase NRT indexer: translates data changes into SOLR index updates. ● SOLR cloud: finally provides search. ● HDFS: serves low level file system. ● Apache Zookeeper does all coordination.
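
The flow on slides 27 to 29 (data changes go to the store, a stream of changes is translated into index updates, and search returns only document IDs) can be sketched with a toy in-memory model. Everything here is hypothetical and simplified: `ToyIndex` and `apply_change` stand in for SOLR and the Lily indexer and do not use their APIs, a plain dict stands in for HBase, and the store write and index update happen in one call, whereas the slides describe them as connected through replication.

```python
from collections import defaultdict


class ToyIndex:
    """Keeps tokens -> row keys only; the original values are not stored."""

    def __init__(self):
        self._postings = defaultdict(set)  # token -> set of row keys
        self._tokens_by_doc = {}           # row key -> tokens, for deletes

    @staticmethod
    def analyze(text):
        # The same analysis is applied to index updates and to queries.
        return set(text.lower().split())

    def update(self, doc_id, text):
        self.delete(doc_id)  # replace any previous version of the document
        tokens = self.analyze(text)
        self._tokens_by_doc[doc_id] = tokens
        for token in tokens:
            self._postings[token].add(doc_id)

    def delete(self, doc_id):
        for token in self._tokens_by_doc.pop(doc_id, set()):
            self._postings[token].discard(doc_id)

    def query(self, text):
        tokens = self.analyze(text)
        if not tokens:
            return set()
        # A hit must contain every query token; the result is row keys only.
        return set.intersection(*(self._postings.get(t, set()) for t in tokens))


def apply_change(store, index, event):
    """Translate one data change event into a store write and an index update."""
    op, row_key, value = event
    if op == "put":
        store[row_key] = value
        index.update(row_key, value)
    elif op == "delete":
        store.pop(row_key, None)
        index.delete(row_key)


store, index = {}, ToyIndex()
changes = [
    ("put", "row1", "Big Data pitfalls"),
    ("put", "row2", "Big gun"),
    ("delete", "row2", None),
]
for change in changes:
    apply_change(store, index, change)

hits = index.query("BIG data")          # row keys only
results = [store[key] for key in hits]  # the data itself comes from the store
print(hits, results)
```

The design point the sketch tries to show is the split of responsibilities: the index answers "which rows match", and the storage answers "what is in those rows".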
  • 30. ETL: LOAD YOUR DATA WITH CARE
  • 31. ENTERPRISE DATA HUB: Don't ruin your existing data warehouse. Just extend it with new, centralized big data storage through a data migration solution.
  • 32. ETL & BD: main stages. [Diagram: SQL server (Table1 to Table4), EXTRACT, TRANSFORM (JOIN, Transform, Partition), LOAD, BIG DATA shards] ● SQL solutions are usually not as distributed as Big Data ones. How to partition your data? ● Big data storages are mostly non-relational. You have to map table relations into objects. Where to put this complexity?
  • 33. ETL & BD: complexity on SQL side. [Diagram: SQL server (Table1 to Table4), JOIN, ETL stream, BIG DATA shards] ● It's hard to transform SQL relationships into NoSQL objects: complex joins. ● Simple stream on the big data side, lowered network traffic. HUGE load on SQL. ● What if you have several SQL servers and you need 2 times faster import? SQL dies on this.
  • 34. ETL & BD: complexity on BD side. [Diagram: SQL server (Table1 to Table4), one ETL stream per table, JOIN, BIG DATA shards] ● Simple streaming from SQL. Things like joins happen on the Big Data side. ● Even if you have 100 SQL servers, you have to scale a single cluster. ● Network load is more intensive. Much more scalable.
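
The approach on slide 34 (stream tables out of SQL as they are, then partition and join on the big data side) can be sketched as follows. This is only an illustration under assumptions: the `customers` and `orders` tables, their field names, and the helper functions are hypothetical, and a stable CRC32 hash is used simply so that rows with the same key always land on the same shard.

```python
import zlib
from collections import defaultdict


def shard_of(key, num_shards):
    """Stable hash partitioning: the same key always maps to the same shard."""
    return zlib.crc32(str(key).encode("utf-8")) % num_shards


def stream_to_shards(rows, key_field, num_shards):
    """Plain per-table streaming: no joins, only routing by partition key."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_of(row[key_field], num_shards)].append(row)
    return shards


def join_on_shard(customers, orders):
    """Join done locally on one shard: fold order rows into customer objects."""
    by_id = {c["customer_id"]: dict(c, orders=[]) for c in customers}
    for order in orders:
        if order["customer_id"] in by_id:
            by_id[order["customer_id"]]["orders"].append(order["item"])
    return list(by_id.values())


# Two hypothetical SQL tables, each streamed independently.
customers = [{"customer_id": 1, "name": "Ann"}, {"customer_id": 2, "name": "Bob"}]
orders = [
    {"customer_id": 1, "item": "disk"},
    {"customer_id": 2, "item": "ram"},
    {"customer_id": 1, "item": "cpu"},
]

NUM_SHARDS = 3
customer_shards = stream_to_shards(customers, "customer_id", NUM_SHARDS)
order_shards = stream_to_shards(orders, "customer_id", NUM_SHARDS)

# Because both tables are partitioned by the same key, each shard can join alone.
documents = [
    doc
    for shard in range(NUM_SHARDS)
    for doc in join_on_shard(customer_shards.get(shard, []), order_shards.get(shard, []))
]
print(sorted(documents, key=lambda d: d["customer_id"]))
```

Partitioning both streams by the join key is what keeps the join local to a shard, which is why adding shards scales the import instead of loading the SQL server.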
  • 35. YARN: future of Hadoop ● YARN forms the resource management layer and completes a real distributed data OS, so heterogeneous clusters and multi-tenancy are real things. ● New distributed processing approaches: MapReduce is from now on only one among other YARN applications.
  • 36. First ever world DATA OS: a 10,000-node computer... Recent technology changes are focused on higher scale: better resource usage and control, lower MTTR, higher security, redundancy, fault tolerance.
  • 37. YARN: This is how retail agents often work.
  • 38. YARN: This is how it often works. [Diagram: CPU blocks, labelled "YARN presents" and "What can be reality"] It's about reservation. Indeed you could have no resources because of a service not aware of YARN.
  • 39. YOU ALWAYS HAVE AN OPTION
  • 40. Apache Spark ● Better MapReduce, with at least some MapReduce elements able to be reused. ● New job models. Not only Map and Reduce. ● Scala and Python APIs in addition to Java. Functional model support. ● Results can be passed through memory, including the final one.
  • 41. ● Works much better if it knows the size of the job to do. Streaming is just a sequence of small jobs. ● Requires proper YARN tuning to use resources properly. No dynamic allocation of executors. ● Persistence: int limitation of 2G. A HUGE amount of memory as of today. ● You cannot partition data 'on the fly'. You should guess the right way.
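
Two notes on slide 41 fit together: a persisted block is limited by a signed 32-bit size (about 2G), and partitioning has to be chosen up front. A rough way to "guess the right way" is to estimate the minimum number of partitions that keeps every partition under that limit. The sketch below is plain arithmetic, not a Spark API; `min_partitions` and its `skew_factor` safety margin are hypothetical names introduced for illustration.

```python
import math

# Largest size addressable with a signed 32-bit int, in bytes (just under 2 GiB).
INT_MAX = 2 ** 31 - 1


def min_partitions(total_bytes, skew_factor=1.0):
    """Smallest partition count that keeps an average partition under INT_MAX.

    skew_factor > 1.0 leaves headroom for unevenly sized partitions.
    """
    if total_bytes <= 0:
        return 1
    return math.ceil(total_bytes * skew_factor / INT_MAX)


gib = 1024 ** 3
print(min_partitions(10 * gib))                     # 6
print(min_partitions(100 * gib, skew_factor=2.0))   # 101
```

The result is a lower bound only; in practice a higher partition count is usually chosen so that tasks stay small and parallelism stays high.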
  • 42. Your cluster is ready for next tasks: MapReduce + Spark on YARN ● Dynamic, faster to start up, resource reuse. ● Unified management infrastructure such as logging.
  • 43. It is simply too good to wait...
  • 44. TRUST ME ;-)
  • 45. Share your knowledge! DO NOT HIDE YOUR EXPERIENCE
  • 46. Questions and discussion