Java BigData Full Stack
Development as is ...
Alexey Zinovyev, Java Trainer at EPAM
About
In IT since 2007
With Java since 2009
With Hadoop since 2012
At EPAM since 2015
3Java Big Data Full Stack Development
Contacts
E-mail : Alexey_Zinovyev@epam.com
Twitter : @zaleslaw @BigDataRussia
vk.com/big_data_russia Big Data Russia
vk.com/java_jvm Java & JVM langs
The Good Old Days
HRs & RMs are looking for Java developers
Is the Java Dream Team waiting for you?
Required Skills
• Advanced SQL
• Basic Linux
• Core Java & JVM
• Backend Development Experience
• Basic Computer Science Knowledge
REAL WORLD
Let’s use JavaScript ONLY in the frontend
In the frontend
ONLY?
Cruel world
Do you know any ML library for JavaScript?
Wild animals everywhere
And what I tell you
It’s time for a Java superhero, yeah!
Before discovering patterns you should ...
• Select small pieces of the data
• Define default values for missing data
• Remove strange signals (noise) from the data
• Merge several tables into one if required
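Two of the preparation steps above (default values for missing data, removing strange signals) can be sketched in plain Java. This is only an illustration under stated assumptions: the class name `PreprocessingSketch`, the helper names, the temperature readings, and the valid range are all hypothetical and do not come from the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PreprocessingSketch {

    // Replace missing readings (null) with a caller-supplied default value.
    public static List<Double> fillMissing(List<Double> values, double defaultValue) {
        List<Double> result = new ArrayList<>();
        for (Double v : values) {
            result.add(v == null ? defaultValue : v);
        }
        return result;
    }

    // Keep only values inside [min, max]; anything else is treated as a "strange signal".
    // Expects a list without nulls, e.g. the output of fillMissing.
    public static List<Double> removeOutOfRange(List<Double> values, double min, double max) {
        List<Double> result = new ArrayList<>();
        for (double v : values) {
            if (v >= min && v <= max) {
                result.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical temperature readings: null is a missing value, 999.0 is a sensor glitch.
        List<Double> raw = Arrays.asList(21.5, null, 22.0, 999.0, 20.5);
        List<Double> filled = fillMissing(raw, 21.0);
        List<Double> clean = removeOutOfRange(filled, -50.0, 60.0);
        System.out.println(clean); // prints [21.5, 21.0, 22.0, 20.5]
    }
}
```

A fixed range check is the simplest possible noise filter; real pipelines often use statistical rules instead, and the right default value depends on the data.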
How it really works
• Share your data with us
• Our magic manipulations
• Building an answering machine
• PROFIT!!!
How to start?
WHAT IS BIG DATA?
Joke about Excel
5V
Every 60 seconds…
From Mobile Devices
From Industry
We have started storing and processing stupid new things!
10^6 rows
in MySQL
GB->TB->PB->?
Is BigData about PBs?
It’s hard to …
• .. store
• .. handle
• .. search in
• .. visualize
• .. send over the network
Likes on Classmates (Odnoklassniki): how to count them?
Crazy Zoo
2012
Crazy Zoo
2016
What this training will cover
NOSQL
What’s the problem with RDBMSs?
• Caching
• Master/Slave
• Cluster
• Table Partitioning
• Sharding
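To make the last item in the list above concrete, here is a minimal sketch of hash-based sharding: each key is routed to one of several databases by its hash. The class name `ShardRouterSketch` and the shard names `db-0`, `db-1`, `db-2` are hypothetical; the talk does not prescribe any particular routing scheme.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShardRouterSketch {
    private final List<String> shards;

    public ShardRouterSketch(List<String> shards) {
        this.shards = new ArrayList<>(shards);
    }

    // Map a key to one shard; the same key always lands on the same shard.
    // Math.floorMod keeps the index non-negative even for negative hash codes.
    public String shardFor(String key) {
        int index = Math.floorMod(key.hashCode(), shards.size());
        return shards.get(index);
    }

    public static void main(String[] args) {
        // Hypothetical shard names; in practice these would be connection targets.
        ShardRouterSketch router = new ShardRouterSketch(Arrays.asList("db-0", "db-1", "db-2"));
        for (String userId : Arrays.asList("alice", "bob", "carol")) {
            System.out.println(userId + " -> " + router.shardFor(userId));
        }
    }
}
```

Plain modulo routing is easy to reason about, but changing the number of shards remaps most keys, which is one reason sharding a relational database by hand becomes painful.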
Family
Database
party
Spring Data
How to start?
Java MongoDB Driver + Robomongo
BIG DATA TOOL MASTER
VS
DATA SCIENTIST
TRAIN
MODEL
Datasets
• Facebook users, tweets
• Trade transactions
• Government
• Medicine (genomic data)
• Telecommunications
Data Sources
• Relational Databases
• Data warehouses (Historical data)
• Files in CSV or in binary format
• The Internet or e-mail
• Scientific and research tools (R, Octave, MATLAB)
Hey, man, predict something!
Man or sofa?
Typical questions for DM
• Which loan applicants are high-risk?
• How do we detect phone card fraud?
• What is the revenue prediction for next year?
• Can you recommend music for users?
Is the green circle a blue square or a red triangle?
Let’s ask its neighbors!
kNN (k-nearest neighbors)
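The neighbor-voting idea above can be sketched in a few lines of plain Java: sort the labeled points by distance to the unknown point and let the k closest ones vote. The class name `KnnSketch`, the 2D coordinates, and the tie-breaking rule (the label that reaches the top vote count first wins) are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KnnSketch {

    // A labeled point in 2D space.
    public static class Point {
        final double x;
        final double y;
        final String label;

        public Point(double x, double y, String label) {
            this.x = x;
            this.y = y;
            this.label = label;
        }
    }

    // Squared Euclidean distance; enough for ranking, so no square root is needed.
    static double squaredDistance(Point p, double x, double y) {
        double dx = p.x - x;
        double dy = p.y - y;
        return dx * dx + dy * dy;
    }

    // Classify (x, y) by majority vote among its k nearest training points.
    public static String classify(List<Point> training, double x, double y, int k) {
        List<Point> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble((Point p) -> squaredDistance(p, x, y)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        for (Point p : sorted.subList(0, Math.min(k, sorted.size()))) {
            int count = votes.merge(p.label, 1, Integer::sum);
            if (count > bestVotes) {
                bestVotes = count;
                best = p.label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical training set: three blue squares and three red triangles.
        List<Point> training = Arrays.asList(
                new Point(1, 1, "blue square"),
                new Point(1, 2, "blue square"),
                new Point(2, 1, "blue square"),
                new Point(5, 5, "red triangle"),
                new Point(6, 5, "red triangle"),
                new Point(5, 6, "red triangle"));
        // The "green circle" sits at (2, 2); its three nearest neighbors are all blue squares.
        System.out.println(classify(training, 2, 2, 3)); // prints blue square
    }
}
```

Sorting the whole training set costs O(n log n) per query, which is fine for a demo but is exactly the kind of step that needs indexing or distribution on large datasets.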
Collaborative Filtering
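As a rough illustration of the idea, here is a minimal user-based variant in plain Java: a missing rating is predicted as the similarity-weighted average of what other users gave the same item. This is only one of several collaborative filtering approaches and not necessarily the one shown in the talk; the class name `CollaborativeFilteringSketch`, the users, the genres, and the ratings are all hypothetical, and ratings are assumed to be non-negative.

```java
import java.util.HashMap;
import java.util.Map;

public class CollaborativeFilteringSketch {

    // Cosine similarity of two rating vectors; an item a user has not rated counts as 0.
    public static double similarity(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double norms = norm(a) * norm(b);
        return norms == 0.0 ? 0.0 : dot / norms;
    }

    private static double norm(Map<String, Double> ratings) {
        double sum = 0.0;
        for (double r : ratings.values()) {
            sum += r * r;
        }
        return Math.sqrt(sum);
    }

    // Predict a rating as the similarity-weighted average of other users' ratings for the item.
    // Returns NaN when no similar user has rated the item.
    public static double predict(Map<String, Map<String, Double>> ratings, String user, String item) {
        Map<String, Double> mine = ratings.get(user);
        double weighted = 0.0;
        double weights = 0.0;
        for (Map.Entry<String, Map<String, Double>> e : ratings.entrySet()) {
            Double theirRating = e.getValue().get(item);
            if (e.getKey().equals(user) || theirRating == null) {
                continue;
            }
            double sim = similarity(mine, e.getValue());
            weighted += sim * theirRating;
            weights += sim;
        }
        return weights == 0.0 ? Double.NaN : weighted / weights;
    }

    public static void main(String[] args) {
        // Hypothetical music ratings on a 1..5 scale.
        Map<String, Map<String, Double>> ratings = new HashMap<>();
        Map<String, Double> alice = new HashMap<>();
        alice.put("rock", 5.0);
        alice.put("jazz", 1.0);
        Map<String, Double> bob = new HashMap<>();
        bob.put("rock", 5.0);
        bob.put("jazz", 1.0);
        bob.put("pop", 4.0);
        Map<String, Double> carol = new HashMap<>();
        carol.put("rock", 1.0);
        carol.put("jazz", 5.0);
        carol.put("pop", 1.0);
        ratings.put("alice", alice);
        ratings.put("bob", bob);
        ratings.put("carol", carol);

        // Alice's taste is closer to Bob's, so her predicted "pop" rating leans toward his 4.
        System.out.println("Predicted rating: " + predict(ratings, "alice", "pop"));
    }
}
```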
Machine Learning vs Traditional Programming
Data
Science
Can a Java programmer be a Data Scientist?
Sexy Data Scientist
Real Data Scientist
How to start?
Weka
HADOOP
Hadoop and Data Knights
Hadoop
MapReduce in different languages
MapReduce for WordCount
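The word-count flow can be imitated on a single machine to show the two phases. The sketch below is plain Java, not the Hadoop API: `WordCountSketch`, `map`, `reduce`, and the sample lines are hypothetical names and data chosen for illustration, and the shuffle step that Hadoop performs between the phases is folded into `reduce`.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in a line.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<String, Integer>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle + reduce phase: group the pairs by word and sum their counts.
    // A TreeMap keeps the words sorted, which makes the output stable.
    public static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Run both phases over all input lines.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String line : lines) {
            all.addAll(map(line));
        }
        return reduce(all);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("big data is big", "data is everywhere");
        System.out.println(wordCount(lines)); // prints {big=2, data=2, everywhere=1, is=2}
    }
}
```

In a real cluster the map calls run in parallel on different nodes and the pairs for one word are sent to the same reducer; the per-phase logic stays this small.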
Hadoop
Jobs
Hadoop frameworks
• Universal (MapReduce, Tez, RDD in Spark)
• Abstract (Pig, Spark Pipeline API)
• SQL-like (Hive, Impala, Spark SQL)
• Graph processing (Giraph, GraphX)
• Machine Learning (Mahout, MLlib)
• Stream processing (Spark Streaming, Storm)
SPARK
SPARK: the bloody son of MR
• MapReduce in memory
• Up to 50x faster than Hadoop MapReduce
• RDD is the basic building block (an immutable distributed collection of objects)
• Pipeline API (no need for Pig)
Spark
Family
MLlib supports
• Classification and regression
• Collaborative filtering
• Clustering
• Dimensionality reduction
• Optimization
Code sample MLlib (K-Means)
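// Imports this fragment relies on (Spark MLlib, RDD-based API)
import org.apache.spark.mllib.clustering.KMeans;
import org.apache.spark.mllib.clustering.KMeansModel;
// Assumed to be defined earlier (not shown on the slide):
//   sc         - a JavaSparkContext
//   parsedData - a JavaRDD<Vector> holding the input points
// Cluster the data into two classes using KMeans
int numClusters = 2;
int numIterations = 20;
KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);
// Evaluate clustering by computing Within Set Sum of Squared Errors
double WSSSE = clusters.computeCost(parsedData.rdd());
System.out.println("Within Set Sum of Squared Errors = " + WSSSE);
// Save and load model
clusters.save(sc.sc(), "myModelPath");
KMeansModel sameModel = KMeansModel.load(sc.sc(), "myModelPath");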
MLlib
• .. builds on ideas from scikit-learn (Python lib) and Mahout
• .. runs fully on Spark and supports Spark’s Pipeline API
• .. represents a dataset as a Spark SQL SchemaRDD (called DataFrame since Spark 1.3)
• .. supports Hive as an external data source
• .. works well for large datasets and parallelized algorithms
It solves all problems!
How to start?
HDP Zoo
Ok, Google!
AWS Amazon
Infrastructure issues are waiting for YOU!
DEEP LEARNING
Deep Learning helps us build a NEW FUTURE
HOW TO LEARN?
DIFFERENT WAYS
1. Read books and write ‘pet’ projects
2. Become a mentee in a mentoring program
3. Take MOOCs
4. Take a training course
5. Attend conferences
Recommended Books
Contacts
E-mail : Alexey_Zinovyev@epam.com
Twitter : @zaleslaw @BigDataRussia
vk.com/big_data_russia Big Data Russia
vk.com/java_jvm Java & JVM langs

Java BigData Full Stack Development (version 2.0)
