What's Big Data? - Big Data Tech - 2015 - Firenze

Florence - 15 October 2015
Presented by Alberto Paro, CTO, Big Data Technologies
BIG DATA – What Is It?

Alberto Paro
- Degree in Computer Engineering (POLIMI)
- Works at Big Data Technologies + Consulting
- Author of two books on ElasticSearch + 5 tech reviews
- Works mainly in Scala and on big data technologies (Akka, Spray.io, Playframework, Apache Spark) and NoSQL (Cassandra, ElasticSearch and MongoDB)
- Evangelist for the Scala language and Scala.js

- Definition
- The 4 Vs
- NoSQL
- Hadoop/Spark
- The Scala language

Definition
The basic idea behind the phrase 'Big Data' is that everything we do leaves a digital trace ('data'), which we (and others) can use and analyse.
Big Data refers to our ability to make use of these ever-increasing volumes of data.

From the dawn of civilization until 2003,
humankind generated five exabytes of
data. Now we produce five exabytes
every two days…and the pace is
accelerating.
Eric Schmidt,
Executive Chairman, Google

Definition
Where does this data come from?
- "Online" activity
- Communication
- Photos and video
- Sensor data
- The Internet of Things

The 4 Vs
The growth of data leads to big data, which is also described by the 4 Vs:
- Volume
- Velocity
- Variety
- Veracity

Turning Big Data into value:
- The 'datafication' of: activities, conversations, text, voice, social media, browser logs, photos, video, sensors, etc.
- Big Data: Volume, Veracity, Variety, Velocity
- Analysing: text analytics, sentiment analysis, face recognition, voice analytics, movement analytics, etc.
- Value

NoSQL
- "Any database that is not a relational database"
- The term was coined at a meet-up
- "Non-relational databases"
- Not Only SQL

NoSQL - Types
Key-Value
- Redis
- Voldemort
- Dynomite
- Tokyo*
BigTable Clones
- HBase
- Hyperbase
- Cassandra
Document
- CouchDB
- MongoDB
- Redis
GraphDB
- Neo4j
- OrientDB
- …Graph
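
To make the four families above more concrete, here is a minimal sketch of the shape of data each one stores, using only plain Scala collections. It does not use any of the listed products or their client APIs. All names (`NoSqlModels`, `cityOf`, `neighbours`, the sample keys) are hypothetical and chosen for illustration.

```scala
object NoSqlModels {
  // Key-value: an opaque value looked up by a single key.
  val keyValue: Map[String, String] =
    Map("session:42" -> "alice")

  // BigTable style: row key -> column name -> value.
  // Each row may carry a different set of columns.
  val wideColumn: Map[String, Map[String, String]] = Map(
    "user:1" -> Map("name" -> "Alice", "city" -> "Firenze"),
    "user:2" -> Map("name" -> "Bob"))

  // Document: a nested, self-describing record stored under an id.
  sealed trait Json
  final case class JStr(value: String) extends Json
  final case class JObj(fields: Map[String, Json]) extends Json

  val documents: Map[String, JObj] = Map(
    "user:1" -> JObj(Map(
      "name" -> JStr("Alice"),
      "address" -> JObj(Map("city" -> JStr("Firenze"))))))

  // Graph: nodes plus labelled edges between them.
  final case class Edge(from: String, label: String, to: String)

  val edges: List[Edge] = List(
    Edge("alice", "FOLLOWS", "bob"),
    Edge("bob", "FOLLOWS", "carol"))

  // Typical graph query: which nodes does this node point to?
  def neighbours(node: String): List[String] =
    edges.filter(_.from == node).map(_.to)

  // Typical document query: reach into a nested field.
  def cityOf(id: String): Option[String] =
    documents.get(id)
      .flatMap(_.fields.get("address"))
      .collect { case JObj(fields) => fields.get("city") }
      .flatten
      .collect { case JStr(city) => city }
}
```

The queries differ as much as the storage: a key-value store answers "give me the value for this key", while a document store can look inside the value and a graph store follows edges.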

NoSQL
- No single "DB" covers every use case. Each one has its own characteristics.
- The market is carrying out a "natural selection".
- It is often necessary to use more than one NoSQL database.
- Hadoop and/or Spark are the glue.

Hadoop / Spark
Evolution of the MapReduce model
Hadoop MapReduce: Input -> HDFS read -> Iter 1 -> HDFS write -> HDFS read -> Iter 2 -> HDFS write
Apache Spark: Input -> Iter 1 -> Iter 2
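
The difference between the two flows can be sketched in plain Scala. This is a toy illustration and uses neither Hadoop nor Spark: a local temporary file stands in for HDFS, and the names `IterationStyles`, `step`, `diskBetweenIterations` and `inMemoryIterations` are hypothetical. It assumes the usual reading of this comparison, namely that Spark keeps intermediate results in memory between iterations.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object IterationStyles {
  // One iteration of a toy job: double every value.
  def step(xs: Vector[Int]): Vector[Int] = xs.map(_ * 2)

  // Stand-in for an HDFS write (a local file plays the role of HDFS).
  private def write(path: Path, xs: Vector[Int]): Unit = {
    Files.write(path, xs.mkString("\n").getBytes(StandardCharsets.UTF_8))
    ()
  }

  // Stand-in for an HDFS read.
  private def read(path: Path): Vector[Int] =
    new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
      .split("\n").filter(_.nonEmpty).map(_.toInt).toVector

  // MapReduce style: each iteration reads its input from storage
  // and writes its output back before the next one starts.
  def diskBetweenIterations(input: Vector[Int], iterations: Int): Vector[Int] = {
    val path = Files.createTempFile("iter", ".txt")
    try {
      write(path, input)
      (1 to iterations).foreach(_ => write(path, step(read(path))))
      read(path)
    } finally {
      Files.deleteIfExists(path)
      ()
    }
  }

  // Spark style: read the input once, keep intermediate results in memory.
  def inMemoryIterations(input: Vector[Int], iterations: Int): Vector[Int] =
    (1 to iterations).foldLeft(input)((acc, _) => step(acc))
}
```

Both functions return the same result. The first one pays for a storage round trip on every iteration, which is the cost the diagram highlights for iterative jobs.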

Apache Spark
- Written in Scala, with APIs in Java, Python and R
- Evolution of the Map/Reduce model
- Powerful companion modules:
  - Spark SQL
  - Spark Streaming
  - MLlib (machine learning)
  - GraphX (graph)
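
As a rough illustration of the Map/Reduce model mentioned above, here is a word count written against plain Scala collections. It is a sketch that runs without Spark; the names `WordCountSketch` and `wordCount` are hypothetical. Spark's RDD API offers similarly named operations (`flatMap`, `map`, `reduceByKey`), so the same pipeline shape carries over, but this block does not call Spark.

```scala
object WordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      // Map phase: split every line into lower-cased words.
      .flatMap(_.toLowerCase.split("\\s+").toSeq)
      .filter(_.nonEmpty)
      // Emit one (word, 1) pair per occurrence.
      .map(word => (word, 1))
      // Shuffle: bring all pairs with the same key together.
      .groupBy(_._1)
      // Reduce phase: sum the counts for each word.
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit =
    wordCount(Seq("big data", "Big Data tech")).foreach(println)
}
```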

Language - Scala
- Created by one of the authors of the Java compiler
- Interoperable with Java
- Reactive approach
- Functional + object-oriented + actor model
- Concurrency by design
- Strongly typed
- Core language of Spark / Akka / Playframework

Language – Scala vs Java

JAVA

public class User {
    private String firstName;
    private String lastName;
    private String email;
    private Password password;

    public User(String firstName, String lastName,
                String email, Password password) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.email = email;
        this.password = password;
    }

    // Accessors and mutators, one pair per field
    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }
    public String getLastName() { return lastName; }
    public void setLastName(String lastName) { this.lastName = lastName; }
    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }
    public Password getPassword() { return password; }
    public void setPassword(Password password) { this.password = password; }

    @Override public String toString() {
        return "User [email=" + email + ", firstName=" + firstName
            + ", lastName=" + lastName + "]";
    }

    // Null-safe hash over all four fields
    @Override public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((email == null) ? 0 : email.hashCode());
        result = prime * result + ((firstName == null) ? 0 : firstName.hashCode());
        result = prime * result + ((lastName == null) ? 0 : lastName.hashCode());
        result = prime * result + ((password == null) ? 0 : password.hashCode());
        return result;
    }

    // Null-safe, field-by-field comparison
    @Override public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null) return false;
        if (getClass() != obj.getClass()) return false;
        User other = (User) obj;
        if (email == null) {
            if (other.email != null) return false;
        } else if (!email.equals(other.email)) return false;
        if (password == null) {
            if (other.password != null) return false;
        } else if (!password.equals(other.password)) return false;
        if (firstName == null) {
            if (other.firstName != null) return false;
        } else if (!firstName.equals(other.firstName)) return false;
        if (lastName == null) {
            if (other.lastName != null) return false;
        } else if (!lastName.equals(other.lastName)) return false;
        return true;
    }
}

SCALA

case class User(
  var firstName: String,
  var lastName: String,
  var email: String,
  var password: Password)
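
The comparison rests on what the Scala compiler generates for a case class: `equals`, `hashCode`, `toString` and `copy`. The following minimal sketch exercises those members. The slide never defines `Password`, so the one-field `Password` case class here is a hypothetical stand-in, as are the object name `CaseClassDemo` and the sample values.

```scala
object CaseClassDemo {
  // Hypothetical stand-in: the slide references Password without defining it.
  final case class Password(hash: String)

  // Same shape as the Scala version on the slide.
  final case class User(
    var firstName: String,
    var lastName: String,
    var email: String,
    var password: Password)

  val a: User = User("Ada", "Lovelace", "ada@example.org", Password("x1"))
  val b: User = User("Ada", "Lovelace", "ada@example.org", Password("x1"))

  // Structural equality and a matching hashCode are generated by the compiler.
  val sameValue: Boolean = a == b
  val sameHash: Boolean = a.hashCode == b.hashCode

  // copy builds a modified instance and leaves the original untouched.
  val renamed: User = a.copy(lastName = "King")

  def main(args: Array[String]): Unit = {
    println(a)                // generated toString
    println(sameValue)
    println(renamed.lastName)
  }
}
```

Unlike the Java `toString` on the slide, the generated `toString` prints every field, including `password`, so a real model would probably want to override it.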

Scala.js
- Interoperability with JavaScript
- Compiles Scala code to JavaScript => type-safe
- Reuse of algorithms/models from back end to front end
- Synergy with React.js
- Enables the development of large single-page applications (SPAs)

Useful Links
- Cassandra: http://cassandra.apache.org/
- MongoDB: https://www.mongodb.org/
- Neo4J: http://neo4j.com/
- Spark: http://spark.apache.org/
- Scala:
  - http://scala-lang.org/
  - http://www.scala-js.org/

Thank you for your attention
Alberto Paro
