Liferay & Big Data Dev Con 2014
1. Liferay & Big Data
Getting value from your data
Miguel Ángel Pastor Olivar
miguel.pastor@liferay.com
2. Who am I?
• Some random guy
• Member of the Liferay core infrastructure team
• Disclaimer: Not a computer scientist
• @miguelinlas3
3. What are we going to talk about?
• Big Data: what is this about?
• Simple architecture proposal
• Use cases
• Questions (and hopefully answers)
5. • Data is so big that regular solutions are:
– Extremely slow
– Too small
– Really expensive
• How do we use all the data we already own?
6. • Volume
– Transactions, data streaming from social media, …
• Velocity
– Torrents of data in real time
• Variety
– Numerical data, text, email, video, audio, …
8. • Recommender systems
• Predicting the future:
– Netflix does autoscaling based on past network data traffic
• Churn models
– Big telco companies build social networks to reduce churn
9. • Sentiment analysis
– Are people talking about you on the Internet?
• Real Time Bidding
– Optimise advertising
• Health care
– Improve patients' health while reducing costs
– Improve quality of life of multiple sclerosis patients
11. • Storage models
– How to store relevant information
• Computation models
– Process and transform all the information
• Analytics
– How we can take actions based on the previous steps
14. Hadoop Distributed File System (HDFS)
• Java-based file system
• Scalable, fault-tolerant, distributed storage
• Designed to run on commodity hardware
• Closely related to MapReduce
19. • Modern relational databases
• Same scalable performance as NoSQL for OLTP
• Maintain ACID guarantees
• A few alternatives: VoltDB, Google Spanner, FoundationDB, …
22. Apache Hadoop Map Reduce
• Distributed processing
• Large datasets
• Clusters of computers
• Simple programming model
• Verbose and hard to use API
#LRNAS2014
23. [Figure: MapReduce word count example]
Input words: Liferay projects is the best Open Source project
Map input records: (index, “…”)
Sort and shuffle: (best, [1]), (is, [1]), (Liferay, [1]), (Open, [1]), (project, [1,1]), (Source, [1]), (the, [1])
Result: best: 1, is: 1, Liferay: 1, Open: 1, project: 2, Source: 1, the: 1
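The word count flow on this slide can be sketched in plain Python. This is only an illustration of the map, sort-and-shuffle and reduce steps running in a single process, not the Hadoop API; the function names and the sample sentence are assumptions chosen so that the counts match the slide.

```python
from collections import defaultdict


def map_phase(records):
    """Emit a (word, 1) pair for every word of every (index, line) record."""
    for _index, line in records:
        for word in line.split():
            yield (word, 1)


def shuffle_phase(pairs):
    """Group the emitted values by key and sort the keys (case-insensitive)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items(), key=lambda item: item[0].lower())


def reduce_phase(grouped):
    """Sum the list of ones collected for each word."""
    return {key: sum(values) for key, values in grouped}


def word_count(lines):
    """Run the three phases over a list of text lines."""
    records = list(enumerate(lines))  # (index, "...") input records
    return reduce_phase(shuffle_phase(map_phase(records)))


# Hypothetical input sentence, chosen to reproduce the counts on the slide.
SAMPLE = ["Liferay project is the best Open Source project"]
counts = word_count(SAMPLE)
```

In a real cluster the map and reduce functions run on different machines and the framework performs the shuffle; here everything happens in memory to keep the idea visible.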
24. • Batch model data crunching
• Not so good at event stream processing
• But …
• Many algorithms are hard to implement using MapReduce
• Cascading, Scalding, Cascalog, Impala, …
26. • Distributed realtime computation system
• Easy to reliably process unbounded streams of data
• Multi-language support
• Realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, …
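To make "continuous computation over an unbounded stream" concrete, here is a minimal single-process sketch using Python generators. It is not the API of any streaming framework; the event source and the page names are hypothetical, and a real system would add distribution and delivery guarantees on top of this idea.

```python
from collections import Counter
from itertools import cycle, islice


def event_source():
    """Hypothetical unbounded stream: repeats a few page names forever."""
    return cycle(["home", "blog", "home", "wiki"])


def running_counts(events):
    """Yield (event, count so far) after each event, without ever ending."""
    counts = Counter()
    for event in events:
        counts[event] += 1
        yield event, counts[event]


# The stream never finishes, so the consumer decides how much to take.
first_five = list(islice(running_counts(event_source()), 5))
```

The point of the sketch is that results are updated per event instead of waiting for a batch to complete.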
33. • Focused on:
– Data visualisation
– Statistical computations
– Analysis of data
• Tons of built-in packages
• Connect to Hadoop through Hadoop Streaming
• Not a fast language
45. [Figure: architecture diagram. Labels: RDBMS, Event Broker, Hadoop, User Tracking, NoSQL, Storage, System, Events, Search, Data, Logs, Monitoring, Data Warehouse, Streaming, Social Graph]
46. Batch processing?
Real time processing?
Machine learning algorithms?
Graph analysis?
Unified programming model?
48. • Fast and general engine for large-scale data processing
• Write your apps in Java, Scala or Python
• Runs on the YARN cluster manager
• Can read any existing Hadoop data (HDFS)
• In memory or on disk
51. • Driver program: runs the main function and executes various parallel operations on a cluster
• Resilient Distributed Datasets (RDD), created from:
– HDFS (or any Hadoop file system)
– Scala collection
• Second abstraction: shared variables
53. • Mix SQL queries with Spark programs
• Unified Data Access
• Hive compatibility
• Standard JDBC or ODBC connectivity
• Same engine for both interactive and long-running queries
55. • Build your apps using high-level operators
• Fault tolerance: exactly-once semantics out of the box
• Combine streaming with batch and interactive queries
• Can read from HDFS, Flume, Kafka, Twitter and ZeroMQ
• Define your own custom data sources
60. • Graphs API and graph-parallel computation
• Growing scale and importance
– From social networks to language modelling
• Directed multigraph with properties attached to each vertex and edge
• Growing collection of graph algorithms and builders
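The "directed multigraph with properties attached to each vertex and edge" can be pictured with a small in-memory structure. The sketch below is plain Python, not the API of any graph library; the class name, method names and sample data are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Directed multigraph: properties on every vertex and every edge."""
    vertices: dict = field(default_factory=dict)  # vertex id -> properties
    edges: list = field(default_factory=list)     # (src, dst, properties)

    def add_vertex(self, vertex_id, **props):
        self.vertices[vertex_id] = props

    def add_edge(self, src, dst, **props):
        # A list keeps parallel edges between the same pair of vertices.
        self.edges.append((src, dst, props))

    def out_degrees(self):
        """Count outgoing edges per source vertex."""
        return Counter(src for src, _dst, _props in self.edges)


# Hypothetical sample data.
graph = PropertyGraph()
graph.add_vertex(1, name="alice")
graph.add_vertex(2, name="bob")
graph.add_edge(1, 2, relation="follows")
graph.add_edge(1, 2, relation="mentions")  # second edge, same direction
graph.add_edge(2, 1, relation="follows")
degrees = dict(graph.out_degrees())
```

Storing edges in a list rather than a dictionary keyed by (source, destination) is what makes this a multigraph: two vertices can be linked by several edges, each with its own properties.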
63. • Not about data size, but how you use it
• You already own tons of data, you just need to get value from it
• There is no silver bullet: you have plenty of alternatives
• JVM-based Big Data technologies are usually a great choice
• Try it yourself!!
65. • Apache Kafka
• Apache Spark
• Apache Storm
• Apache Hadoop
• Big Data definition at Wikipedia
• Liferay Kafka Bridge
• What every software engineer should know about a log