Slide 2 collecting, storing and analyzing big data
Collecting, Storing and Analyzing Big Data
Big Data Process Development
Collecting → Storing → Processing → Analyzing → Learning → Reacting
Data engineering process: 3 tasks
1) Collecting
2) Storing
a. Big Data Storage Concepts
b. Big Data Storage Technology
3) Processing
a. Big Data Processing Concepts
b. Big Data Processing Technology
Data Science/Machine Learning process: 3 tasks
4) Analyzing → 5) Learning → 6) Reacting
Big Data Analytics Lifecycle
When a standard relational database
(Oracle, MySQL, ...) is not good enough
Example: the “analytic system” MySQL database of a startup, tracking all
actions in mobile games: iOS, Android, ...
3 common problems in Big Data systems
1. Size: the volume of the datasets is a critical factor.
2. Complexity: the structure, behaviour and permutations of the datasets are
a critical factor.
3. Technologies: the tools and techniques used to process a sizable or
complex dataset are a critical factor.
What is Apache Phoenix?
Apache Phoenix is a SQL skin over HBase.
This means Phoenix scales the same way HBase does: scale up or
scale out the underlying HBase cluster.
Interesting features of Apache Phoenix
● Embedded JDBC driver implements the majority of java.sql interfaces, including
the metadata APIs.
● Allows columns to be modeled as a multi-part row key or key/value cells.
● Full query support with predicate push down and optimal scan key formation.
● DDL support: CREATE TABLE, DROP TABLE, and ALTER TABLE for
● Versioned schema repository. Snapshot queries use the schema that was in
place when data was written.
● DML support: UPSERT VALUES for row-by-row insertion, UPSERT SELECT for
mass data transfer between the same or different tables, and DELETE for
● Limited transaction support through client-side batching.
● Single table only - no joins yet and secondary indexes are a work in progress.
● Follows ANSI SQL standards whenever possible
● Requires HBase v 0.94.2 or above
● 100% Java
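As a rough illustration of the features above, the following Java sketch goes through plain JDBC to create a table with a multi-part row key, insert rows with UPSERT VALUES, commit in client-side batches, and run a filtered query. The table game_events, its columns, the sample rows and the batch size are hypothetical and chosen only for this example. The JDBC URL is taken from the command line, since the cluster address is not given in the slides. Exact syntax and behaviour may differ between Phoenix versions, so treat this as a sketch, not a reference implementation.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.Timestamp;
import java.util.Arrays;
import java.util.List;

final class PhoenixSketch {
    // Hypothetical table: one row per player action in a mobile game.
    // The PRIMARY KEY constraint lists several columns; Phoenix combines
    // them into a multi-part HBase row key.
    static final String CREATE_SQL =
        "CREATE TABLE IF NOT EXISTS game_events ("
        + " game_id VARCHAR NOT NULL,"
        + " player_id VARCHAR NOT NULL,"
        + " event_time TIMESTAMP NOT NULL,"
        + " action VARCHAR,"
        + " CONSTRAINT pk PRIMARY KEY (game_id, player_id, event_time))";

    // Phoenix uses UPSERT (insert or update) instead of INSERT.
    static final String UPSERT_SQL =
        "UPSERT INTO game_events (game_id, player_id, event_time, action)"
        + " VALUES (?, ?, ?, ?)";

    // The predicate is on the leading row-key column, so it can narrow the scan.
    static final String QUERY_SQL =
        "SELECT action, COUNT(*) FROM game_events WHERE game_id = ? GROUP BY action";

    // Upserts all rows, committing every batchSize rows. With auto-commit off,
    // changes are buffered on the client until commit() is called.
    // Returns the number of commits issued.
    static int load(Connection conn, List<String[]> rows, int batchSize) throws SQLException {
        conn.setAutoCommit(false);
        int commits = 0;
        int pending = 0;
        try (PreparedStatement ps = conn.prepareStatement(UPSERT_SQL)) {
            for (String[] r : rows) {
                ps.setString(1, r[0]);
                ps.setString(2, r[1]);
                ps.setTimestamp(3, Timestamp.valueOf(r[2]));
                ps.setString(4, r[3]);
                ps.executeUpdate();
                if (++pending == batchSize) {
                    conn.commit();
                    commits++;
                    pending = 0;
                }
            }
            if (pending > 0) {
                conn.commit();
                commits++;
            }
        }
        return commits;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length == 0) {
            // No cluster given: just show the statements that would be sent.
            System.out.println(CREATE_SQL);
            System.out.println(UPSERT_SQL);
            System.out.println(QUERY_SQL);
            return;
        }
        // args[0] is a Phoenix JDBC URL of the form jdbc:phoenix:<zookeeper quorum>.
        try (Connection conn = DriverManager.getConnection(args[0])) {
            try (Statement st = conn.createStatement()) {
                st.execute(CREATE_SQL);
            }
            // Made-up sample rows, for illustration only.
            List<String[]> rows = Arrays.asList(
                new String[] {"g1", "p1", "2013-01-01 10:00:00", "login"},
                new String[] {"g1", "p2", "2013-01-01 10:00:05", "purchase"});
            load(conn, rows, 1000);
            try (PreparedStatement ps = conn.prepareStatement(QUERY_SQL)) {
                ps.setString(1, "g1");
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + " " + rs.getLong(2));
                    }
                }
            }
        }
    }
}
```

Run without arguments, the program only prints the three SQL statements. Run with a Phoenix JDBC URL as its first argument (and the Phoenix client jar on the classpath), it executes them against that cluster.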