The presentation introduces Hadoop, Hive, and loggers. It discusses how data flows into Hadoop and the types of analysis Hadoop supports, including path optimization, basket analysis, next-product-to-buy analysis, and granular customer segmentation. Examples of Hadoop data types include IntWritable, LongWritable, BooleanWritable, FloatWritable, and ByteWritable. Hive is introduced as a data warehouse system for Hadoop that uses MapReduce for execution and HDFS for storage. The advantages and disadvantages of Hadoop are outlined, along with applications such as marketing analytics, machine learning, image processing, and web crawling.
2. Topics
Introduction to Hadoop
Introduction to Hive
Introduction to Logger
Warehouse at Mobion
Advantages
Disadvantages
Applications
3/11/2016 Pham Thai Hoa
3. What Is Hadoop?
Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.
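To make "processing of large data sets in a distributed computing environment" more concrete, the sketch below imitates the MapReduce model that Hadoop uses: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase combines each group. It is a single-process Java illustration that assumes word counting as the job; it does not use the Hadoop API, and on a real cluster each phase runs in parallel across many machines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process sketch of the MapReduce model: map, shuffle (group by key), reduce.
// Not the Hadoop API; a real job runs these phases in parallel across a cluster.
class WordCountSketch {

    // Map phase: emit (word, 1) for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group emitted values by key; keys are kept sorted, as Hadoop sorts map output keys.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the counts collected for each word.
    static Map<String, Integer> reduce(Map<String, List<Integer>> groups) {
        Map<String, Integer> result = new TreeMap<>();
        groups.forEach((word, ones) -> result.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return result;
    }

    // Runs all three phases over the input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(shuffle(pairs));
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("big data big clusters", "data flows into Hadoop")));
        // {big=2, clusters=1, data=2, flows=1, hadoop=1, into=1}
    }
}
```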
5. Types of Hadoop Analysis
Path Optimization – Aims to reduce bounce rates and improve conversions.
Basket Analysis – Aims to understand aggregate customer purchasing behavior by examining customer interests and paths to purchase: when customers bought Product X, what common paths did they take to get there?
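As a rough illustration of the "paths to purchase" question, the sketch below counts which page sequences led visitors to a product page and picks the most common one. The session lists, page names such as `productX`, and the helper names are hypothetical; with Hadoop, the same counting would run as a job over clickstream logs rather than over an in-memory list.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: count the page sequences (paths) that ended at a given product page.
// Sessions and page names are hypothetical stand-ins for clickstream records.
class PathToPurchase {

    // Returns a count for each distinct path (pages joined by " > ") that reached targetPage.
    static Map<String, Integer> pathsTo(List<List<String>> sessions, String targetPage) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> session : sessions) {
            int idx = session.indexOf(targetPage);
            if (idx < 0) {
                continue; // this visitor never reached the target page
            }
            // Keep only the pages up to and including the first visit to the target.
            String path = String.join(" > ", session.subList(0, idx + 1));
            counts.merge(path, 1, Integer::sum);
        }
        return counts;
    }

    // Returns the path with the highest count (ties resolved arbitrarily), or null if none.
    static String mostCommonPath(Map<String, Integer> counts) {
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    public static void main(String[] args) {
        List<List<String>> sessions = List.of(
                List.of("home", "search", "productX"),
                List.of("home", "search", "productX", "cart"),
                List.of("home", "blog", "productX"),
                List.of("home", "about"));
        System.out.println(mostCommonPath(pathsTo(sessions, "productX"))); // home > search > productX
    }
}
```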
6. Types of Hadoop Analysis
Next Product to Buy Analysis – Related to basket analysis, this type of analysis looks at correlations in purchases and at what can be offered next to provide more immediate value to the customer and increase the likelihood of another sale.
Allocation of Website Resources – With clickstream data on hand, a company knows the hottest and coldest paths on its site and can assign development resources accordingly, optimizing resource allocation.
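Next-product-to-buy analysis can be sketched as co-occurrence counting: among the baskets that contain a given product, find the product that most often appears alongside it. This is one simple technique assumed for illustration, not necessarily the method the presentation has in mind; the basket contents and the `suggest` helper are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Co-occurrence sketch for next-product-to-buy: count how often each other product
// shares a basket with the purchased one, then suggest the most frequent companion.
class NextProduct {

    static Optional<String> suggest(List<Set<String>> baskets, String bought) {
        Map<String, Integer> coCounts = new HashMap<>();
        for (Set<String> basket : baskets) {
            if (!basket.contains(bought)) {
                continue; // only baskets with the purchased product are relevant
            }
            for (String other : basket) {
                if (!other.equals(bought)) {
                    coCounts.merge(other, 1, Integer::sum);
                }
            }
        }
        // Empty if the product never appeared with anything else; ties resolved arbitrarily.
        return coCounts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        List<Set<String>> baskets = List.of(
                Set.of("phone", "case"),
                Set.of("phone", "case", "charger"),
                Set.of("laptop", "mouse"));
        System.out.println(suggest(baskets, "phone").orElse("no suggestion")); // case
    }
}
```

The same counting pattern, applied to page views instead of basket items, would rank the hottest and coldest paths used for resource allocation.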
7. Types of Hadoop Analysis
• Granular Customer Segmentation – With clickstream and correlated user data, a company can discover how particular segments and micro-segments of customers use the site and how best to cater to them.
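A minimal segmentation sketch, assuming per-user visit and purchase counts have already been aggregated from clickstream data: each user is assigned to one segment by simple rules. The segment names and thresholds are invented for illustration; real segmentation would typically use many more attributes or a clustering method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Rule-based segmentation sketch. Per-user counts would come from aggregated
// clickstream and purchase data; segment names and thresholds are hypothetical.
class Segmenter {

    static final class UserStats {
        final String userId;
        final int visits;
        final int purchases;

        UserStats(String userId, int visits, int purchases) {
            this.userId = userId;
            this.visits = visits;
            this.purchases = purchases;
        }
    }

    // Assigns exactly one segment per user; thresholds are illustrative, not tuned.
    static String segmentOf(UserStats u) {
        if (u.purchases >= 3) {
            return "loyal buyer";
        }
        if (u.purchases >= 1) {
            return "occasional buyer";
        }
        if (u.visits >= 5) {
            return "frequent browser";
        }
        return "casual visitor";
    }

    // Groups user IDs by segment, keeping input order within each segment.
    static Map<String, List<String>> segment(List<UserStats> users) {
        Map<String, List<String>> segments = new TreeMap<>();
        for (UserStats u : users) {
            segments.computeIfAbsent(segmentOf(u), k -> new ArrayList<>()).add(u.userId);
        }
        return segments;
    }

    public static void main(String[] args) {
        List<UserStats> users = List.of(
                new UserStats("u1", 10, 4),
                new UserStats("u2", 6, 0),
                new UserStats("u3", 1, 0));
        System.out.println(segment(users));
        // {casual visitor=[u3], frequent browser=[u2], loyal buyer=[u1]}
    }
}
```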
8. Examples of Hadoop Writable Types
IntWritable
LongWritable
BooleanWritable
FloatWritable
ByteWritable
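All of these types implement Hadoop's `Writable` interface (package `org.apache.hadoop.io`), which declares `write(DataOutput)` and `readFields(DataInput)`. To keep the example runnable without Hadoop on the classpath, the sketch below defines a stand-in interface, `SimpleWritable`, and a hypothetical `SimpleIntWritable` that follow the same contract; they are not the real Hadoop classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in for org.apache.hadoop.io.Writable, which declares these two methods.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical IntWritable-like box: a mutable int serialized as 4 bytes.
class SimpleIntWritable implements SimpleWritable {
    private int value;

    SimpleIntWritable() {}
    SimpleIntWritable(int value) { this.value = value; }

    int get() { return value; }
    void set(int value) { this.value = value; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(value); // big-endian, always 4 bytes
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        value = in.readInt(); // overwrite this object's state from the stream
    }
}

class WritableDemo {
    // Serializes any SimpleWritable to a byte array.
    static byte[] serialize(SimpleWritable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            w.write(new DataOutputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Writes an int out and reads it back into a fresh, empty object.
    static int roundTrip(int n) {
        byte[] data = serialize(new SimpleIntWritable(n));
        SimpleIntWritable copy = new SimpleIntWritable();
        try {
            copy.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return copy.get();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42));                                 // 42
        System.out.println(serialize(new SimpleIntWritable(42)).length);   // 4
    }
}
```

Because the object is mutable and refilled by `readFields`, a framework can reuse one instance for many records instead of allocating a new object each time.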
9. What Is Hive?
Hive is a data warehouse system for Hadoop
Uses MapReduce for execution
Uses HDFS for storage
Stores metadata in an RDBMS
Scalability and performance
Interoperability
Uses a SQL-like language called HiveQL
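To show what "a SQL-like language" means in practice, the comment at the top of the sketch below holds a HiveQL `GROUP BY` query over a hypothetical `page_views` table, and the Java code computes the same result in memory. Hive would instead compile the query into a MapReduce job that reads the table's files from HDFS; the table name, column name, and sample values are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// In-memory equivalent of a hypothetical HiveQL query:
//   SELECT page, COUNT(*) FROM page_views GROUP BY page;
// Hive would run this as a MapReduce job over files in HDFS; here a Java stream
// produces the same result for a small list of "page" column values.
class HiveGroupBySketch {

    static Map<String, Long> countByPage(List<String> pageColumn) {
        return pageColumn.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countByPage(List.of("/home", "/music", "/home")));
        // {/home=2, /music=1}
    }
}
```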
10. Warehouse at Mobion
Log Collector
Log/Data Transformer
Data Analyzer
Web Reporter
Log definition
Log integration (into applications)
Log/data analysis
Report development (API, Mobion, Music, …)
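The collector, transformer, and analyzer roles can be sketched for a single log format, with the analyzer producing the kind of figure a web reporter would display. The tab-separated layout (`timestamp`, `userId`, `app`, `action`), the class names, and the sample lines are hypothetical; the source does not specify Mobion's actual log definition.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Sketch of the collector -> transformer -> analyzer flow for one hypothetical log
// format: timestamp<TAB>userId<TAB>app<TAB>action. Field names are illustrative only.
class LogPipelineSketch {

    static final class LogEvent {
        final long timestamp;
        final String userId;
        final String app;
        final String action;

        LogEvent(long timestamp, String userId, String app, String action) {
            this.timestamp = timestamp;
            this.userId = userId;
            this.app = app;
            this.action = action;
        }
    }

    // Transformer step: turn a raw line into a typed event, dropping malformed lines.
    static Optional<LogEvent> parse(String line) {
        String[] fields = line.split("\t");
        if (fields.length != 4) {
            return Optional.empty();
        }
        try {
            return Optional.of(new LogEvent(Long.parseLong(fields[0]), fields[1], fields[2], fields[3]));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    // Analyzer step: events per application, sorted by application name.
    static Map<String, Integer> eventsPerApp(List<String> rawLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : rawLines) {
            parse(line).ifPresent(e -> counts.merge(e.app, 1, Integer::sum));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "1457654400\tu1\tMusic\tplay",
                "1457654401\tu2\tMusic\tskip",
                "1457654402\tu1\tAPI\tcall",
                "not a valid line");
        System.out.println(eventsPerApp(lines)); // {API=1, Music=2}
    }
}
```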
11. Warehouse at Mobion
Data mining
Music Recommendation
Spam Detection
Application performance
Export data and import it into MySQL for web reports
Analytic system
12. Advantages
Lightweight persistent objects
High performance
Scalability
Error recovery: data is replicated automatically, so it survives when a server or disk crashes.
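HDFS achieves this kind of recovery by storing each data block on several DataNodes (the default replication factor, set by `dfs.replication`, is 3) and having the NameNode create new copies when a node is lost. The toy model below imitates only that idea: it places each block on the least-loaded live nodes and re-replicates after a simulated failure. It is not HDFS code; real placement is rack-aware, and the node and block names are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of replication-based error recovery. NOT HDFS code: real block placement
// is rack-aware and managed by the NameNode. Node and block names are hypothetical.
class ReplicationSketch {
    private final int replication;
    private final Set<String> liveNodes = new TreeSet<>();                   // sorted for determinism
    private final Map<String, Set<String>> blockLocations = new TreeMap<>(); // block -> nodes holding it

    ReplicationSketch(int replication, List<String> nodes) {
        this.replication = replication;
        this.liveNodes.addAll(nodes);
    }

    // Number of blocks currently stored on a node.
    private int load(String node) {
        int n = 0;
        for (Set<String> holders : blockLocations.values()) {
            if (holders.contains(node)) {
                n++;
            }
        }
        return n;
    }

    // Copies a block to the least-loaded live nodes until it has `replication`
    // replicas, or until no more live nodes are available.
    private void topUp(String block) {
        Set<String> holders = blockLocations.get(block);
        List<String> candidates = new ArrayList<>();
        for (String node : liveNodes) {
            if (!holders.contains(node)) {
                candidates.add(node);
            }
        }
        // List.sort is stable, so ties keep the alphabetical order from the TreeSet.
        candidates.sort(Comparator.comparingInt(this::load));
        for (String node : candidates) {
            if (holders.size() >= replication) {
                break;
            }
            holders.add(node);
        }
    }

    void addBlock(String block) {
        blockLocations.put(block, new TreeSet<>());
        topUp(block);
    }

    // Simulates a crashed server or disk: its replicas disappear, and every block that
    // still has at least one surviving replica is copied onto other live nodes.
    void failNode(String node) {
        liveNodes.remove(node);
        for (Map.Entry<String, Set<String>> entry : blockLocations.entrySet()) {
            Set<String> holders = entry.getValue();
            holders.remove(node);
            if (!holders.isEmpty()) { // a block with no surviving copy cannot be recovered
                topUp(entry.getKey());
            }
        }
    }

    int replicaCount(String block) {
        return blockLocations.get(block).size();
    }

    boolean holds(String node, String block) {
        return blockLocations.get(block).contains(node);
    }
}
```

For example, with nodes `n1` to `n4` and a replication factor of 3, a new block lands on `n1`, `n2`, and `n3`; after `failNode("n1")`, the model copies it to `n4`, restoring three replicas.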
13. Performance
Better reduce
Improved data integrity
Improved security
Good application performance
14. Disadvantages
Security concerns
Vulnerable by nature
Not fit for small data
Potential stability issues
General limitations
15. Applications
Marketing analytics
Machine learning or sophisticated data mining
Image processing
Processing of XML messages
Web crawling or text processing