
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory Research

Uploaded on May 30, 2012



Mignify is a platform for collecting, storing and analyzing Big Data harvested from the web. It aims to provide easy access to focused, structured information extracted from Web data flows. It consists of a distributed crawler, a resource-oriented storage layer based on HDFS and HBase, and an extraction framework that produces filtered, enriched, and aggregated data from large document collections, including their temporal aspect. The whole system is deployed on an innovative hardware architecture made up of a large number of small, low-consumption nodes.

This talk will cover the decisions made during the design and development of the platform, from both a technical and a functional perspective. It will introduce the cloud infrastructure, the LTE-like ingestion of the crawler output into HBase/HDFS, and the mechanism that triggers analytics from a declarative filter/extraction specification. The design choices will be illustrated with a pilot application targeting Daily Web Monitoring of a national domain.
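The abstract does not describe the format of Mignify's declarative filter/extraction specification, so the following is only a minimal sketch of the general idea, not the platform's actual API. It assumes a spec is a plain data object holding a filter (a URL regex plus accepted MIME types) and a set of named extraction rules (regexes with one capture group). Every ingested resource is checked against every spec, and extraction runs only when the filter matches. The names `Resource`, `ExtractionSpec`, `trigger`, and the `.fr` national-domain example are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Resource:
    """A crawled resource as it might arrive from the crawler (hypothetical shape)."""
    url: str
    content_type: str
    body: str
    fetch_time: int  # seconds since epoch; keeps the temporal aspect of each capture


@dataclass
class ExtractionSpec:
    """Hypothetical declarative spec: a filter plus named extraction rules."""
    name: str
    url_pattern: str           # regex the resource URL must match
    content_types: List[str]   # accepted MIME types
    extract: Dict[str, str]    # output field -> regex with exactly one capture group

    def matches(self, r: Resource) -> bool:
        # Filter step: both the MIME type and the URL pattern must match.
        return (r.content_type in self.content_types
                and re.search(self.url_pattern, r.url) is not None)

    def run(self, r: Resource) -> Dict[str, str]:
        # Extraction step: apply each rule; fields whose rule does not match are omitted.
        out = {"url": r.url, "fetch_time": str(r.fetch_time)}
        for field_name, pattern in self.extract.items():
            m = re.search(pattern, r.body, re.IGNORECASE | re.DOTALL)
            if m:
                out[field_name] = m.group(1).strip()
        return out


def trigger(specs: List[ExtractionSpec],
            batch: List[Resource]) -> List[Tuple[str, Dict[str, str]]]:
    """Evaluate every spec against each ingested resource and collect the extracted records."""
    results = []
    for r in batch:
        for spec in specs:
            if spec.matches(r):
                results.append((spec.name, spec.run(r)))
    return results


# Usage: monitor page titles in a (hypothetical) national domain, here .fr.
spec = ExtractionSpec(
    name="fr_page_titles",
    url_pattern=r"^https?://[^/]+\.fr/",
    content_types=["text/html"],
    extract={"title": r"<title>(.*?)</title>"},
)
batch = [
    Resource("http://example.fr/a", "text/html", "<html><title> Hello </title></html>", 1338336000),
    Resource("http://example.com/b", "text/html", "<title>Skip</title>", 1338336000),
    Resource("http://example.fr/c.pdf", "application/pdf", "%PDF", 1338336000),
]
print(trigger([spec], batch))
```

Keeping the spec as data rather than code means new monitoring tasks could be added without redeploying extractors. In a real deployment the batch would come from newly ingested HBase rows rather than an in-memory list.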

Upload Details

Uploaded via SlideShare as Microsoft PowerPoint

Usage Rights

© All Rights Reserved
