This document summarizes a framework for extracting blog posts and their comments from web pages. The framework works in two stages. First, it locates the main text using the DOM tree structure and identifies noise such as advertisements; the main text is usually the region that contains the most words and occupies the largest visual space. Second, it finds the separator between the post and its comments using an information quantity algorithm based on HTML tag frequencies. In an experiment on 25,910 blog pages from the top 100 domains, the framework achieved over 90% precision in locating both the main text and the separator.
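The two stages can be illustrated with a minimal Python sketch. It is not the framework's actual algorithm, and it rests on several assumptions. Stage 1 uses only the word-count criterion: each `<p>` element's words are credited to its parent, and the highest-scoring element is taken as the main text. Visual space and noise detection are not modeled. Stage 2 replaces the information quantity algorithm with a simple stand-in. Each child of the main-text element is labeled by its tag name plus its first class. The separator is the split that minimizes the summed self-information, Σ c·log2(n/c), of the two halves. All names (`locate_main_text`, `find_separator`, `label`, `info_quantity`) and the sample page are hypothetical.

```python
import math
from collections import Counter
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
SKIP = {"script", "style"}  # text inside these never counts as content


class Node:
    def __init__(self, tag, attrs, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.children = []
        self.text = []  # text chunks appearing directly inside this node


class TreeBuilder(HTMLParser):
    """Builds a minimal, forgiving DOM tree using the standard-library parser."""

    def __init__(self):
        super().__init__()
        self.root = self.cur = Node("#root", [])

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.cur)
        self.cur.children.append(node)
        if tag not in VOID:  # void elements never receive an end tag
            self.cur = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        n = self.cur
        while n is not None and n.tag != tag:
            n = n.parent
        if n is not None and n.parent is not None:
            self.cur = n.parent

    def handle_data(self, data):
        if self.cur.tag not in SKIP:
            self.cur.text.append(data)


def walk(node):
    yield node
    for child in node.children:
        yield from walk(child)


def locate_main_text(html):
    """Stage 1 (word count only): credit each <p>'s words to its parent element
    and return the element with the highest total."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    scores = Counter()
    for n in walk(builder.root):
        if n.tag == "p":
            scores[n.parent] += sum(len(t.split()) for m in walk(n) for t in m.text)
    return max(scores, key=scores.get) if scores else builder.root


def label(node):
    # Hypothetical labeling: tag name plus first class, e.g. "div.comment".
    classes = (node.attrs.get("class") or "").split()
    return node.tag + ("." + classes[0] if classes else "")


def info_quantity(labels):
    """Total self-information of a label sequence: sum of -log2(count/n) per item."""
    n = len(labels)
    return sum(c * math.log2(n / c) for c in Counter(labels).values())


def find_separator(labels):
    """Stage 2 stand-in: return k so labels[:k] is the post and labels[k:] the
    comments, minimizing the combined information quantity of both parts."""
    if len(labels) < 2:
        return len(labels)
    return min(range(1, len(labels)),
               key=lambda k: info_quantity(labels[:k]) + info_quantity(labels[k:]))


page = """<html><body><div id="nav"><a href="/">Home</a></div>
<div id="content"><h2>My trip</h2>
<p>We spent three days hiking along the coast and camping near the cliffs.</p>
<p>The weather was cold but the views made up for it every morning.</p>
<p>Next year we plan to return with better gear and more time.</p>
<div class="comment"><p>Great photos!</p></div>
<div class="comment"><p>Which trail did you take?</p></div>
<div class="comment"><p>Looks amazing.</p></div></div>
<div class="ad"><p>Buy cheap flights now</p></div></body></html>"""

main = locate_main_text(page)
child_labels = [label(c) for c in main.children]
k = find_separator(child_labels)
print(main.attrs.get("id"), child_labels[:k], child_labels[k:])
# prints: content ['h2', 'p', 'p', 'p'] ['div.comment', 'div.comment', 'div.comment']
```

Minimizing the summed information quantity favors splits where each side is internally repetitive. This fits the intuition that a comment list repeats one tag pattern. A realistic extractor would also need the visual-space cue and the noise identification that the summary describes.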