Overview
Web forums have become an important data resource for many web applications, but extracting structured data from unstructured web forum pages is still a challenging task due to both complex page layout designs and unrestricted user created posts. This paper, studies the problem of structured data extraction from various web forum sites. The target is to find a solution as general as possible to extract structured data, such as post title, post author, post time, and post content from any forum site. In contrast to most existing information extraction methods, which only lever-age the knowledge inside an individual page, the paper incorporates both page-level and site-level knowledge and employ Markov Logic Networks (MLNs) to effectively integrate all useful evidence by learning their importance automatically.
|
|
Overwhelmed by consolidation? Take it in steps.
Learn the 5 steps to data center consolidation - download the whitepaper now.
Choose a career with Accenture in Singapore
A dynamic job opportunity where technology and business intersect
Choose a career with Accenture in Malaysia
A dynamic job opportunity where technology and business intersect
Improving the Security & Management of Active Directory:
See a live demonstration of NetIQ DRA now
The Roots for a Greener World
Discover Hitachi's Environmental Vision 2025 and featured Eco-Products
The Desktop Virtualization Revolution is here!
Find our more with Citrix Simplicity is Power
Master in Organisational Leadership
Part-time masters program from Monash University. Find out more.
Lack of visibility into network issues and performance?
Find out today. Download SolarWinds FREE 30-Day Trial Software here.
IT Salary & Skills Report 2009
Join activeTechPros for free access to the report