Deep Web adaptive crawling based on minimum executable pattern
By: Zhaohui Wu; Qinghua Zheng; Jun Liu; Lu Jiang
2011 / Springer Science+Business Media / 0925-9902
The key to Deep Web crawling is to submit valid input values to a query form and retrieve Deep Web content efficiently. In the literature, related work focuses only on generic text boxes or entire query forms, causing the problem of “data islands” or inferior validity of query submission. This paper proposes the concept of Minimum Executable Pattern (MEP), a minimal combination of elements in a query form that can conduct a successful query, and then presents a MEP generation method and a MEP-based Deep Web adaptive crawling method. The query form is parsed and partitioned into a set of MEPs, and then locally optimal queries are generated by choosing a MEP from the set and a keyword vector for that MEP. Furthermore, the crawler can decide when to terminate, balancing high coverage of the content against resource consumption. The adoption of MEP is expected to improve the validity of query submission, and adaptive selection among multiple MEPs is effective at overcoming the problem of “data islands”. We present a set of experiments to validate the effectiveness of the proposed method. Experimental results show that our method outperforms state-of-the-art methods in terms of query capability and applicability, and on average, it achieves good coverage by issuing only a few hundred queries.
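The crawling loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the toy database `RECORDS`, the helpers `matches`, `submit`, and `crawl`, the sample-frequency score used to pick the next query, and the `patience` stopping rule are all assumptions made for illustration. The sketch only shows the general shape of the idea: each query is a pair (MEP, keyword vector), new keyword vectors are harvested from retrieved records, and the crawler stops once queries stop paying off.

```python
# Toy database hidden behind a query form (hypothetical data).
RECORDS = [
    {"id": 1, "author": "wu", "year": "2010", "topic": "web"},
    {"id": 2, "author": "wu", "year": "2011", "topic": "web"},
    {"id": 3, "author": "liu", "year": "2011", "topic": "web"},
    {"id": 4, "author": "liu", "year": "2011", "topic": "db"},
    {"id": 5, "author": "jiang", "year": "2011", "topic": "db"},
]


def matches(record, mep, values):
    """True if the record equals the keyword vector on every element of the MEP."""
    return all(record[e] == v for e, v in zip(mep, values))


def submit(mep, values):
    """Simulate submitting the form with only the MEP's elements filled in."""
    return [r for r in RECORDS if matches(r, mep, values)]


def crawl(meps, seed, patience=2, max_queries=50):
    """Greedy adaptive crawl over (MEP, keyword vector) queries.

    Assumed stopping rule: quit after `patience` consecutive queries that
    return nothing new, when no candidate is left, or at `max_queries`.
    """
    retrieved = {}        # id -> record
    issued = []           # queries in submission order
    candidates = {seed}   # queries not yet submitted
    low_streak = 0
    while candidates and low_streak < patience and len(issued) < max_queries:
        # Naive stand-in estimator: how often the keyword vector occurs
        # in the records retrieved so far.
        def score(q):
            return sum(matches(r, *q) for r in retrieved.values())

        # sorted() makes tie-breaking deterministic.
        query = max(sorted(candidates), key=score)
        candidates.discard(query)
        issued.append(query)
        new = [r for r in submit(*query) if r["id"] not in retrieved]
        low_streak = 0 if new else low_streak + 1
        for r in new:
            retrieved[r["id"]] = r
            # Harvest one new keyword vector per MEP from each new record.
            for mep in meps:
                q = (mep, tuple(r[e] for e in mep))
                if q not in issued:
                    candidates.add(q)
    return sorted(retrieved), issued


seed = (("author",), ("wu",))
ids, issued = crawl([("author",), ("year", "topic")], seed)
print(ids, len(issued))
```

On this toy data, crawling with both MEPs reaches all five records, whereas `crawl([("author",)], seed)` stops after the two records by "wu", because no other author name ever appears in the results. That is a small-scale version of the "data islands" problem that selecting among multiple MEPs is meant to address.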