Design and implementation of a community-based cluster crawler using the link structure and text information of hyperlinks
하이퍼링크의 링크 구조와 텍스트 정보를 이용한 커뮤니티 기반의 클러스터 크롤러의 설계 및 구현

DC Field | Value | Language
dc.contributor.advisor | Whang, Kyu-Young | -
dc.contributor.advisor | 황규영 | -
dc.contributor.author | Khamidov, Ravshan | -
dc.date.accessioned | 2011-12-13T06:06:53Z | -
dc.date.available | 2011-12-13T06:06:53Z | -
dc.date.issued | 2007 | -
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=268875&flag=dissertation | -
dc.identifier.uri | http://hdl.handle.net/10203/34783 | -
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : Computer Science major, 2007. 8, [ vii, 39 p. ] | -
dc.description.abstract | Community-limited search is a technique for improving the quality of search output by limiting the search to a specified community. A community in this thesis refers to a collection of semantically related web pages. Few techniques have been proposed for finding such communities. The incremental cluster crawler, proposed by Kim, finds communities incrementally using the link structure of the crawled web pages. This crawler, however, has some drawbacks. For instance, it does not consider text information. Moreover, seed URLs affect the clustering quality because one community is created for each seed URL. In this thesis, we propose a new method for finding communities incrementally. The key idea is to use both the link structure and the text information. Specifically, the method first computes the similarity based on the link structure and the similarity based on the text information separately, and then combines the two resulting similarity scores. To compute the similarity based on the text information, we use the text embedded in the hyperlink to a target web page instead of the text in the target web page itself. By using both the link structure and the text information, the proposed method can improve the overall clustering quality. We also propose a method for merging communities to reduce the influence of seed URLs on the clustering quality. The proposed method merges communities that are created from different seed URLs by computing the similarity between communities. Experimental results show that the proposed method improves the clustering quality by up to 3 times compared with the incremental cluster crawler proposed by Kim. | eng
dc.language | eng | -
dc.publisher | 한국과학기술원 (Korea Advanced Institute of Science and Technology, KAIST) | -
dc.subject | web crawling | -
dc.subject | web clustering | -
dc.subject | web community | -
dc.subject | 웹 크롤링 (web crawling) | -
dc.subject | 웹 클러스터링 (web clustering) | -
dc.subject | 웹 커뮤니티 (web community) | -
dc.title | Design and implementation of a community-based cluster crawler using the link structure and text information of hyperlinks | -
dc.title.alternative | 하이퍼링크의 링크 구조와 텍스트 정보를 이용한 커뮤니티 기반의 클러스터 크롤러의 설계 및 구현 | -
dc.type | Thesis(Master) | -
dc.identifier.CNRN | 268875/325007 | -
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : Computer Science major | -
dc.identifier.uid | 020044370 | -
dc.contributor.localauthor | Whang, Kyu-Young | -
dc.contributor.localauthor | 황규영 | -
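
The abstract in this record describes combining a link-based similarity with a similarity computed from hyperlink (anchor) text, and then merging communities that grew from different seed URLs. The record does not give the actual measures, weights, or merge rule, so the following is only a minimal sketch under stated assumptions: Jaccard overlap for the link similarity, bag-of-words cosine for the anchor-text similarity, a weighted sum with a hypothetical parameter `alpha`, and average pairwise similarity against a hypothetical `threshold` for merging. None of these choices is taken from the thesis.

```python
import math
from collections import Counter


def link_similarity(neighbors_a, neighbors_b):
    """Jaccard overlap of two pages' linked-neighbor sets (assumed measure)."""
    union = neighbors_a | neighbors_b
    return len(neighbors_a & neighbors_b) / len(union) if union else 0.0


def anchor_text_similarity(anchors_a, anchors_b):
    """Cosine similarity of word counts taken from the anchor texts of
    hyperlinks pointing at each page (assumed measure)."""
    va = Counter(w for text in anchors_a for w in text.lower().split())
    vb = Counter(w for text in anchors_b for w in text.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def combined_similarity(link_sim, text_sim, alpha=0.5):
    """Weighted sum of the two scores; alpha is a hypothetical weight."""
    return alpha * link_sim + (1.0 - alpha) * text_sim


def merge_communities(communities, similarity, threshold):
    """Greedily merge communities whose average pairwise page similarity
    reaches the threshold. Assumes every community is non-empty."""
    communities = [set(c) for c in communities]
    merged = True
    while merged:
        merged = False
        for i in range(len(communities)):
            for j in range(i + 1, len(communities)):
                a, b = communities[i], communities[j]
                avg = sum(similarity(p, q) for p in a for q in b) / (len(a) * len(b))
                if avg >= threshold:
                    communities[i] = a | b
                    del communities[j]
                    merged = True
                    break
            if merged:
                break
    return communities
```

In this sketch, `similarity` is any function of two pages, for example one that looks up each page's neighbor set and incoming anchor texts and returns `combined_similarity(...)`. Merging restarts after every merge so that a newly enlarged community is compared against the remaining ones again.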
Appears in Collection
CS-Theses_Master(석사논문)
Files in This Item
There are no files associated with this item.
