Re-Ranking Web Data per Knowledge Domain

  • Grace Zhao Graduate Center, CUNY
  • Xiaowen Zhang City University of New York
Keywords: document re-ranking, information retrieval, knowledge management, ontology learning

Abstract

We propose a re-ranking algorithm that assesses and re-sequences web data retrieved by reputable web search engines, so that the results better meet user needs within a knowledge domain. The algorithm studies the structure and semantics of the domain ontology graph and constructs computational relations among its nodes. After matching terms between the ontology dictionary and the textual content (text and metadata) of the retrieved documents, we calculate a three-dimensional information score (distance, direction, and relationship) for each document in the top-k search result set. We further explore the directional relation through three information degrees (granularity, diversity, and generality) and then re-rank the retrieved documents accordingly.
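To make the general idea concrete, the following is a minimal sketch of ontology-guided re-ranking. It is not the authors' algorithm: it illustrates only a distance-style score, and it leaves out the direction and relationship scores and the granularity, diversity, and generality degrees, none of which are defined in this abstract. The toy ontology, the function names `hop_distances` and `rerank`, the whitespace tokenization, and the mean-distance scoring rule are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy domain ontology: each key lists its child concepts.
ONTOLOGY = {
    "medicine": ["cardiology", "oncology"],
    "cardiology": ["arrhythmia"],
    "oncology": [],
    "arrhythmia": [],
}


def hop_distances(graph, source):
    """Breadth-first hop counts from source, treating edges as undirected."""
    undirected = {node: set(children) for node, children in graph.items()}
    for node, children in graph.items():
        for child in children:
            undirected.setdefault(child, set()).add(node)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in undirected[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def rerank(documents, graph, focus):
    """Re-order (doc_id, text) pairs by the mean ontology distance
    between the terms matched in each text and the focus concept.

    Assumed scoring rule: a smaller mean distance ranks higher, and
    documents with no matched ontology term go last.
    """
    dist = hop_distances(graph, focus)

    def score(item):
        index, (_, text) = item
        words = set(text.lower().split())
        matched = [dist[term] for term in dist if term in words]
        mean = sum(matched) / len(matched) if matched else float("inf")
        return (mean, index)  # ties keep the search engine's original order

    ranked = sorted(enumerate(documents), key=score)
    return [doc_id for _, (doc_id, _) in ranked]


# Hypothetical top-k result list in the engine's original order.
results = [
    ("d1", "oncology trial results"),
    ("d2", "arrhythmia and cardiology overview"),
    ("d3", "sports news"),
    ("d4", "cardiology clinic"),
]
reranked = rerank(results, ONTOLOGY, "cardiology")
```

In this sketch the original engine rank is used only to break ties, so the ontology signal dominates. A fuller treatment would combine several scores, as the abstract describes.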

Author Biography

Grace Zhao

Ph.D. Candidate, Computer Science

Graduate Center, City University of New York

Xiaowen Zhang

Associate Professor, Computer Science

College of Staten Island, City University of New York

Published
2019-05-31