Publications

Bhuvan Bamba, Ling Liu, James Caverlee, Vaibhav Padliya, Mudhakar Srivatsa, Tushar Bansal, Mahesh Palekar, Joseph Patrao, Suiyang Li and Aameek Singh, "DSphere: A Source-Centric Approach to Crawling, Indexing and Searching the World Wide Web", In Proceedings of the International Conference on Data Engineering (ICDE), 2007 (Demonstration Paper).
J. Caverlee, L. Liu, and W. B. Rouse. Link-Based Ranking of the Web with Source-Centric Collaboration (invited). 2nd International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom), Atlanta, 2006.

Posters

PeerCrawl Poster for ICDE 2007

Related Publications

[1] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Computer Science Department, Stanford University, 1998.

[2] R. Miller and K. Bharat. SPHINX: A framework for creating personal, site-specific web crawlers. In Proceedings of the 7th World-Wide Web Conference (WWW7), 1998.

[3] S. Chakrabarti, M. van den Berg, and B. Dom. Focused crawling: A new approach to topic-specific web resource discovery. In Proceedings of the Eighth International Conference on The World-Wide Web, 1999.

[4] A. McCallum, K. Nigam, J. Rennie, and K. Seymore, “Building domain-specific search engines with machine learning techniques,” in Proc. AAAI Spring Symposium on Intelligent Agents in Cyberspace, 1999.

[5] J. Rennie and A. McCallum, “Using reinforcement learning to spider the web efficiently,” in Proc. International Conference on Machine Learning (ICML), 1999.

[6] Vladislav Shkapenyuk and Torsten Suel. Design and implementation of a high-performance distributed web crawler. In IEEE International Conference on Data Engineering (ICDE), 2002.

[7] J. Cho and H. Garcia-Molina. Parallel crawlers. In Proceedings of the 11th International World Wide Web Conference, 2002.

[8] Paolo Boldi, Bruno Codenotti, Massimo Santini, and Sebastiano Vigna. UbiCrawler: A scalable fully distributed Web crawler. Software: Practice and Experience, 34(8):711–726, 2004.

[9] James Caverlee and Ling Liu. Resisting Web Spam with Credibility Based Link Analysis.

[10] A. Heydon and M. Najork. Mercator: A scalable, extensible web crawler. World Wide Web, 2(4):219–229, 1999.

[11] A. Singh, M. Srivatsa, L. Liu, and T. Miller. Apoidea: A Decentralized Peer-to-Peer Architecture for Crawling the World Wide Web. Lecture Notes in Computer Science, 2924, 2004.

[12] Martijn Koster. The Robot Exclusion Standard. http://www.robotstxt.org/

[13] Gnutella Network. http://www.gnutella.com

[14] V. J. Padliya and L. Liu. PeerCrawl: A decentralized peer-to-peer architecture for crawling the world wide web. Technical report, Georgia Institute of Technology, May 2006.

[15] Jialun Qin, Yilu Zhou, and Michael Chau. Building domain-specific web collections for scientific digital libraries: A meta-search enhanced focused crawling method. In Proceedings of the 4th ACM/IEEE-CS Joint Conference on Digital Libraries, June 7-11, 2004, Tucson, AZ, USA.

[16] D. Bergmark, C. Lagoze, and A. Sbityakov. “Focused Crawls, Tunneling, and Digital Libraries,” in Proc. of the 6th European Conference on Digital Libraries, Rome, Italy, 2002.