Swoogle is a crawler-based indexing and retrieval system for Semantic Web documents, such as those written in RDF or OWL. It served as a search engine for Semantic Web ontologies, documents, terms and data published on the Web. Swoogle provided services to human users through a browser interface and to software agents via RESTful web services. Query results were ranked using several techniques inspired by the PageRank algorithm developed at Google but adapted to the semantics and use patterns found in Semantic Web documents. Swoogle was developed and hosted by the University of Maryland, Baltimore County with funding from the US DARPA and National Science Foundation agencies. It was the PhD thesis work of Li Ding, advised by Professor Tim Finin.

Swoogle interface

Swoogle’s architecture can be broken into four major components: Semantic Web document (SWD) discovery, metadata creation, data analysis, and interface. This architecture is data-centric and extensible: the components work on their tasks independently. The SWD discovery component is responsible for discovering potential SWDs throughout the Web and keeping information about them up to date. The metadata creation component caches a snapshot of each SWD and generates objective metadata about it at both the syntactic and the semantic level. The data analysis component uses the cached SWDs and the generated metadata to derive analytical reports, such as the classification of SWDs into Semantic Web ontologies (SWOs) and Semantic Web databases (SWDBs), the ranking of SWDs, and the information retrieval (IR) index of SWDs. The interface component focuses on providing data services to the Semantic Web community.
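The metadata creation and data analysis steps can be illustrated with a minimal sketch. It assumes a document has already been parsed into (subject, predicate, object) triples of plain strings, counts the classes, properties and individuals it declares, and labels it as an ontology (SWO) or a database (SWDB) from the share of schema-level definitions. The names `DocMetadata`, `describe` and `classify` and the 0.8 threshold are hypothetical choices made for this illustration, not Swoogle's actual code or parameters.

```python
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CLASS_TYPES = {
    "http://www.w3.org/2000/01/rdf-schema#Class",
    "http://www.w3.org/2002/07/owl#Class",
}
PROPERTY_TYPES = {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property",
    "http://www.w3.org/2002/07/owl#ObjectProperty",
    "http://www.w3.org/2002/07/owl#DatatypeProperty",
}


@dataclass
class DocMetadata:
    """Hypothetical per-document metadata record."""
    triple_count: int
    classes_defined: int
    properties_defined: int
    individuals: int

    @property
    def ontology_ratio(self) -> float:
        # Share of typed resources that are classes or properties.
        schema = self.classes_defined + self.properties_defined
        total = schema + self.individuals
        return schema / total if total else 0.0


def describe(triples) -> DocMetadata:
    """Derive simple metadata from an iterable of (s, p, o) string triples."""
    triples = list(triples)
    classes, properties, individuals = set(), set(), set()
    for subject, predicate, obj in triples:
        if predicate != RDF_TYPE:
            continue  # only rdf:type statements declare what a resource is
        if obj in CLASS_TYPES:
            classes.add(subject)
        elif obj in PROPERTY_TYPES:
            properties.add(subject)
        else:
            individuals.add(subject)
    return DocMetadata(len(triples), len(classes), len(properties), len(individuals))


def classify(meta: DocMetadata, threshold: float = 0.8) -> str:
    """Label a document 'SWO' or 'SWDB'; the threshold is an assumed value."""
    return "SWO" if meta.ontology_ratio >= threshold else "SWDB"
```

Because `describe` only reads triples and returns a record, the discovery, analysis and interface stages could each consume that record independently, which is the data-centric property described above.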

The architecture of Swoogle

Web search engines such as Google do not work well with documents encoded in the Semantic Web languages RDF and OWL. These retrieval systems are designed to work with natural languages and expect documents to contain unstructured text composed of words. They do a poor job of tokenizing Semantic Web documents and do not understand conventions such as those involving XML namespaces. Moreover, they do not understand the structural information encoded in the documents and are thus unable to take advantage of it. Semantic Web researchers need search and retrieval systems to help them find and analyze Semantic Web documents on the Web, and Swoogle is a research project that addresses this need. Such systems can also support the tools being developed by researchers, such as annotation editors, as well as software agents whose knowledge comes from the Semantic Web.
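The tokenization problem can be seen in a small sketch. A word-oriented tokenizer breaks a term's URI into fragments that carry little meaning on their own, while a namespace-aware approach keeps the term as a (namespace, local name) pair. The function names below are hypothetical and the splitting rule (cut at the last `#` or `/`) is a common convention assumed for illustration, not a description of how Swoogle itself tokenizes.

```python
import re


def naive_tokens(text):
    """Word-level tokenization of the kind a text search engine might apply."""
    return re.findall(r"[A-Za-z0-9]+", text.lower())


def split_uri(uri):
    """Split a URI into (namespace, local name) at the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[:cut + 1], uri[cut + 1:]


def expand_qname(qname, prefixes):
    """Expand a prefixed name such as 'foaf:Person' using declared prefixes."""
    prefix, _, local = qname.partition(":")
    return prefixes[prefix] + local


prefixes = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "owl": "http://www.w3.org/2002/07/owl#",
}

term = expand_qname("foaf:Person", prefixes)
print(naive_tokens(term))  # ['http', 'xmlns', 'com', 'foaf', '0', '1', 'person']
print(split_uri(term))     # ('http://xmlns.com/foaf/0.1/', 'Person')
```

With the second representation, a query for the term `Person` in the FOAF namespace can be distinguished from a `Person` class defined in some other vocabulary, which plain word matching cannot do.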

Swoogle helps users avoid creating new ontologies by making existing ones easier to reuse: a Swoogle search finds existing ontologies in the domain of interest that match the user’s needs. Swoogle reasons about Semantic Web documents on the Web and their constituent parts and records meaningful metadata about them. It provides a web-scale Semantic Web data access service, which helps human users and software systems find relevant documents, terms and triples via its search and navigation services. Swoogle is capable of searching over 10,000 ontologies and indexes more than 1.3 million web documents. It also computes the importance of each Semantic Web document.
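How a document's importance might be computed can be sketched with a weighted variant of PageRank, in which links between documents carry different weights depending on their kind (for example, importing an ontology versus merely using one of its terms). This is a generic illustration, not Swoogle's actual ranking algorithm; the function name, the link weights, the damping factor and the file names are all assumptions.

```python
def rank_documents(links, damping=0.85, iterations=50):
    """Weighted PageRank over a document graph.

    links maps each source document to a dict of {target: weight}.
    Returns a dict mapping every document to its rank; ranks sum to 1.
    """
    docs = set(links)
    for targets in links.values():
        docs.update(targets)
    docs = sorted(docs)
    n = len(docs)
    rank = {d: 1.0 / n for d in docs}

    for _ in range(iterations):
        # Every document receives a base share from random jumps.
        new_rank = {d: (1.0 - damping) / n for d in docs}
        for src in docs:
            targets = links.get(src, {})
            total = sum(targets.values())
            if total == 0:
                # A document with no outgoing links spreads its rank evenly.
                share = damping * rank[src] / n
                for d in docs:
                    new_rank[d] += share
            else:
                # Otherwise rank flows along links in proportion to weight.
                for dst, weight in targets.items():
                    new_rank[dst] += damping * rank[src] * weight / total
        rank = new_rank
    return rank


# Hypothetical graph: two data files use terms from one ontology (weight 3),
# and one of them also refers to the other data file (weight 1).
links = {
    "a.rdf": {"onto.owl": 3.0},
    "b.rdf": {"onto.owl": 3.0, "a.rdf": 1.0},
}
ranks = rank_documents(links)
for doc in sorted(ranks, key=ranks.get, reverse=True):
    print(doc, round(ranks[doc], 3))
```

In this toy graph the ontology ends up with the highest rank because both data files point to it with the heavier weight, which matches the intuition that widely used ontologies should rank above the documents that merely use them.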

Swoogle advanced query and Swoogle query result

As a research project with no commercial motive, Swoogle has not attracted much hype. However, its approach to indexing Semantic Web documents is one that most search engines will have to adopt at some point. When the Internet debuted, there were no engines dedicated to indexing or searching; the search domain only picked up as more and more content became available. One fundamental question remains: even if a search engine returns very relevant results for a query, how can one ascertain that those documents are indeed the most relevant ones available? There is always an inherent delay in indexing documents, and it is here that the new Semantic Web search engines can close the gap. Experimenting with the concept of search in the Semantic Web can only bode well for the future of search technology.
