EVALUATING SEARCH EFFECTIVENESS OF SOME SELECTED SEARCH ENGINES

With advances in technology, more individuals are becoming familiar with the internet, and many users seek information on the World Wide Web (WWW) using a variety of search engines. This research work evaluates the retrieval effectiveness of Google, Yahoo, Bing, AOL and Baidu. Precision, relative recall and response time were considered for this evaluation. A total of 24 search queries were sampled, covering informational, navigational and transactional queries, and further categorized into single word queries, double word queries, sentence queries and alphanumeric queries. The overall averages across all the designated queries show that Bing has the best relative recall (0.54), while Google has the best response time (0.39 s) and the highest precision (1.37).


INTRODUCTION
The primary objective of an information retrieval system (IRS) is to find or access documents relevant to a user's query. The documents are retrieved as a ranked list, in which the ranking is estimated based on relevance. Today, when searching for information on the World Wide Web (WWW), one usually submits a query to a term-based search engine. The World Wide Web is a rapidly expanding, hyperlinked collection of unstructured information with a friendly user interface and hypermedia features, which have attracted a large number of information providers Kumar and Bhadu (2013).
The growth of the World Wide Web (WWW) has been unprecedented. Four years after the birth of the Web in 1990, millions of copies of the well-known Web browser Mosaic were in use. This growth resulted from increases in the number of Web servers and in the number and value of the Web pages accessible through these servers. In 1999 the number of Web servers was estimated at about 3 million and the number of Web pages at about 800 million, and three years later, in June 2002, the search engine AlltheWeb announced that its index contained information about 2.1 billion Web pages Can, Nuray and Sevdik (2004).
A web search engine is an information retrieval system used to locate Web pages relevant to user queries. Search engines do not really search the Web directly; each one searches a database of the full text of Web pages selected from the billions of pages on the Web. Search engine databases are built by computer robot programs called spiders. Spiders are responsible for finding pages for potential inclusion by following the links in the pages already in their database. Consequently, a search engine spider cannot find a Web page that is never linked from any other page; the only way such a page can get into a search engine's database is for someone to request that the new page be included. After the spiders find the pages, they pass them to another computer program for indexing. This program identifies the text, links and other content in each page and stores them in the search engine database's files so that the database can be searched by keyword Akinola (2010).
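To make the spider-and-indexer pipeline concrete, the following is a minimal Python sketch over a small in-memory link graph. The `WEB` dictionary, the page names and the function names are hypothetical illustrations, not any real engine's data or API. A breadth-first spider discovers only the pages reachable by links from a seed page, an indexer maps each word to the pages containing it, and keyword search consults that index rather than the live Web.

```python
from collections import defaultdict, deque

# Hypothetical toy Web: each URL maps to (page text, outgoing links).
# "orphan.html" is never linked from any other page.
WEB = {
    "a.html": ("search engines index web pages", ["b.html", "c.html"]),
    "b.html": ("a spider follows links between pages", ["a.html"]),
    "c.html": ("precision and recall measure retrieval", []),
    "orphan.html": ("this page is never linked", []),
}


def crawl(seeds):
    """Breadth-first spider: collect every page reachable by links from the seeds."""
    seen, queue = set(seeds), deque(seeds)
    while queue:
        url = queue.popleft()
        _, links = WEB[url]
        for link in links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


def build_index(urls):
    """Indexer: map each lower-cased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url in urls:
        text, _ = WEB[url]
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, keyword):
    """Keyword lookup against the index, not against the live Web."""
    return sorted(index.get(keyword.lower(), set()))


crawled = crawl(["a.html"])
print(search(build_index(crawled), "linked"))  # [] : the spider never reached orphan.html
# Submitting the orphan page directly makes it visible to the indexer.
print(search(build_index(crawled | {"orphan.html"}), "linked"))  # ['orphan.html']
```

The orphan page becomes searchable only after it is submitted directly, mirroring the inclusion request described in the paragraph above.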
The effectiveness of a search engine depends on the features it has. Some search engines retrieve documents within a few seconds. The Web crawler is responsible for identifying the data available across the vast space of the Web, and Web crawling is carried out recursively by search engines to offer up-to-date data to users. The arrangement of Web pages also plays a vital role, since the engine matches the user query against the Web pages existing in its database. Website indexing searches for specific text on websites that the site administrator defines. If any of those Web pages contain hyperlinks, the indexer spiders through those pages, identifying the title, blurb and document type, and presents them in a standard, easy-to-read format. Some search engines are also capable of searching Boolean expressions, phrases, clauses and so on Lewandowski (2012).

Several types of search engines have been designed and implemented based on different retrieval methods, algorithms and database techniques. Hence, the majority of users find it difficult to know which search engine is best for the information they require. In this paper, the performance of the search engines is evaluated based on retrieval effectiveness using three metrics: precision, relative recall and response time.
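Boolean search of the kind mentioned above is commonly evaluated as set operations on the posting lists of an inverted index. The sketch below assumes a small, hypothetical index of document IDs (`INDEX` and `ALL_DOCS` are illustrative); AND is intersection, OR is union and NOT is the complement against all documents. Real engines use their own query syntax, and phrase search would additionally need word positions, which this sketch does not store.

```python
# Hypothetical inverted index: term -> set of document IDs containing the term.
INDEX = {
    "search": {1, 2, 3},
    "engine": {1, 3},
    "spider": {2},
    "recall": {3, 4},
}
ALL_DOCS = {1, 2, 3, 4}


def postings(term):
    """Posting set for a term (empty if the term is not indexed)."""
    return INDEX.get(term.lower(), set())


def boolean_and(*terms):
    """Documents containing every term (set intersection)."""
    result = set(ALL_DOCS)
    for term in terms:
        result &= postings(term)
    return result


def boolean_or(*terms):
    """Documents containing at least one term (set union)."""
    result = set()
    for term in terms:
        result |= postings(term)
    return result


def boolean_not(term):
    """Documents that do not contain the term (complement)."""
    return ALL_DOCS - postings(term)


# search AND engine AND NOT recall
print(sorted(boolean_and("search", "engine") & boolean_not("recall")))  # [1]
# spider OR recall
print(sorted(boolean_or("spider", "recall")))  # [2, 3, 4]
```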

SCOPE AND LIMITATION
In this research work, we estimate the precision, response time and relative recall of five search engines: Google, Yahoo, Bing, Baidu and AOL. Twenty-four (24) search queries were sampled, and these queries were categorized as informational, navigational and transactional queries. This work is limited to only the first fifty (50) sites displayed by each search engine. A similar study by Kumar and Bhadu (2013) compared three search engines, considering the precision and relative recall of each to evaluate their performance. Queries were tested, and the results of that study showed that Google was the best among the search engines used.

RELATED WORKS
Oberoi and Chopra (2009) note that a web search engine opens the door to exploring a huge amount of information, and that a variety of search engines offer diversified services to their users. Their paper draws a clear picture of the differences between various search engines and disproves the notion that all web search engines have the same search capability, coverage, ranking and indexing techniques. Web search engines differ from each other in multiple aspects, such as searching strategy, coverage of the Web, relevance of the search results to the search query, and ranking of the search results. The overlap between the search results offered by different search engines is very low. This overlap can be measured by collecting sample URLs from one search engine's result set for a specific query and matching them against the results of another engine by string comparison; the number of matches is recorded to determine the fraction of URL overlap. Search engines with a single source have low Web coverage compared with a meta-search engine.
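The overlap measurement described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each engine's results for one query are available as a list of URL strings; the function name and the `example.org` URLs are hypothetical. Matching is exact string comparison, as described; real studies often normalise URLs first (for example dropping `www.` or a trailing slash), which this sketch deliberately does not do.

```python
def url_overlap(sample_urls, other_urls):
    """Fraction of the sampled URLs from one engine that also appear in another engine's results.

    Matching is exact string comparison (after stripping surrounding whitespace).
    """
    sample = [u.strip() for u in sample_urls]
    other = {u.strip() for u in other_urls}
    if not sample:
        return 0.0
    matches = sum(1 for u in sample if u in other)
    return matches / len(sample)


# Hypothetical result lists for the same query.
engine_a = [
    "https://example.org/a",
    "https://example.org/b",
    "https://example.org/c",
    "https://example.org/d",
]
engine_b = [
    "https://example.org/b",
    "https://example.org/x",
    "https://example.org/d",
]

print(url_overlap(engine_a, engine_b))  # 0.5 (2 of 4 sampled URLs match)
```

Note that the measure is not symmetric: sampling from `engine_b` instead gives 2 matches out of 3 URLs.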

METHODOLOGY
Information retrieval systems in general, and search engines specifically, need to be evaluated both during development and while the system is running in order to see how effective they are. The primary reason for these evaluations is to improve the quality of the searching process and its results, although there are other reasons for evaluating search engines. The measures used to evaluate the search engines in this research work are:

PRECISION
This is the ratio of the sum of the scores of the sites retrieved by a search engine to the total number of sites selected for evaluation Kumar and Bhadu (2013). After search results are retrieved, the user is sometimes able to find relevant information; how accurately the search retrieves the right information is its precision value Kumar and Prakash (2009).
The search results retrieved by the search engines were classified into four categories: relevant, irrelevant, links, and "site can't be accessed".
If a Web page was closely related to the subject matter of the search query, it was categorized as relevant and given a score of two. If a Web page was not related to the subject matter of the search query, it was categorized as irrelevant and given a score of zero. If a Web page consisted of a whole series of links rather than the information required, it was categorized as links and given a score of one. If the message "site can't be accessed" appeared for a particular website, the page was categorized as "site can't be accessed" and given a score of zero.
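The scoring scheme above, combined with the 50-site cutoff used in this study, can be expressed as a short Python sketch. It assumes each engine's judged results for a query are a list of category labels in ranked order; the label strings and function name are illustrative assumptions. Because relevant sites score 2, precision under this scheme ranges from 0 to 2, which is why values above 1 (such as the 1.37 reported in the abstract) are possible. When fewer than 50 sites are returned, the sketch divides by the number actually judged, which is an assumption.

```python
# Scores for the four relevance categories described above.
SCORES = {
    "relevant": 2,
    "link": 1,
    "irrelevant": 0,
    "cannot_access": 0,
}
CUTOFF = 50  # only the first 50 sites displayed by each engine are judged


def precision(judgements, cutoff=CUTOFF):
    """Sum of category scores over the top `cutoff` sites, divided by the number of sites judged."""
    judged = judgements[:cutoff]
    if not judged:
        return 0.0
    return sum(SCORES[label] for label in judged) / len(judged)


# Hypothetical judgements for one query (illustrative only).
print(precision(["relevant", "relevant", "link", "irrelevant"]))  # 1.25
```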

RESPONSE TIME
This is the time taken by each search engine to complete a query search; it can be measured using a stopwatch or read from the value displayed by some search engines Kausar, Dhaka and Singh (2013).

RELATIVE RECALL
This is the ability of a system to retrieve all or most of the relevant documents in the collection. The relative recall can be calculated using the following formula:

$$\text{Relative recall} = \frac{\text{Number of relevant sites retrieved by a search engine}}{\text{Total number of relevant sites retrieved by all the search engines}}$$
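A minimal Python sketch of relative recall for one query follows. It assumes each engine's relevant results are available as a set of URLs, and it takes the denominator to be the pooled set of distinct relevant sites found by any engine; that pooling is an assumption, and a denominator built from raw summed counts would give different values. The engine labels and URLs are hypothetical.

```python
def relative_recall(relevant_by_engine):
    """Relative recall per engine: relevant sites it retrieved / relevant sites retrieved by all engines.

    Assumes the denominator is the pooled set of distinct relevant sites
    found by any engine for the same query.
    """
    pool = set().union(*relevant_by_engine.values())
    if not pool:
        return {engine: 0.0 for engine in relevant_by_engine}
    return {
        engine: len(set(found)) / len(pool)
        for engine, found in relevant_by_engine.items()
    }


# Hypothetical relevant results for one query (illustrative only).
relevant = {
    "engine_a": {"u1", "u2", "u3"},
    "engine_b": {"u2", "u4"},
    "engine_c": {"u1"},
}
print(relative_recall(relevant))  # {'engine_a': 0.75, 'engine_b': 0.5, 'engine_c': 0.25}
```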

SEARCH ENGINES
Search engines are programs that search documents on the World Wide Web as requested by a user seeking information Lewandowski (2008). Five search engines, Google, Yahoo, Bing, AOL and Baidu, were used to determine the precision, response time and relative recall of the sample search queries created for this research. Since each search engine retrieves many more sites per query than could be evaluated, only the first 50 sites displayed by each of the search engines listed above were selected for the evaluation Lewandowski (2013).

SEARCH QUERY
A search query is what a user enters into a search engine in the expectation of receiving information as results Lewandowski (2009), Al-akashi and Inkpen (2012). A total of 24 queries were used for the evaluation. They comprise single word, double word, sentence and alphanumeric queries, and are categorized into informational, navigational and transactional queries in order to obtain optimal results at the end of the research Kumar, Suri and Chauham (2005).

The sample queries were tested using the selected search engines, and the readings taken are shown in Table 2 to Table 6. Table 7 shows the overall average precision of each search engine, computed from those tables. The study shows that Google has the highest precision of the search engines evaluated.

Table 8 shows that Google has the shortest response time, which means that Google is the fastest of the engines in terms of information retrieval.
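The averaging step behind the overall and per-query-type comparisons can be sketched in Python as follows. This is a minimal illustration assuming per-query scores are stored as records with a query type, an engine and a score; the record layout, function names and numbers are hypothetical and are not the paper's data. The `higher_is_better` flag reflects that precision and relative recall are maximised, whereas response time is minimised.

```python
from collections import defaultdict
from statistics import mean


def average_by(records, key):
    """Average the scores in each group, where key(record) gives the group."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record["score"])
    return {group: mean(scores) for group, scores in groups.items()}


def best_engine(averages, higher_is_better=True):
    """Engine with the best average: max for precision/recall, min for response time."""
    pick = max if higher_is_better else min
    return pick(averages, key=averages.get)


def best_per_type(records, higher_is_better=True):
    """Best engine for each query type, judged on that type's average score."""
    by_type = defaultdict(list)
    for record in records:
        by_type[record["query_type"]].append(record)
    return {
        qtype: best_engine(average_by(rs, key=lambda r: r["engine"]), higher_is_better)
        for qtype, rs in by_type.items()
    }


# Hypothetical per-query precision scores (illustrative only, not the paper's results).
records = [
    {"query_type": "single word", "engine": "engine_a", "score": 1.6},
    {"query_type": "single word", "engine": "engine_b", "score": 1.2},
    {"query_type": "sentence", "engine": "engine_a", "score": 1.0},
    {"query_type": "sentence", "engine": "engine_b", "score": 1.2},
]

overall = average_by(records, key=lambda r: r["engine"])
print(best_engine(overall))    # engine_a
print(best_per_type(records))  # {'single word': 'engine_a', 'sentence': 'engine_b'}
```

As in the conclusion below, the engine that wins on the overall average need not win for every query type.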

CONCLUSION
This research paper compares the retrieval effectiveness of some search engines based on; precision, relative recall, and response time. The analysis was carried out within a specific period of time, it was noted that type of query affects the search effectiveness of search engines it was found that Google is best for single word queries, double word queries and alphanumeric queries while Bing is the best for sentence queries. Lowandowski, D., 2013