When I first read about clustering, I was confused about its utility and its ability to work on a limited set of search results. But over time, after reading a lot of research on search usability and talking to people, I realized that searchers do not go beyond the first few pages. In fact, one study shows that two-thirds of searchers do not go beyond two pages of 10 results each. Searchers either find the information they are looking for or change their search terms. I personally do not go beyond two or three pages unless I am unable to refine my search phrase. Even with two pages of results, users have to filter them and infer the context of each result to find the information they need. So users actually spend more time on the search results page than working with the information they found. When I search, what matters is not only how relevant the results are but also how long it takes me to get to them. This is where I see the value of clustering in search engines.
Search technology has been evolving and maturing over the last few years. Search companies are competing with each other to create a niche for themselves and attract more traffic to benefit from advertising, but the ever-increasing demands of end users keep outpacing them. Users are increasingly interested in federated search, clustering and faceted search in addition to the regular sequential list of results. Google web search provides a form of clustering by indenting results when it finds more than one result from the same site or site section. Other search engines, such as Vivisimo Velocity, provide topic clustering, grouping results into topics or subjects, which helps users refine their searches.
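Google does not publish exactly how it decides when to indent results, so the following Python sketch only illustrates the general idea of site clustering: walk a ranked result list, group hits by host name, keep each group at the rank of its best hit, and indent the remaining hits from the same host. The `Result` type, `group_by_site`, `render` and the URLs used with them are hypothetical names for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Result:
    # Hypothetical minimal search result record
    url: str
    title: str


def group_by_site(results: list[Result]) -> list[tuple[Result, list[Result]]]:
    """Group a ranked result list by host name.

    Each group sits at the rank of its first (best-ranked) hit;
    later hits from the same host become indented children.
    """
    groups: dict[str, tuple[Result, list[Result]]] = {}
    for r in results:
        host = urlparse(r.url).netloc.lower()
        if host in groups:
            groups[host][1].append(r)
        else:
            groups[host] = (r, [])
    # dicts preserve insertion order, so groups stay in rank order
    return list(groups.values())


def render(groups: list[tuple[Result, list[Result]]]) -> str:
    """Render groups as text, indenting same-site hits under the top hit."""
    lines = []
    for top, children in groups:
        lines.append(top.title)
        lines.extend("    " + c.title for c in children)
    return "\n".join(lines)
```

For a ranked list of `example.com/a`, `other.org/x`, `example.com/b`, the second `example.com` hit moves up and is indented under the first one, ahead of the `other.org` result.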
Can we use clustering for anything else? We can use it to build applications such as job sites, event sites and social networking sites. We can also generate tag clouds to help users navigate through popular topics. The list will grow as we find more applications that need it.
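A tag cloud is typically built by counting how often each tag or topic occurs and scaling its font size with that count. Here is a minimal Python sketch, assuming logarithmic scaling between hypothetical minimum and maximum sizes of 10 and 30 pixels; `tag_cloud` is an illustrative name, not part of any particular library.

```python
import math
from collections import Counter


def tag_cloud(tags: list[str], min_size: int = 10, max_size: int = 30) -> dict[str, int]:
    """Map each tag to a font size (px) that grows with log(frequency)."""
    counts = Counter(tags)
    if not counts:
        return {}
    lo = math.log(min(counts.values()))
    hi = math.log(max(counts.values()))
    span = (hi - lo) or 1.0  # avoid division by zero when all counts are equal
    return {
        tag: round(min_size + (math.log(n) - lo) / span * (max_size - min_size))
        for tag, n in counts.items()
    }
```

Log scaling is a design choice: with linear scaling, one very popular tag would push every other tag down to the minimum size.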
I was working with Nutch and the Google Search Appliance when I got interested in clustering and faceted search technologies. I wanted to integrate clustering with both of these search engines to show my clients its value. I gave them a demo of Clusty web search, and they saw a lot of value in it. But the questions were whether I could integrate it with the Google Search Appliance, and whether there was any free or low-cost tool that could provide the same functionality. I knew about Vivisimo, but the client wanted a cheaper solution, so I started digging deeper in the open source arena.
Then I came across Carrot2, an open source search results clustering engine. Carrot2 provides an architecture for acquiring search results from various sources (YahooAPI, GoogleAPI, MSNAPI, eTools Meta Search, Alexa Web Search, PubMed, OpenSearch, Lucene index, SOLR), clustering the results and visualising the clusters. Currently, five clustering algorithms are available, suited to different kinds of document clustering tasks. Carrot2 has been used successfully in a number of commercial and research applications and has resulted in a number of interesting publications.
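Carrot2's own algorithms (for example Lingo and Suffix Tree Clustering) are far more sophisticated than anything that fits here. Purely to show what clustering a page of search results means at the smallest scale, the Python sketch below uses a naive shared-term approach that is not one of Carrot2's algorithms: every non-stopword term shared by at least two snippets becomes a cluster label, and results that land in no cluster go into an "Other Topics" bucket. The stopword list and function names are assumptions for illustration.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real engines use far larger ones)
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with", "is"}


def tokenize(text: str) -> set[str]:
    """Lowercase, split on non-alphanumerics and drop stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def cluster_by_shared_terms(snippets: list[str], min_docs: int = 2) -> dict[str, list[int]]:
    """Naive topic clustering over result snippets.

    Each term shared by at least `min_docs` snippets becomes a cluster label;
    a document may appear in several clusters. Unclustered documents are
    collected under "Other Topics".
    """
    postings: dict[str, set[int]] = defaultdict(set)
    for i, s in enumerate(snippets):
        for term in tokenize(s):
            postings[term].add(i)
    clusters = {t: sorted(d) for t, d in postings.items() if len(d) >= min_docs}
    # Largest clusters first, then alphabetical for a stable order
    clusters = dict(sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0])))
    covered = {i for docs in clusters.values() for i in docs}
    others = [i for i in range(len(snippets)) if i not in covered]
    if others:
        clusters["Other Topics"] = others
    return clusters
```

For the snippets "Jaguar car dealers", "Jaguar speed in the wild", "Used car prices" and "Python tutorial", this yields a "car" cluster, a "jaguar" cluster and an "Other Topics" bucket holding the Python result.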
I found this tool interesting and wanted to use it with my existing search engines. It provides seamless integration with the Nutch and Lucene search engines: the developer only needs to point it to the existing search indexes and customize the page layout. The Carrot2 application is deployed as a web application that you can drop into any web application server. Deployment was a cakewalk and the results were fascinating. I then also got it working with the Google Search Appliance. Here I had to use Carrot2's Java APIs to build a clustered search interface on top of the results returned by the appliance.
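The following Python sketch illustrates only the first step of such an integration: turning the appliance's XML results into simple (URL, title, snippet) records that a clustering engine can consume. It assumes the appliance's XML output nests each hit in an `R` element with `U` (URL), `T` (title) and `S` (snippet) children under `GSP`/`RES`; treat these element names as assumptions and check them against your appliance's documentation. `Hit` and `parse_gsa_xml` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class Hit(NamedTuple):
    # Hypothetical record handed on to a clustering step
    url: str
    title: str
    snippet: str


def parse_gsa_xml(xml_text: str) -> list[Hit]:
    """Extract hits from a Google Search Appliance XML response.

    Assumes the GSP/RES/R layout with U (URL), T (title) and S (snippet)
    child elements; missing children become empty strings.
    """
    root = ET.fromstring(xml_text)
    return [
        Hit(
            url=r.findtext("U", default=""),
            title=r.findtext("T", default=""),
            snippet=r.findtext("S", default=""),
        )
        for r in root.iter("R")
    ]
```

From there, the titles and snippets can be passed to whichever clustering engine is in use.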
If you are looking for a low-cost clustering solution, you can use an open source clustering engine, which can be integrated with existing web search engines and also with enterprise search engines such as the Google Search Appliance or a Lucene index (one can debate Lucene's enterprise capability). It is easy to deploy and configure, and it does not impose any extra baggage.
I am sure Google is working on this technology and will come up with a solution that is a notch better than Vivisimo or Carrot2. I am eagerly waiting to hear from them.
Wednesday, August 8, 2007
Friday, August 3, 2007
Enterprise Search Done Right
When I was doing a knowledge management assessment for a leading energy research company, I learnt how important search can be for a business and how badly a company can go wrong in implementing a search solution. The key to success in research and consulting organizations is an effective knowledge management practice, and search plays an important role in it. In these types of organization, enterprise search has a direct impact on the business and on employee productivity.
While I was there, I looked at their infrastructure and business processes and tried to assess the maturity level of their knowledge management program. The results were appalling. They did not have a knowledge management program. They did not have an enterprise content and collaboration management system. They did not have internal search. Their external site had two search engines, which was redundant. They did not understand the significance of search for their employees or for the business with respect to productivity and knowledge access. The researchers procured information from various knowledge databases and repositories through a library service, and there were no processes in place to measure the effectiveness and efficiency of that service. The researchers and scientists were heavily dependent on Google and Yahoo for information related to their work.
This is not the only story of how organizations go wrong in understanding user needs and end up implementing ineffective search solutions. I have browsed through many organizations' websites that have great products and services but fail to provide effective search for their users. I both do and do not blame them. So much work has gone into external search engines like Google and Yahoo that end users' expectations have risen for all kinds of search. When they use an organization's site search, they expect the same user experience and relevance they get on external web search engines. I do not mean that no enterprise search engine offers relevance and experience comparable to external search engines: there are search products from Fast, Verity and Autonomy that are leaders in enterprise search.
If I had to suggest a tool to solve the research organization's search problem, I would first try to understand their business and user needs. Then I would list the features I need in the product. Let's drill down on the requirements:
1. Ability to search all my repositories, including websites, file systems, databases and enterprise applications like SharePoint, Documentum, etc. Researchers would then be able to access all information, including past work as well as knowledge acquired from various sources.
2. Ability to search external corporate and academic libraries, journals, feeds, etc., also called federated search. Researchers would not need to go to various search engines to get information.
3. Ability to classify my documents and content into easy-to-browse categories. This would help me drill down into the information by category rather than through pages of 10 results. No one actually browses beyond a couple of pages of search results.
4. A web administrative interface with advanced linguistic capabilities, including metadata, synonyms, antonyms, relative weighting for text fields and stemming.
5. Ability to perform secure searches, so that users see only the documents they are permitted to access.
6. Ability to scale and perform.
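Requirement 2, federated search, raises a practical problem: each source scores results on its own scale, so the scores cannot be merged directly. One simple approach, shown here as a Python sketch, min-max normalizes each source's scores into [0, 1], merges them, keeps the best score for any URL returned by several sources, and sorts. The sources are hypothetical callables standing in for real library, journal or feed connectors.

```python
from typing import Callable

# A "source" is any callable returning (url, score) pairs for a query.
# Real connectors (library catalogues, journal APIs, feeds) are hypothetical here.
Source = Callable[[str], list[tuple[str, float]]]


def normalize(hits: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Min-max scale one source's scores into [0, 1] so sources are comparable."""
    if not hits:
        return []
    scores = [s for _, s in hits]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return [(u, (s - lo) / span if span else 1.0) for u, s in hits]


def federated_search(query: str, sources: list[Source], limit: int = 10) -> list[tuple[str, float]]:
    """Query every source, normalize, deduplicate by URL and rank."""
    best: dict[str, float] = {}
    for source in sources:
        for url, score in normalize(source(query)):
            # Keep the highest normalized score seen for each URL
            if score > best.get(url, -1.0):
                best[url] = score
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]
```

Min-max normalization is only one option; round-robin interleaving of the per-source result lists is another common way to merge federated results.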
The rest of the requirements are generic: relevance, support for document formats, summary extraction, metadata indexing and crawl configuration. If a tool satisfies all of these requirements, it is certainly a candidate for my approval.
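Requirement 4 lists synonyms, stemming and relative weighting of text fields. The Python sketch below shows one way these three settings can combine in scoring: query terms are stemmed and expanded with synonyms, then matched against each field, with every match multiplied by that field's weight. The synonym table, field weights and the crude suffix-stripping stemmer (not a real stemmer such as Porter's) are all hypothetical.

```python
import re

# Hypothetical administrator-maintained settings
SYNONYMS = {"car": {"automobile"}, "automobile": {"car"}}
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "body": 1.0}


def stem(word: str) -> str:
    """Crude suffix stripper for illustration only (not the Porter stemmer)."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and not word.endswith("ss") and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text: str) -> list[str]:
    """Tokenize and stem a field's text."""
    return [stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]


def expand(query: str) -> set[str]:
    """Stem query terms and add their (stemmed) synonyms."""
    out: set[str] = set()
    for w in re.findall(r"[a-z0-9]+", query.lower()):
        base = stem(w)
        out.add(base)
        out.update(stem(s) for s in SYNONYMS.get(base, ()))
    return out


def score(doc: dict[str, str], query: str) -> float:
    """Sum over fields of field weight x number of matching expanded terms."""
    q = expand(query)
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        total += weight * sum(1 for t in terms(doc.get(field, "")) if t in q)
    return total
```

With these settings, a document titled "Used automobiles" whose body reads "Cars and car prices" scores the same for the query "car" as for "automobile", because stemming and synonym expansion map both queries onto the same terms.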
When I did the research, I found the Vivísimo Velocity Search Engine to be one of the most powerful enterprise search engines: it satisfied my set of requirements and had a relevance algorithm that compares well with Google's and Yahoo's. Others came close, but not close enough; they either could not satisfy all the requirements or were not packaged as a single solution.
Vivísimo (a difficult word to pronounce) provides an innovative search solution that offers not only search relevance on par with external web search engines but also a similar user experience. You can try their public web search at http://vivisimo.com/ to get a glimpse. Their clustering solution is popularly known as Clusty.
Integral components of the Vivísimo Velocity Search Platform:
- Velocity Search Engine
- Velocity Content Integrator
- Velocity Clustering Engine
Vivísimo's external web search looks impressive, and I hope their enterprise search is just as promising. I think anyone looking for an enterprise search engine should take a look at their offering.
I have done a lot of research on federated search and clustering technologies, both commercial and open source. I will write about these technologies in upcoming posts.