In ACP, text search must be fast and accurate to satisfy users. Developing our own search algorithms would take a long time, and we do not wish to reinvent the wheel. We therefore surveyed open-source full-text search tools and found several: Apache Lucene, Zettair, Sphinx, SQLite, MSSQL, PostgreSQL and Xapian. During this research we came across a performance analysis; its results are shown below.

  

Based on these results, we drew the following conclusions about the Lucene library:

  1. Lucene has the smallest index size and the best query relevancy, which is one of the important factors for ACP. High relevancy means the desired suggestion appears at the top of the suggestion list, which keeps users satisfied.
  2. Its worst search time is 1.366 s across 300 MB of text data, which is acceptable under most conditions for ACP.
  3. Although indexing is slow in the analysis above, the documents handled by ACP will not be nearly as large.

The other, database-based full-text search tools were not considered because we do not want to install extra tools on the user's machine. We therefore decided to use Apache Lucene for our search.
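To make the relevancy point concrete, the following is a minimal sketch of a ranked full-text lookup, written with the Java standard library only. It is not Lucene's API or scoring algorithm: the class name `TinyIndex`, its methods, and the scoring rule (raw term counts) are assumptions chosen for illustration. It only shows the general idea of building an index once and returning the best match at the top of the list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy in-memory full-text index (hypothetical; not Lucene's API or scoring). */
final class TinyIndex {
    // term -> (document id -> occurrences of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private final List<String> documents = new ArrayList<>();

    /** Lower-cases the text and splits it into alphanumeric tokens. */
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Indexes a document and returns its id. */
    int add(String text) {
        int id = documents.size();
        documents.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new HashMap<>()).merge(id, 1, Integer::sum);
        }
        return id;
    }

    /** Returns matching documents, best match first (score = summed query term counts). */
    List<String> search(String query) {
        Map<Integer, Integer> scores = new HashMap<>();
        for (String term : tokenize(query)) {
            Map<Integer, Integer> hits = postings.getOrDefault(term, Collections.emptyMap());
            hits.forEach((id, count) -> scores.merge(id, count, Integer::sum));
        }
        List<Integer> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a)); // higher score first
            return byScore != 0 ? byScore : Integer.compare(a, b);       // ties: older document first
        });
        List<String> results = new ArrayList<>();
        for (int id : ids) {
            results.add(documents.get(id));
        }
        return results;
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("Search settings for the search panel");
        index.add("Open the settings dialog");
        index.add("Print preview");
        // The first document matches both query terms, so it is listed first.
        System.out.println(index.search("search settings"));
    }
}
```

A real engine such as Lucene adds analysis, persistent index files and a far more refined scoring model, which is the work we want to avoid redoing ourselves.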

However, other libraries use Apache Lucene as their full-text search engine and provide more features than Apache Lucene alone. Hibernate is one such popular library; its full-text search engine, Hibernate Search, is built on top of Apache Lucene. The comparison below sets Hibernate Search against Apache Lucene used with conventional JDBC.

| Type of Comparison | Hibernate Search | Lucene with conventional JDBC |
| --- | --- | --- |
| Ease of mapping database tables | Mapping is done by annotating the entities, so changes to the table structure (for example, a renamed field) are easy to apply. | Mapping is done through SQL statements, so a change to the table structure must be reflected in every SQL statement, which is error-prone. |
| Mapping an entity to multiple database tables | Each entity can be mapped to only one database table. | Each entity can be mapped to multiple database tables. |
| Mapping of data to objects | Data is converted to objects automatically, with Hibernate handling the data type conversion. | Conversion is done manually in the developer's code and is prone to data type errors. |
| Ease of changing database | Only the configuration files and the annotations in the entities need to be edited. | May require many changes to the database access code to support the new database. |
| Query language | Supports Hibernate Query Language (HQL), which is database independent. Hibernate converts HQL to optimized SQL for data manipulation. | Supports only SQL, which is database dependent. SQL statements written by the developer may not be optimized. |
| Caching | Only requires setting the necessary configuration. | Maintained manually in the developer's code. |
| Size of libraries | Includes many Hibernate classes that are not used in ACP, which adds overhead. | Small library size. |

In each comparison above, the better of the two options is highlighted in green; Hibernate Search is the better option in every comparison except mapping an entity to multiple database tables and library size. The comparison shows that Hibernate Search has many more advantages than Apache Lucene with JDBC alone, because it handles most of the complex and error-prone tasks in ACP: converting data into usable objects, retrieving data through caching, and optimizing database queries. It automates a major portion of the data manipulation in ACP while still allowing functionality such as the cache settings and index configuration to be configured. Hibernate Search does have disadvantages, namely a larger library size and the restriction of mapping each entity to only one database table, but these have little impact on ACP and are outweighed by the advantages. We therefore chose Hibernate Search.
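The difference between manual conversion and annotation-driven mapping can be sketched without any external library. The example below is not Hibernate's API: the `@Col` annotation, the `Suggestion` entity, its column names and the `MappingDemo` helper are all hypothetical, and a database row is simulated with a `Map`. It only illustrates why keeping the mapping on the entity, as Hibernate does with its own annotations, removes hand-written casts and repeated column names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical annotation naming the column a field is loaded from (not Hibernate's). */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Col {
    String value();
}

/** Example entity; the column names are made up for this sketch. */
final class Suggestion {
    @Col("suggestion_id") long id;
    @Col("suggestion_text") String text;
}

final class MappingDemo {
    /** Manual style: column names and casts are repeated in hand-written code. */
    static Suggestion mapManually(Map<String, Object> row) {
        Suggestion s = new Suggestion();
        s.id = (Long) row.get("suggestion_id");       // a wrong cast only fails at run time
        s.text = (String) row.get("suggestion_text");
        return s;
    }

    /** Annotation style: one generic routine reads the mapping from the entity itself. */
    static <T> T mapByAnnotation(Map<String, Object> row, Class<T> type) {
        try {
            T entity = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                Col col = field.getAnnotation(Col.class);
                if (col == null) {
                    continue; // field is not mapped to a column
                }
                field.setAccessible(true);
                field.set(entity, row.get(col.value())); // unboxes Long into the long field
            }
            return entity;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot map row to " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>(); // stands in for one database row
        row.put("suggestion_id", 7L);
        row.put("suggestion_text", "Find in page");

        Suggestion manual = mapManually(row);
        Suggestion mapped = mapByAnnotation(row, Suggestion.class);
        System.out.println(manual.id + " " + manual.text);
        System.out.println(mapped.id + " " + mapped.text);
    }
}
```

With the annotation style, renaming a column means editing one annotation on the entity, whereas the manual style requires finding every place the column name is used.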
