Functional Programming and Lucene Query Expansion

Last week I had a programming problem, I had to do a query expansion in Lucene. This particular expansion was to convert term and phrase queries into span queries to provide higher relevancy scores for results when the search terms were closer together.

I simplified the problem by only considering terms and phrases and boolean combinations of them. Basically:

BooleanQueries would be handled recursive following the above convention. A BooleanQuery is essentially a list of BooleanClauses, so at this point those functional programming classes from school popped in my head. I have lists, recursion, base cases, it should be straightforward to jot down a solution in Haskell then I’ll translate it to Java. After a while I gave up and wrote a long piece of procedural code in Java, I just didn’t know Haskell well enough to write out anything more than trivial code.

Eventually the algorithm was refined to:

SpanQuery spanize(Query query, int slop) {


  • If the query is a SpanQuery, return it
  • If the query is a TermQuery, return a SpanTermQuery
  • If the query is a PhraseQuery, return a SpanNearQuery
  • If the query is a BooleanQuery

    1. Divide the clauses into three separate lists, required, prohibited and neither
    2. Before adding the query (from the clause) into the list, recursively call spanize with it as the argument
    3. Convert the required list to a SpanNearQuery
    4. For each entry in the neither set combined it with the above SpanNearQuery into another SpanNearQuery
    5. Combine all those SpanNearQueries into a SpanOrQuery
    6. Combine the above SpanOrQuery with the values in the prohibited list into a SpanNotQuery
    7. Return this SpanQuery

  • If it’s a different type of query, discard it
}

The logic was slightly awkward because I had a mental block of boolean queries being composed of two parts and have a huge recursive structure instead of an arbitrary lengthed list. Also that prohibited clauses only restricted search results instead of possibly generating more.

e.g. ‘apples OR NOT apples’ returns nothing instead of everything.


Originally I wanted to convert the algorithm to Haskell and post an article on how functional languages can make programming easier, but after beginning to convert Lucene’s type hierarchy into Haskell classes I realised I still didn’t know enough Haskell and the jist of the article would be a lie.

Either way I’ve decided to add Haskell back into my skillset. Learning new languages is a great way to become a better developer, as stated in one of my favourite books:

“Learn at least one new [programming] language every year. Different languages solve the same problems in different ways. By learning several different approaches, you can help broaden your thinking and avoid getting stuck in a rut.”

— The Pragmatic Programmer

It looks like it’s too late to learn Haskell, but the Language of the Year group looks like an interesting place to learn a new language with like minded people.

Spread the word: Technorati related  |  Technorati related  |  del.icio.us bookmark it!  |  submit Functional Programming and Lucene Query Expansion digg.com digg it!  |  reddit reddit!

Leave a Reply

Line and paragraph breaks automatic.
XHTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>