Solr JSON Facet API

What is the JSON Facet API?

The JSON Facet API is a new approach, introduced in Solr 5.x, to faceting over large sets of documents to obtain analytics.
It marks a watershed between the Solr 4.x and Solr 5.x versions: before Solr 5.x, statistics and analytics were done with the stats and pivot faceting components.

Use case

Using the stats component together with faceting can be kludgy and inefficient.

For example, consider an analytics result covering the currency and electronics categories: a breakdown by category, sub-faceted by inStock, with price statistics such as min and max for each bucket.

Old approach (pre-Solr 5.x)

We could combine Solr’s stats component with pivot faceting to achieve this.
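As a rough sketch of what such a request might look like, the snippet below assembles the URL parameters in Python. The field names (cat, inStock, price) and the tag name piv1 are assumptions chosen to match the breakdown described earlier, and the local-params syntax is the one Solr documents for attaching stats to a pivot facet; the author's exact query may have differed.

```python
from urllib.parse import urlencode, parse_qs

# Assumed field names (cat, inStock, price) and tag name (piv1);
# none of these come from the original post.
params = [
    ("q", "*:*"),
    ("rows", "0"),  # only the facet/stats output is wanted, not documents
    ("stats", "true"),
    # Tag the price statistics so the pivot can refer to them.
    ("stats.field", "{!tag=piv1 min=true max=true}price"),
    ("facet", "true"),
    # Pivot by category, then by inStock, attaching the tagged stats.
    ("facet.pivot", "{!stats=piv1}cat,inStock"),
]

query_string = urlencode(params)
print(query_string)
```

The tag has to be declared in one parameter and referenced in another, which is the interleaving that makes this style awkward to read and to generate.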

Problem

Depending on the number of documents, such a query can take on the order of seconds, which rules out many use cases. For a web reporting application that responds to user clicks, latency measured in seconds may be unacceptable. The query is also written in URL format, with several local-params tags interleaved.

New approach (Solr 5.x or later)

This is where the JSON Facet API comes in handy. The facet-pivoting query can be rewritten using this new API.

Find the top 5 categories and return the number of products in stock and out of stock
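A minimal sketch of such a facet definition, built as a Python dict and serialized for the json.facet parameter, might look like this. The field names cat, inStock and price and the labels categories, stock, min_price and max_price are assumptions; the price aggregations are included only to mirror the min/max statistics mentioned earlier.

```python
import json

# Assumed field names and labels; adjust to the actual schema.
top_categories = {
    "categories": {
        "type": "terms",
        "field": "cat",
        "limit": 5,  # top 5 categories by count
        "facet": {
            # Sub-facet: one bucket per inStock value, each with a count.
            "stock": {
                "type": "terms",
                "field": "inStock",
                "facet": {
                    "min_price": "min(price)",
                    "max_price": "max(price)",
                },
            },
        },
    }
}

json_facet = json.dumps(top_categories)
print(json_facet)  # value to send as the json.facet request parameter
```

Each bucket carries its own document count, so the in-stock and out-of-stock totals come back without any extra aggregation.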

Similarly, to find the top 5 categories and return the manufacturers under each category, just change the sub-facet's field to manu (field:manu).
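Because only the sub-facet field changes, the definition can be produced by a small helper. The function name category_facet and the by_ label prefix below are hypothetical, introduced purely for illustration.

```python
import json


def category_facet(sub_field, limit=5):
    """Top `limit` categories, each broken down by `sub_field`."""
    return {
        "categories": {
            "type": "terms",
            "field": "cat",  # assumed category field
            "limit": limit,
            "facet": {
                # Label is arbitrary; it only names the result section.
                "by_" + sub_field: {"type": "terms", "field": sub_field},
            },
        }
    }


by_manu = category_facet("manu")
print(json.dumps(by_manu, indent=2))
```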

 

 Advantages

  1. The QTime for this query, even for millions of records, would be in milliseconds. There is an interesting analysis of this.
  2. The query formation is quite intuitive, because the JSON structure provides clarity compared to the local-params tags used in the facet-pivoting approach.
  3. Applications can programmatically manipulate the json.facet structure at runtime, possibly in response to user clicks.
    This is a much cleaner approach than manipulating a URL at runtime.
  4. Each json.facet structure can be maintained per use case, separate from the Solr query, and the two combined at runtime.
  5. For Java-based implementations, the SolrJ client library has great support, in SolrCloud mode as well.
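Points 3 and 4 can be sketched as follows. This is one possible arrangement, not a prescribed one: the FACETS registry, the build_request helper and the field names are all hypothetical, and the request body uses the query, limit and facet keys of Solr's JSON Request API.

```python
import copy
import json

# Hypothetical registry: one facet definition per use case,
# maintained separately from any particular query.
FACETS = {
    "stock_by_category": {
        "categories": {
            "type": "terms", "field": "cat", "limit": 5,
            "facet": {"stock": {"type": "terms", "field": "inStock"}},
        }
    },
    "manu_by_category": {
        "categories": {
            "type": "terms", "field": "cat", "limit": 5,
            "facet": {"manufacturers": {"type": "terms", "field": "manu"}},
        }
    },
}


def build_request(query, facet_name, limit=None):
    """Combine a query with a stored facet definition at runtime."""
    facet = copy.deepcopy(FACETS[facet_name])  # keep the stored copy intact
    if limit is not None:
        facet["categories"]["limit"] = limit  # e.g. user asked for more rows
    # "limit": 0 requests no documents, only the facet results.
    return {"query": query, "limit": 0, "facet": facet}


body = build_request("cat:electronics", "manu_by_category", limit=10)
print(json.dumps(body, indent=2))
```

Changing a limit or swapping a sub-facet is an ordinary dictionary update here, with no string surgery on a URL.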

Limitations

The only caveat is that if the cardinality of a field (the number of unique values in it) exceeds 100 per shard, an estimation algorithm kicks in. There is already a Jira issue for this, which could be addressed in a future release.

Update: A patch has been committed for the Jira issue that should resolve the cardinality problem. It may roll out in a future version of Solr.
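The caveat appears to concern distinct-value counts. Assuming that reading is right, the sketch below requests such counts through the JSON Facet API's unique() and hll() aggregation functions, the latter being the explicitly approximate, HyperLogLog-based one. The field manu and the labels manu_unique and manu_hll are assumptions for illustration.

```python
import json

# Assumed field (manu) and labels; both aggregations count distinct values.
distinct_counts = {
    "manu_unique": "unique(manu)",
    "manu_hll": "hll(manu)",  # HyperLogLog-based estimate
}

json_facet = json.dumps(distinct_counts)
print(json_facet)  # value to send as the json.facet request parameter
```

Comparing the two numbers on a field with many distinct values is one way to see whether estimation is affecting a given result.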

Conclusion

The JSON Facet API provides analytics with sub-second response times for complex queries involving multiple facets. It offers a foundation for building scalable and resilient applications in business analytics and machine learning, beyond the usual enterprise search.
