
WebSphere Commerce Solr Integration

Introduction

If you use WebSphere Commerce V7, you may already use the Solr-based search integration that ships with the product, or you might be thinking about adopting it. The integration brings together two complex pieces of software: WebSphere Commerce itself and Solr, an enterprise search platform.

This post covers the building blocks of the integration so that you can understand it and improve search relevancy for customers who are looking for the products that you sell.

Definitions

First, let’s look at just some of the components and terms that you will use when working with the WebSphere Commerce Solr integration.

  • Solr – an open source enterprise search platform from the Apache Software Foundation. It supports full-text search, faceted search, and rich documents such as Word and PDF. It powers some of the largest sites in the world, and many eCommerce vendors integrate with it to provide search functionality.
  • Preprocess – the di-preprocess command takes the WebSphere Commerce data and generates a series of tables that flatten the data so that it can be indexed by Solr. A number of preprocess configuration files are provided out of the box, and when you run the command a series of tables that start ti-… is created in your database instance. As you become more advanced with Solr, you may want to include additional data at preprocess time.
  • di-buildindex – Solr needs an index to run against. The index is built from the data generated by the preprocess component of WebSphere Commerce, and it then needs to be kept up to date, either through a full build of all the data or through a delta build that picks up only changed data.
  • Structured data – anything that comes from the Commerce database. Your product information, for example, is part of your structured data.
  • Unstructured data – PDFs, documents, and anything else not from the database that is returned in your results. We won’t focus on this type of information yet; there is enough to get right with the structured data.
  • Solr document – a document in the Solr index holds the details of a product, item, or category. The document contents are returned as part of the Solr response.
  • Search term – the words you are looking for within the Solr index.
  • Relevancy score – how Solr ranks a document when it performs a search against the terms. The score is affected by a wide variety of options, some driven by Solr and some by how you have structured the data. Understanding this score is key to understanding the results being produced.
  • schema.xml – defines the fields and field types of the Solr index. You can modify the file if, say, you want to add longDescription to your search index, which is not used by default. You would also make changes here if you adjust configuration components such as the spellchecker.
  • Solr core – lets a single Solr instance host multiple Solr configurations and indexes. You will see a ‘default’ core that is available but not used.
  • CatalogEntry core – the index that covers everything about the products and items within WebSphere Commerce.
    When you create a query, you send it against that index, for example:


http://<myhostname>:<port if not 80>/solr/MC_10351_CatalogEntry_en_US/select?q=*:*

This query returns information about the products and items in the entry-based index. You can see from the core name that it uses the master catalog ID as an identifier, together with the language. This means that multiple language indexes can be in use.

  • CatalogGroup core – the index that covers information about the categories within the store. An example query against the CatalogGroup core:

http://<myhostname>:<port if not 80>/solr/MC_10351_CatalogGroup_en_US/select?q=*:*
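The core names in the two example queries follow a visible pattern: MC_, the master catalog ID, the index type, and the locale. As a minimal sketch of that pattern, assuming it holds for your installation, the Python below composes a core name and a select URL. The function names and the host search.example.com are made up for illustration; the port 3737 is borrowed from the setup command later in this post.

```python
from urllib.parse import urlencode


def core_name(master_catalog_id: int, index_type: str, locale: str) -> str:
    """Compose a core name such as MC_10351_CatalogEntry_en_US."""
    if index_type not in ("CatalogEntry", "CatalogGroup"):
        raise ValueError(f"unexpected index type: {index_type}")
    return f"MC_{master_catalog_id}_{index_type}_{locale}"


def select_url(host: str, core: str, query: str = "*:*", port: int = 80) -> str:
    """Build a /select URL; the port is left out when it is 80."""
    authority = host if port == 80 else f"{host}:{port}"
    # Keep '*' and ':' readable instead of percent-encoding them.
    params = urlencode({"q": query}, safe="*:")
    return f"http://{authority}/solr/{core}/select?{params}"


# Hypothetical host name, used only for this example.
entry_core = core_name(10351, "CatalogEntry", "en_US")
print(select_url("search.example.com", entry_core, port=3737))
```

Note the q=*:* form of the query parameter: the equals sign is required for Solr to read the match-all query.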

                                              

Pre-processing and index-building

 

The following diagram illustrates the relationship between preprocessing and index building in WebSphere Commerce:

di-preprocess utility

The di-preprocess utility extracts and flattens WebSphere Commerce data and then outputs the data into a set of temporary tables inside the WebSphere Commerce database. The data in the temporary tables is then used by the index building utility to populate the data into search indexes with the Data Import Handler (DIH).

The preprocess utility picks the wc-dataimport-preprocess-fullbuild.xml file or wc-dataimport-preprocess-deltabuild.xml file first, and then transforms the results of the SQL statements defined in those files into temporary tables. Next, the utility handles each configuration XML file in a random order.

Command

./di-preprocess.sh /apps/ecom/commerce/7.0/instances/preview/search/pre-processConfig/MC_10001/DB2 -instance preview -dbuser dbusername -dbuserpwd dbpassword -fullbuild true
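If you script this step, it can help to build the argument list in one place. The sketch below reproduces the command above from its parts. The function name preprocess_command is hypothetical, and passing "false" to -fullbuild for a delta build is an assumption based on the flag's name; check it against your product documentation before relying on it.

```python
import shlex


def preprocess_command(config_dir, instance, db_user, db_password, full_build=True):
    """Return the di-preprocess invocation as an argument list."""
    return [
        "./di-preprocess.sh",
        config_dir,
        "-instance", instance,
        "-dbuser", db_user,
        "-dbuserpwd", db_password,
        # Assumption: "false" requests a delta build.
        "-fullbuild", "true" if full_build else "false",
    ]


argv = preprocess_command(
    "/apps/ecom/commerce/7.0/instances/preview/search/pre-processConfig/MC_10001/DB2",
    "preview",
    "dbusername",
    "dbpassword",
)
# shlex.join renders the list as a single shell-safe command line.
print(shlex.join(argv))
```

An argument list like this can be handed to subprocess.run directly, which avoids quoting problems if the password contains shell metacharacters.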

 

To get more logging information, update the logging level from INFO to FINEST in the following file:

WC_installdir/instances/instance_name/xml/config/dataimport/logging.properties

# Default global logging level, INFO

.level=FINEST

# Default global logging level, INFO

com.ibm.commerce.level=FINEST
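Switching between INFO and FINEST by hand gets tedious when you debug often. As a rough illustration, the sketch below rewrites the two level keys in the text of a properties file. The function name set_levels and the sample text are made up for this example; only the two keys and the FINEST value come from the steps above.

```python
def set_levels(properties_text: str, level: str = "FINEST") -> str:
    """Rewrite the two level keys; leave comments and other keys untouched."""
    keys = (".level", "com.ibm.commerce.level")
    out = []
    for line in properties_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if not line.lstrip().startswith("#") and key in keys:
            line = f"{key}={level}"
        out.append(line)
    return "\n".join(out)


sample = "# Default global logging level, INFO\n.level=INFO\ncom.ibm.commerce.level=INFO"
print(set_levels(sample))
```

Calling set_levels(text, "INFO") restores the default once you have the detail you need.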

 

di-buildindex utility

 

The di-buildindex utility is a wrapping utility that updates the information in the Master Index with the Data Import Handler (DIH) service to build the index. The information is updated either partially through delta index updates or completely through full index builds.

The DIH uses URLs to call commands, for example:

http://host:port/solr/MasterCatalog_CatalogEntry_en_US/dataimport?command=full-import
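To make the URL shape concrete, here is a small sketch that builds a DIH command URL for a given core. The command names listed are the standard Solr Data Import Handler commands; the function name dih_url and the host and port values are placeholders for illustration, and in normal operation the di-buildindex utility issues these calls for you.

```python
from urllib.parse import urlencode

# Standard commands of Solr's Data Import Handler.
DIH_COMMANDS = {"full-import", "delta-import", "status", "reload-config", "abort"}


def dih_url(host: str, port: int, core: str, command: str) -> str:
    """Build the /dataimport URL for one DIH command."""
    if command not in DIH_COMMANDS:
        raise ValueError(f"unknown DIH command: {command}")
    params = urlencode({"command": command})
    return f"http://{host}:{port}/solr/{core}/dataimport?{params}"


# Placeholder host and port; the core name is the one used in the example above.
print(dih_url("host", 3737, "MasterCatalog_CatalogEntry_en_US", "full-import"))
print(dih_url("host", 3737, "MasterCatalog_CatalogEntry_en_US", "status"))
```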

The index building utility uses DIH to connect to the WebSphere Commerce database through a JDBC connection. It crawls the temporary tables that are populated by the preprocess utility, and then populates the Solr index.

The wc-data-config.xml configuration file defines the JDBC configuration and SQL crawling statements.

The utility reports the status of the indexing progress based on the statusInterval parameter. By default, every 10 seconds the utility prints how many documents are indexed in each index, how long the utility has been running, and the current indexing status. After the utility completes, it reports how many documents were successfully indexed in each index, and which index builds failed.

Command

./di-buildindex.sh -instance preview -masterCatalogId 10001 -dbuser dbusername -dbuserpwd dbpassword

 

To get more logging information, update the logging level from INFO to FINEST in the WC_installdir/instances/instance_name/xml/config/dataimport/buildindex-logging.properties file:

# Default global logging level, INFO

.level=FINEST

# Default global logging level, INFO

com.ibm.commerce.level=FINEST

 

Search Setup

Run the search index setup utility

Server using DB2:

./setupSearchIndex.sh -instance InstanceName -action configWCforSolr -masterCatalogId 10001 -dbuser dbusername -dbuserpwd dbpassword -searchServerName SearchSolrname -searchServerPort 3737 -searchServiceContextRoot /solr

This utility makes the following updates:

  1. wc-search.xml is updated with the appropriate Solr configuration information.
  2. One record is inserted into the database table SRCHCONF.

 

Main Configuration Files

 

  1. wc-data-config.xml

This file contains the queries that read data from the preprocessing tables (and from regular WebSphere Commerce tables if necessary) to populate the search index. The query results are mapped to Solr fields in this file.

  2. schema.xml

This file defines the schema fields and their data types. It also defines the Solr field settings, including which fields are searchable and sortable.

  3. wc-component.xml

The component configuration file (wc-component.xml) contains properties to configure various WebSphere Commerce search application features.

The component configuration file is stored in the following location:

opt/IBM/WCDE_ENT70_New/workspace/WC/xml/config/com.ibm.commerce.search/wc-component.xml

To change the value of an existing property in the WebSphere Commerce search component configuration file, you must create a customized version of the file in this location. The customized version of the file must contain only the changed properties.

  4. solrconfig.xml

This is the main Solr configuration file. The conf directory for each Solr core contains the configuration files for the index, and the solrconfig.xml file contains the Solr configuration options.

  5. wc-search.xml

This file contains the search profiles. Fields to sort by are also defined here.

  6. wc-business-object-mediator.xml

This file maps Solr fields to WebSphere Commerce fields (usually, user data fields are used in the noun).

  7. wc-object-relational-metadata.xml

Object-relational metadata is an XML file that maps physical SDO Java classes to the database schema. The data service layer requires this object-relational metadata to translate between the physical Java classes and the WebSphere Commerce database schema. Each service module has a set of object-relational metadata files that represent the database schema for the component. Metadata for custom tables is stored in WC_eardir\xml\config\servicemodulepackagename-ext\wc-object-relational-metadata.xml.

Search configuration tables

SRCHCONF

Configuration details for each search index. Populated by the index creation script.

SRCHATTR

This table contains the logical name of each catalog attribute in the search schema. There can be multiple fields in the index, such as tokenized and non-tokenized, that represent a single logical field.

SRCHATTRPROP

Properties of each search index attribute. This table contains the usage information for each index field. The usage type for a field can be one or more of display, sort, facet, rank, search, filter, and catalogFilter.

Each row in the table represents one usage type for a field. The search attribute and search attribute properties tables are updated automatically when you mark an attribute dictionary attribute as searchable in Management Center.
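Because each row carries a single usage type, a field that is both searchable and displayable appears in more than one row. The sketch below illustrates that one-row-per-usage shape by grouping rows back into a per-field view. The (field, usage) pairs are a hypothetical simplification and not the actual column layout of SRCHATTRPROP; the field names are invented, and only the list of usage types comes from the description above.

```python
from collections import defaultdict

# Usage types listed in the table description.
USAGE_TYPES = {"display", "sort", "facet", "rank", "search", "filter", "catalogFilter"}


def usages_by_field(rows):
    """Group (field, usage) rows into {field: sorted list of usages}."""
    grouped = defaultdict(set)
    for field, usage in rows:
        if usage not in USAGE_TYPES:
            raise ValueError(f"unknown usage type: {usage}")
        grouped[field].add(usage)
    return {field: sorted(usages) for field, usages in grouped.items()}


# Hypothetical rows: one row per usage type per field.
rows = [("name", "search"), ("name", "display"), ("price", "sort"), ("price", "facet")]
print(usages_by_field(rows))
```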

WebSphere Commerce search utility exit codes

The search index utilities (di-buildindex, di-preprocess, indexprop, migrateSearchFacet, migrateSolrSearch, setupSearchIndex) return exit codes when the utility is run.

The following table describes the exit codes that are common to all search utilities and the codes that are specific to particular utilities.

 

Exit code | Utility | Description
0 | All utilities | The utility ran successfully.
1 | All utilities | An error occurred. Check the stack trace for more information.
2 | All utilities | Mandatory parameters are missing. The utility displays the parameters that are missing in the request.
3 | All utilities | The utility did not run because another process is running. Ensure that a search utility or scheduled job is not running.
7 | indexprop | The index propagation for the core contains a different version number than the master index. The replicable index version number is different from the index version number for the master index. Note: This exit code does not indicate a failure. The replication succeeds, with the exit code produced as a warning.
10 | All utilities | The database connection is not established. Possible errors are an invalid user ID and password combination, or that the database server is not running.
20 | di-preprocess | The specified directory for the preprocessing XML file is not found.
21 | All utilities | The specified value is not valid. The utility displays the invalid parameters for the request.
22 | di-buildindex | The inventory index is not set up before an inventory index build. Ensure that the inventory index is set up correctly.
50 | di-preprocess | The preprocessing is completed successfully, with no delta entries to process for the CatalogEntry and CatalogGroup index types. A delta index build is not required in this case. This exit code occurs because the skipDeltaNoEntry parameter is set to true.
51 | di-preprocess | The preprocessing is completed successfully, with no delta entries to process for the CatalogEntry index type. A delta index build is required for the CatalogGroup index type in this case. This exit code occurs because the skipDeltaNoEntry parameter is set to true.
52 | di-preprocess | The preprocessing is completed successfully, with no delta entries to process for the CatalogGroup index type. A delta index build is required for the CatalogEntry index type in this case. This exit code occurs because the skipDeltaNoEntry parameter is set to true.
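Codes 50, 51, and 52 are worth handling in any script that chains di-preprocess and di-buildindex, because they tell you which builds can be skipped. The sketch below turns a di-preprocess exit code into the list of index types that still need a build. The function and dictionary names are hypothetical, the error messages are shortened from the table, and treating exit code 0 as "build both index types" is an assumption.

```python
# Exit codes after which di-preprocess has succeeded, mapped to the
# index types that still need a build.
BUILDS_NEEDED = {
    0: ("CatalogEntry", "CatalogGroup"),  # assumption: build both
    50: (),                               # no delta entries at all
    51: ("CatalogGroup",),                # nothing to do for CatalogEntry
    52: ("CatalogEntry",),                # nothing to do for CatalogGroup
}

# Failure codes that apply to di-preprocess, shortened from the table.
ERRORS = {
    1: "An error occurred; check the stack trace.",
    2: "Mandatory parameters are missing.",
    3: "Another process is running.",
    10: "The database connection is not established.",
    20: "The preprocessing XML directory was not found.",
    21: "A specified value is not valid.",
}


def builds_after_preprocess(exit_code: int):
    """Return the index types to build, or raise on a failure code."""
    if exit_code in BUILDS_NEEDED:
        return BUILDS_NEEDED[exit_code]
    raise RuntimeError(ERRORS.get(exit_code, f"unexpected exit code {exit_code}"))


for code in (0, 50, 51, 52):
    print(code, builds_after_preprocess(code))
```

In a real script the exit code would come from the returncode of the di-preprocess process.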

TI Tables

 

TI_DELTA_CATENTRY

This table stores the delta changes for catalog entries.

Column Name | Column Type | Column Description
MASTERCATALOG_ID | BIGINT NOT NULL | The master catalog ID of the catalog entry.
CATENTRY_ID | BIGINT NOT NULL | The ID of the changed catalog entry.
ACTION | CHAR | The action type (see the values that follow).
LASTUPDATE | TIMESTAMP NOT NULL | The last modification time of the changed catalog entry.
CONTENT_BASE | INTEGER | IBM internal use only.

ACTION values:

  • U – Create/update.
  • D – Delete.
  • F – Full update is required.
  • P – Pending (indexing is in progress).
  • A – Used when a product is previewed. The product is updated with a U action; then, the U action changes to an A action.
  • B – The di-preprocess script is completed successfully from the command line. The di-buildindex script can now be run.
  • X – Marked for cleanup.
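When a delta build does not pick up a change, looking at the ACTION values in this table is a good first step. The sketch below decodes the action codes and counts rows per action. It works on plain (CATENTRY_ID, ACTION) pairs that you would fetch yourself; the sample IDs, the function name, and the shortened labels are made up for illustration.

```python
from collections import Counter

# Action codes for TI_DELTA_CATENTRY, with shortened labels.
ACTIONS = {
    "U": "create/update",
    "D": "delete",
    "F": "full update required",
    "P": "pending",
    "A": "previewed",
    "B": "preprocess complete",
    "X": "marked for cleanup",
}


def summarize_actions(rows):
    """Count rows per decoded action; rows are (catentry_id, action) pairs."""
    counts = Counter()
    for _catentry_id, action in rows:
        if action not in ACTIONS:
            raise ValueError(f"unknown action code: {action}")
        counts[ACTIONS[action]] += 1
    return dict(counts)


# Hypothetical sample rows.
rows = [(10001, "U"), (10002, "U"), (10003, "D"), (10004, "P")]
print(summarize_actions(rows))
```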

 

 

TI_DELTA_CATGROUP

This table is used to track all search index-related delta change history for catalog groups in a workspace.

Column Name | Column Type | Column Description
MASTERCATALOG_ID | BIGINT NOT NULL | The master catalog ID of the catalog group.
CATGROUP_ID | BIGINT NOT NULL | The ID of the changed catalog group.
ACTION | CHAR | The action type (see the values that follow).
LASTUPDATE | TIMESTAMP NOT NULL | The last modification time of the changed catalog group.
CONTENT_BASE | INTEGER | IBM internal use only.

ACTION values:

  • U – Create/update.
  • D – Delete.
  • F – Full update is required.
  • P – Pending (indexing is in progress).
  • X – Marked for cleanup.
  • B – The di-preprocess script is completed successfully from the command line. The di-buildindex script can now be run.

Best Regards

Vijaya Harsha.D
