Chatbot for eCommerce

Chatbots cover various scenarios these days, for example customer support, which I wrote about in an earlier article (you can find it here). In this article, you will learn how a chatbot replies to a search message.

The use case is a user who is looking for an item and requests it via a chatbot on your website or mobile app. The chatbot parses the message and, based on the keywords it finds, replies with a search result from which the user can choose one of the items.

I used Java and Apache OpenNLP to build this chatbot. In the following steps, you will learn how the chatbot parses a message:

Remove invalid characters from the message

When a user sends a message, it might contain some invalid characters. We need to remove them to get the actual keywords that help us reply to the user correctly.

Here is an example of a cleaned-up message:

In Java you can use a regular expression to remove invalid characters:
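As a minimal sketch, assuming that "invalid" means anything other than letters, digits, whitespace, apostrophes and hyphens, such a helper could look like the code below. The class name MessageCleaner and the exact character set are my assumptions for illustration; only the method name getOnlyValidCharacters is taken from the code used later in this article.

```java
import java.util.regex.Pattern;

final class MessageCleaner {
    // Assumption: everything except letters, digits, whitespace,
    // apostrophes and hyphens counts as an invalid character.
    private static final Pattern INVALID = Pattern.compile("[^\\p{L}\\p{N}\\s'-]");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    static String getOnlyValidCharacters(String message) {
        // Replace each invalid character with a space so words do not merge,
        // then collapse runs of whitespace into a single space.
        String cleaned = INVALID.matcher(message).replaceAll(" ");
        return SPACES.matcher(cleaned).replaceAll(" ").trim();
    }
}
```

Replacing with a space rather than an empty string keeps words apart when a symbol sits between them, for example "red/blue".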


OpenNLP Tokenization

Next, we tokenize the message using OpenNLP. Tokenization is the process of chopping a given sentence into smaller parts (tokens). In general, the raw text is tokenized based on a set of delimiters (mostly whitespace).

  • Tokenization
  • Spell-checking
  • Processing searches
  • Identifying parts of speech
  • Sentence detection
  • Document classification
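To make the delimiter idea concrete, here is a minimal sketch that splits a sentence on whitespace using only the Java standard library. It is a stand-in for illustration, not OpenNLP's tokenizer; the class name WhitespaceSplitSketch is hypothetical.

```java
final class WhitespaceSplitSketch {
    // Splits a sentence into tokens wherever one or more whitespace
    // characters occur. Punctuation stays attached to its word.
    static String[] tokenize(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return new String[0]; // avoid the single empty token that split() would return
        }
        return trimmed.split("\\s+");
    }
}
```

A trained tokenizer such as TokenizerME can go further than this, for example by separating punctuation from the word it is attached to, which is why a model is used in the steps that follow.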

In the following code, we first load a trained tokenizer model and create the tokenizer from it, using TokenizerME and TokenizerModel.

// Load the trained tokenizer model and create the tokenizer from it.
try (InputStream modelIn = new ByteArrayInputStream(Files.readAllBytes(tokeniserTrainingFile.get()))) {
    this.tokenizer = new TokenizerME(new TokenizerModel(modelIn));
}

TokenizerME − This class converts raw text into separate tokens. It uses Maximum Entropy to make its decisions.

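Entropy in machine learning is a measure of uncertainty: an entropy of 0 means the outcome is completely certain, and higher values mean more uncertainty (for an event with two possible outcomes, measured in bits, the maximum is 1).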
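As a small illustration of that measure, here is a sketch of Shannon entropy in bits, H = -sum(p * log2(p)). It is not part of OpenNLP and not how TokenizerME is implemented; the class name EntropySketch is hypothetical. A distribution with one certain outcome gives 0, and a fair coin gives 1 bit.

```java
final class EntropySketch {
    // Shannon entropy in bits: H = -sum(p * log2(p)).
    // Terms with p == 0 contribute nothing and are skipped.
    static double entropy(double[] probabilities) {
        double h = 0.0;
        for (double p : probabilities) {
            if (p > 0.0) {
                h -= p * (Math.log(p) / Math.log(2.0));
            }
        }
        return h;
    }
}
```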

Then we tokenize the input message:

private TokenizerME tokenizer;
final String[] tokenizedMessage = this.tokenizer.tokenize(RobotUtil.getOnlyValidCharacters(inputMessage));

The following image demonstrates the tokenized message:

After tokenizing the message, we need to detect the type of each token and remove the tokens that are not helpful. I explain that in the next step.

OpenNLP part of speech


Part-of-speech tagging detects the parts of a given sentence and labels each token with its type: noun, verb, adverb, adjective, and so on.

Here is the code I used:

private POSTaggerME ptagger;

// Load the trained part-of-speech model and create the tagger from it.
try (InputStream modelIn = new ByteArrayInputStream(Files.readAllBytes(trainingFile.get()))) {
    this.ptagger = new POSTaggerME(new POSModel(modelIn));
}

POSTaggerME - This class predicts the parts of speech of the given raw text. It uses Maximum Entropy to make its decisions.

final String[] tags = this.ptagger.tag(tokenizedMessage); 

Once we know the type of each token, we remove the tokens that are not necessary.
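The filtering step could be sketched as follows. Which token types to keep is not specified above, so this example assumes that only nouns and adjectives are useful as search keywords, and that the model emits Penn Treebank style tags (NN, NNS, JJ and so on). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class KeywordFilterSketch {
    // Keeps the tokens whose tag marks a noun (NN*) or an adjective (JJ*).
    // Assumption: tags[i] is the Penn Treebank style tag for tokens[i].
    static List<String> keepKeywords(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> keywords = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tags[i].startsWith("NN") || tags[i].startsWith("JJ")) {
                keywords.add(tokens[i]);
            }
        }
        return keywords;
    }
}
```

With the arrays from the previous steps, the call would be keepKeywords(tokenizedMessage, tags). If your model uses a different tag set, the two prefixes need to be adapted.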

What remains are the actual keywords, which we can use to return the relevant result to the user.

Copyright ©2020 Taikera.