Dynamic Language Translation With Chatbots & Azure Cognitive Services
By Insight Editor / 21 Jun 2018, Updated on 16 May 2019 / Topics: Microsoft Azure, Customer experience
Chatbots are a hot topic these days. The available tool sets have enabled many businesses to successfully integrate chatbot technology into their enterprise systems, saving time and money.
One of the challenges development teams face when building a new chatbot is handling language translation for a globally deployed bot. Because chat conversations are free-form, traditional app localization, which translates a fixed set of strings ahead of time, is difficult at best.
What a global chatbot needs is a way to detect the incoming language from the user, translate it to the language your bot understands, then translate the bot response to the user’s original input language. All of this needs to happen dynamically and on the fly. The Microsoft Cognitive Services Translator Text API provides the tools to accomplish this.
In this tutorial, we’ll use the Microsoft Bot Builder framework v3, the Bot Framework Emulator and a Translator Text API resource in Azure. We’ll assemble these resources into a working echo bot that translates the text you type. I invite you to clone a copy of the completed sample solution from GitHub to use as a reference.
We’ll start with the Visual Studio bot application template. Create a new project from that template, give it a name and press OK. This will create a basic echo bot that echoes back any text typed in. When you’re done with this tutorial, the updated chatbot application will detect the incoming text language, translate it to English so the bot can process it and, finally, respond in the user’s language.
To access Azure Cognitive Services, you’ll need an active Azure subscription. If you don’t have one, you can get a trial Azure account. Cognitive Services encompasses a wide range of features, including speech, vision and language processing.
To perform language detection and translation, we need to use the Translator Text API endpoint. The Translator Text API can perform automatic language detection, translation, transliteration and bilingual dictionary lookups. It supports more than 60 languages, and Microsoft continues to add more.
To create a translator resource, log in to the Azure portal and click the + Create a Resource button in the top left corner. Search for Translate Text and then click the Translator Text resource type. Finally, click Create.
In the Translator Text Create blade, give your service a name and select the Azure subscription to associate the resource with. Select F0 for the pricing tier. This free tier allows up to 2 million characters per month to be translated. Create a new resource group or select an existing one. For a new resource group, pick East US 2 as the location. Finally, click the Create button, and Azure will create the new API resource.
Once Azure finishes creating the Translator Text resource, go to the resource overview and click the Keys tab. Here, you’ll find the access keys your code needs to call the API. Copy either Key 1 or Key 2 and store it somewhere safe. We’ll use it later in this tutorial.
For this article, we’ll use three methods of version 3.0 of the Translator Text API: Languages, Detect and Translate. Version 2.0 is still available, but it’s scheduled to be discontinued in 2019.
The Languages method returns the list of languages currently supported by the other Translator Text API methods. We’ll use it to look up the names and native names of the languages being used.
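The following is a rough illustration of how the Languages response can be used. This minimal sketch looks up a language’s display name and native name by its code. It assumes a modern .NET project using System.Text.Json instead of the Newtonsoft.Json package the sample uses. It also assumes the v3 response shape {"translation": {"de": {"name": ..., "nativeName": ..., "dir": ...}}}. The LanguageNames class is hypothetical.

```csharp
using System.Text.Json;

// Hypothetical helper: reads names from a Translator v3 "languages?api-version=3.0&scope=translation" response.
// Example input (abbreviated): {"translation":{"de":{"name":"German","nativeName":"Deutsch","dir":"ltr"}}}
public static class LanguageNames
{
    // Returns (name, nativeName) for the given language code, or null if the code isn't listed.
    public static (string Name, string NativeName)? Lookup(string languagesJson, string code)
    {
        using (JsonDocument doc = JsonDocument.Parse(languagesJson))
        {
            if (!doc.RootElement.TryGetProperty("translation", out JsonElement translation) ||
                !translation.TryGetProperty(code, out JsonElement entry))
                return null;
            // Copy the strings out before the JsonDocument is disposed.
            return (entry.GetProperty("name").GetString(), entry.GetProperty("nativeName").GetString());
        }
    }
}
```

For the sample input in the comment, Lookup(json, "de") returns ("German", "Deutsch"), and an unlisted code returns null.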
The Detect method takes your input text, or an array of input texts, and detects its language. Each result includes the detected language and a confidence score between zero and one. If a text could be in more than one language, the result also lists the alternative languages, each with its own score. In this article, we’ll detect the incoming language and store it in the conversation state. We’ll use that information later when building the bot’s output.
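This is a minimal sketch of reading a Detect response, assuming the v3 shape [{"language": "de", "score": 0.92, ...}]. It takes the first result’s language code and falls back to a default when there is no result or when the score is below a threshold. The DetectResponse class and the minScore threshold are hypothetical additions and are not part of the sample solution. System.Text.Json keeps the sketch dependency-free.

```csharp
using System.Text.Json;

public static class DetectResponse
{
    // Hypothetical helper: returns the detected language code from a Translator v3 Detect response,
    // or 'fallback' when there is no result or the confidence score is below minScore.
    public static string TopLanguage(string detectJson, string fallback, double minScore = 0.0)
    {
        using (JsonDocument doc = JsonDocument.Parse(detectJson))
        {
            // One result object per input element; a single text was sent, so read element 0.
            if (doc.RootElement.GetArrayLength() == 0)
                return fallback;
            JsonElement first = doc.RootElement[0];
            double score = first.GetProperty("score").GetDouble();
            return score >= minScore ? first.GetProperty("language").GetString() : fallback;
        }
    }
}
```

Suppose the response scores "de" at 0.92. Then TopLanguage(json, "en") returns "de", while TopLanguage(json, "en", 0.95) falls back to "en".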
The Translate method takes your input text, or an array of input texts, and translates it into one or more languages. If no source language is provided, the API attempts to detect it on its own. The API can also translate well-formed HTML, leaving tags and other non-text elements untouched. This method is the backbone of our solution. We’ll use it to translate text back and forth between English and the user’s language.
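A similar sketch handles the Translate response. It assumes the v3 shape [{"detectedLanguage": {...}, "translations": [{"text": ..., "to": ...}]}], where detectedLanguage is present only when the API detected the source language itself. TranslateResponse is a hypothetical helper built on System.Text.Json.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class TranslateResponse
{
    // Hypothetical helper: maps each target language code to its translated text, for the first input element.
    public static Dictionary<string, string> Translations(string translateJson)
    {
        var result = new Dictionary<string, string>();
        using (JsonDocument doc = JsonDocument.Parse(translateJson))
        {
            // The API returns one "translations" entry per 'to' language in the request.
            foreach (JsonElement t in doc.RootElement[0].GetProperty("translations").EnumerateArray())
                result[t.GetProperty("to").GetString()] = t.GetProperty("text").GetString();
        }
        return result;
    }

    // Returns the auto-detected source language, or null when the response has no detectedLanguage.
    public static string DetectedLanguage(string translateJson)
    {
        using (JsonDocument doc = JsonDocument.Parse(translateJson))
        {
            return doc.RootElement[0].TryGetProperty("detectedLanguage", out JsonElement detected)
                ? detected.GetProperty("language").GetString()
                : null;
        }
    }
}
```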
These three methods are all we need to detect and translate between multiple languages. A chatbot could be used anywhere in the world. These methods give your app what it needs to talk with each user in his or her native language, as long as the API supports that language.
The first thing to do is add some app settings to web.config: the Translator Text API subscription key and the API endpoint. At the time of writing, the endpoint shown in the Azure portal is for version 2.0 of the API, so use the version 3.0 endpoint shown below instead. Add the following entries to the appSettings section of web.config. This is the only web.config change we need to make.
<add key="trns:APIKey" value="REPLACE-WITH-YOUR-SUBSCRIPTION-KEY" />
<add key="trns:APIEndpoint" value="https://api.cognitive.microsofttranslator.com" />
I added a service class named LanguageUtilities to call the Translator Text API. I also extracted an interface from the class and use dependency injection to inject it into the parts of the bot that need it. For this demo, those are MessageController.cs, RootDialog.cs and a message activity mapper I’ll cover later. The class also exposes a read-only DefaultLanguage property, which is “en” for English.
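using System.Threading.Tasks;
public interface ILanguageUtilities
{
    // Language code the bot works in internally ("en").
    string DefaultLanguage { get; }
    // Languages method (GET): codes, names and native names of supported languages.
    Task<T> SupportedLanguagesAsync<T>();
    // Detect method (POST): detects the language of inputText.
    Task<T> DetectInputLanguageAsync<T>(string inputText);
    // Translate method (POST): translates inputText into outputLanguage.
    Task<T> TranslateTextAsync<T>(string inputText, string outputLanguage);
}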
Let’s take a closer look at the implementation of ILanguageUtilities, LanguageUtilities.cs. You’ll notice there’s a single private method, ExecuteAPI. All of the API methods in the Translator Text API work in a similar manner, so I created ExecuteAPI to act as the single point for all calls to go through.
This method uses generics to define the return type. Each API call returns a JSON object, which ExecuteAPI deserializes into the generic type. Other than the generics, the method makes a straightforward HTTP request. It sends a POST when there’s body text and a GET otherwise.
// Needs: using System; using System.Configuration; using System.Net.Http;
//        using System.Text; using System.Threading.Tasks; using Newtonsoft.Json;
// One HttpClient shared by all calls; creating and disposing one per request can exhaust sockets under load.
private static readonly HttpClient _client = new HttpClient();
private async Task<T> ExecuteAPI<T>(string apiPath, string bodyText)
{
    // Detect and Translate expect a JSON array of objects: [{"Text": "..."}]
    string requestBody = String.Empty;
    if (!String.IsNullOrEmpty(bodyText))
    {
        object[] body = new object[] { new { Text = bodyText } };
        requestBody = JsonConvert.SerializeObject(body);
    }
    string apiKey = ConfigurationManager.AppSettings["trns:APIKey"];
    string url = ConfigurationManager.AppSettings["trns:APIEndpoint"];
    var uri = new Uri($"{url.TrimEnd('/')}/{apiPath}");
    using (var request = new HttpRequestMessage())
    {
        // POST when there is body text (detect, translate); GET otherwise (languages).
        bool hasBody = !String.IsNullOrEmpty(requestBody);
        request.Method = hasBody ? HttpMethod.Post : HttpMethod.Get;
        request.RequestUri = uri;
        if (hasBody)
            request.Content = new StringContent(requestBody, Encoding.UTF8, "application/json");
        request.Headers.Add("Ocp-Apim-Subscription-Key", apiKey);
        using (var response = await _client.SendAsync(request))
        {
            var responseBody = await response.Content.ReadAsStringAsync();
            // Surface API errors (bad key, exceeded quota) instead of failing later during deserialization.
            if (!response.IsSuccessStatusCode)
                throw new HttpRequestException($"Translator API returned {(int)response.StatusCode}: {responseBody}");
            return JsonConvert.DeserializeObject<T>(responseBody);
        }
    }
}
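Detect and Translate share the same request body format, so it can be handy to check that body in isolation. This sketch reproduces the body construction and the GET/POST decision using System.Text.Json instead of Newtonsoft.Json. TranslatorRequest is a hypothetical name. Accepting several texts per request goes beyond what ExecuteAPI does and is an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text.Json;

public static class TranslatorRequest
{
    // Builds the JSON array the Translator v3 POST methods expect: one {"Text": ...} object per input string.
    public static string BuildBody(params string[] texts) =>
        JsonSerializer.Serialize(texts.Select(t => new { Text = t }));

    // POST when there is a body (detect, translate), GET otherwise (languages).
    public static HttpMethod MethodFor(string body) =>
        string.IsNullOrEmpty(body) ? HttpMethod.Get : HttpMethod.Post;
}
```

By default, System.Text.Json escapes non-ASCII characters as \uXXXX sequences. The result is still valid JSON.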
The public methods implement the interface. Each one corresponds to a specific API call: it supplies the API path to ExecuteAPI and passes along the generic return type. I generated C# classes to reflect the return values of the different API calls.
The caller specifies the expected return type as the type argument, which lets a single method return different types. The json2csharp website is a great way to save time when you want to quickly convert a JSON object to C# classes. All Translator Text API methods share one required query string parameter, ‘api-version’, which must be set to 3.0.
public async Task<T> SupportedLanguagesAsync<T>()
{
var path = $"languages?api-version=3.0&scope=translation";
return await ExecuteAPI<T>(path, String.Empty);
}
SupportedLanguagesAsync() is an HTTP GET request with one additional parameter, ‘scope’, which defines the group or groups of languages to return. We’re only working with the translation function, so it’s set to ‘scope=translation’. The other groups are ‘transliteration’ and ‘dictionary’. You can request several groups at once as a comma-separated list. If you leave the scope parameter out, all three groups are returned.
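The following sketch builds the Languages path. It assumes the three group names above and assumes that several groups are passed as one comma-separated scope value. LanguagesPath is hypothetical; the sample solution hard-codes the path instead.

```csharp
using System;
using System.Linq;

public static class LanguagesPath
{
    private static readonly string[] Allowed = { "translation", "transliteration", "dictionary" };

    // Hypothetical helper: builds the relative path for GET /languages.
    // With no scopes, the parameter is omitted and the API returns all three groups.
    public static string Build(params string[] scopes)
    {
        const string basePath = "languages?api-version=3.0";
        if (scopes.Length == 0)
            return basePath;
        var unknown = scopes.FirstOrDefault(s => !Allowed.Contains(s));
        if (unknown != null)
            throw new ArgumentException($"Unknown scope '{unknown}'", nameof(scopes));
        // Several groups go into a single comma-separated value.
        return $"{basePath}&scope={string.Join(",", scopes)}";
    }
}
```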
public async Task<T> DetectInputLanguageAsync<T>(string inputText)
{
var path = $"detect?api-version=3.0";
return await ExecuteAPI<T>(path, inputText);
}
DetectInputLanguageAsync() is an HTTP POST request with no additional parameters. Because it’s a POST method, though, it needs body content. The body is a JSON array, and each element of the array is a JSON object with a string property named ‘Text’. The value of that property is the text the language detection analyzes. The JSON body should look something like this:
[{"Text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit."}]
The API limits the size of a request, both in the number of array elements and in the total number of characters. Refer to the online API documentation for the specifics of those limits.
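If your bot ever sends several texts in one request, a helper like the following could split them into batches. This is a sketch under stated assumptions. The element and character limits are parameters because the documented values have changed over time. String length (UTF-16 code units) only approximates the API’s character count. Batcher is hypothetical.

```csharp
using System.Collections.Generic;

public static class Batcher
{
    // Hypothetical helper: splits texts into batches that respect both a maximum element count
    // and a maximum total length. A single text longer than maxChars ends up alone in its batch;
    // the caller must shorten it before sending.
    public static List<List<string>> Split(IEnumerable<string> texts, int maxElements, int maxChars)
    {
        var batches = new List<List<string>>();
        var current = new List<string>();
        int currentChars = 0;
        foreach (var text in texts)
        {
            bool full = current.Count == maxElements || currentChars + text.Length > maxChars;
            if (full && current.Count > 0)
            {
                batches.Add(current);
                current = new List<string>();
                currentChars = 0;
            }
            current.Add(text);
            currentChars += text.Length;
        }
        if (current.Count > 0)
            batches.Add(current);
        return batches;
    }
}
```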
public async Task<T> TranslateTextAsync<T>(string inputText, string outputLanguage)
{
var path = $"translate?api-version=3.0&to={outputLanguage}&includeSentenceLength=true";
return await ExecuteAPI<T>(path, inputText);
}
TranslateTextAsync() is an HTTP POST request. It accepts several additional query parameters, but only one is required: ‘to’, which defines the language to translate the text into. The parameter can be repeated to translate into more than one language at once. You don’t need to supply the language you’re translating from, because the Translate method will try to detect it for you. If you choose to supply it, the parameter name is ‘from’. The ‘includeSentenceLength’ parameter in the code above is optional. It adds sentence-length information to the response, which this bot doesn’t use.
If you need to send a block or page of HTML to the API for translation, you don’t want the entire contents translated. The HTML tags would be translated too, and the results would be unusable. To tell the API you’re sending HTML, use the optional ‘textType’ parameter with the value ‘html’. The other possible value is ‘plain’, which is the default if you leave this parameter out. There are several other optional parameters. Again, refer to the API documentation to learn about those.
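The query parameters described above can be combined as follows. TranslatePath is a hypothetical helper and is not part of the sample. It repeats ‘to’ once per target language, adds ‘from’ only when the source language is known and sets ‘textType’ to ‘html’ on request.

```csharp
using System;
using System.Collections.Generic;

public static class TranslatePath
{
    // Hypothetical helper: builds the relative path for POST /translate.
    public static string Build(IEnumerable<string> to, string from = null, bool html = false)
    {
        var query = new List<string> { "api-version=3.0" };
        // 'to' is required and repeated once per target language.
        foreach (var lang in to)
            query.Add("to=" + Uri.EscapeDataString(lang));
        if (query.Count == 1)
            throw new ArgumentException("At least one target language is required.", nameof(to));
        // Omit 'from' to let the API detect the source language.
        if (!string.IsNullOrEmpty(from))
            query.Add("from=" + Uri.EscapeDataString(from));
        // textType defaults to 'plain'; 'html' keeps the tags out of the translation.
        if (html)
            query.Add("textType=html");
        return "translate?" + string.Join("&", query);
    }
}
```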
This is also a POST request, so a body must be included. The body for this request is similar to the Detect API method and is a JSON array. Each array element is a JSON object with a string property named ‘Text,’ which represents the string to translate.
Now that we’ve built a library to access the Translator Text API, we can look at how to tie it into the bot application. One of the key concepts of the Microsoft Bot Framework is that a bot application starts as nothing more than an ASP.NET Web API controller with a single Post() method. All messages coming into the bot go through this method first, which makes it the perfect place to capture, detect and translate the incoming text.
In the API controller Post() method, the code takes the activity text and calls the DetectInputLanguageAsync() method to detect the language of the incoming text.
var msgLanguage = await _languageUtilities.DetectInputLanguageAsync<List<AltLanguageDetectResult>>(activity.Text);
// Detect returns one result per input text; take the top language code (e.g., "de"), falling back to the default.
var outputLanguage = msgLanguage?.FirstOrDefault()?.language ?? _languageUtilities.DefaultLanguage;
Now that we know the incoming language, we need to store it in the conversation state so it persists from call to call. We’ll do this with a bot data store (IBotDataStore<BotData>). The sample uses an in-memory store, but you can use any data store.
// The storage key identifies the bot, channel, user and conversation of this incoming activity.
var key = Address.FromActivity(activity);
var userData = await _botDataStore.LoadAsync(key, BotStoreType.BotPrivateConversationData, CancellationToken.None);
var storedLanguageCode = userData.GetProperty<string>("ISOLanguageCode") ?? _languageUtilities.DefaultLanguage;
// Save and flush only when the user's language has changed.
if (!storedLanguageCode.Equals(outputLanguage))
{
    userData.SetProperty("ISOLanguageCode", outputLanguage);
    await _botDataStore.SaveAsync(key, BotStoreType.BotPrivateConversationData, userData, CancellationToken.None);
    await _botDataStore.FlushAsync(key, CancellationToken.None);
}
Finally, if the message isn’t already in the bot’s default language, English, we translate it and replace the original incoming text with the translation.
// Compare the detected language code (not the result list) with the default language.
if (!outputLanguage.Equals(_languageUtilities.DefaultLanguage))
{
    var translatedObj = await _languageUtilities.TranslateTextAsync<List<AltLanguageTranslateResult>>(activity.Text, _languageUtilities.DefaultLanguage);
    activity.Text = translatedObj[0].translations[0].text;
}
Although this code is straightforward, a lot happens for each user message. The language is detected and stored in the conversation state. When needed, the text is also translated to the bot’s default language.
How do you handle the bot’s responses? You could translate each response just before calling context.PostAsync(). But then you’d have to find every such call in your code and remember to add the translation whenever you add a new response. Just as with incoming messages, we need a single place to handle every message the bot sends.
Unfortunately, unlike the API controller’s Post() method, nothing in the project solution provides a single, central place for outgoing translation logic. However, there is a way to do it: IMessageActivityMapper. Fully understanding this interface requires knowing the dialog system’s internal logic and how messages flow from user to bot and back to the user. That’s a bit out of scope for this article.
The short explanation is that when you send a message, it is processed by multiple implementations of the IBotToUser interface. One of those implementations is the MapToChannelData_BotToUser class. That class lets you plug in IMessageActivityMapper implementations, which it calls before any message is sent to the user. This gives you the opportunity to do additional work, such as translating the bot’s response. This is just what we need.
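The following is a simplified, self-contained model of that pipeline. Every mapper gets a chance to rewrite an outgoing message before it’s sent. IMessageMapper, OutgoingMessage, TableTranslatorMapper and MappingSender are hypothetical stand-ins and not Bot Builder types. The table lookup merely stands in for a call to the Translator Text API.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for IMessageActivity and IMessageActivityMapper.
public class OutgoingMessage { public string Text { get; set; } }

public interface IMessageMapper
{
    OutgoingMessage Map(OutgoingMessage message);
}

// Toy "translator" mapper: replaces whole sentences found in a fixed table, leaves others unchanged.
public class TableTranslatorMapper : IMessageMapper
{
    private readonly IDictionary<string, string> _table;
    public TableTranslatorMapper(IDictionary<string, string> table) { _table = table; }

    public OutgoingMessage Map(OutgoingMessage message)
    {
        if (message.Text != null && _table.TryGetValue(message.Text, out var translated))
            message.Text = translated;
        return message;
    }
}

// Every outgoing message passes through all mappers, in order, before it is "sent".
public class MappingSender
{
    private readonly IEnumerable<IMessageMapper> _mappers;
    public List<string> Sent { get; } = new List<string>();
    public MappingSender(IEnumerable<IMessageMapper> mappers) { _mappers = mappers; }

    public void Post(OutgoingMessage message)
    {
        foreach (var mapper in _mappers)
            message = mapper.Map(message);
        Sent.Add(message.Text); // stands in for delivering the message to the channel
    }
}
```

The design point is that dialog code only calls Post(). It never needs to know that translation happens.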
Open TranslatorMessageActiviyMapper in the Utilities folder. Notice first that we again use dependency injection to inject the language utilities and bot data store instances. The important part, though, is the implementation of the Map() method.
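public IMessageActivity Map(IMessageActivity message)
{
    // Nothing to translate for messages without text (for example, cards or typing indicators).
    if (String.IsNullOrEmpty(message.Text))
        return message;
    // Map() is synchronous, so run the async calls on a thread-pool thread and block for the result.
    string translation = Task.Run(async () =>
    {
        // The stored key was built from a user-to-bot activity; this activity is bot-to-user,
        // so its bot and user IDs are reversed. Swap them back to get the same key.
        var key = Address.FromActivity(message);
        var userKey = new Address(key.UserId, key.ChannelId, key.BotId, key.ConversationId, key.ServiceUrl);
        var userData = await _botDataStore.LoadAsync(userKey, BotStoreType.BotPrivateConversationData, CancellationToken.None);
        var storedLanguageCode = userData.GetProperty<string>("ISOLanguageCode") ?? _languageUtilities.DefaultLanguage;
        // The reply is already in the default language, so skip the API call.
        if (storedLanguageCode.Equals(_languageUtilities.DefaultLanguage))
            return message.Text;
        var translatedText = await _languageUtilities.TranslateTextAsync<List<AltLanguageTranslateResult>>(message.Text, storedLanguageCode);
        return translatedText[0].translations[0].text;
    }).GetAwaiter().GetResult();
    message.Text = translation;
    return message;
}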
The Map() method takes the text of the outgoing bot-to-user message and translates it into the user language we stored earlier. Note that when we create the storage key, we must rearrange the Address values. In the API controller, we built the key from a user-to-bot activity, where the sender is the user and the recipient is the bot. Here we’re working with a bot-to-user activity, where those roles are reversed. So we swap the bot and user IDs to get the same bot data storage key.
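The key swap is easier to see with a stripped-down model. In this sketch, a key takes the bot ID from the message recipient and the user ID from the sender. That is the same convention Address.FromActivity follows, as the swap in Map() implies. StoreKey and its methods are hypothetical. The real Address also carries the service URL. The sketch uses a C# 9 record for value equality.

```csharp
// Hypothetical, simplified stand-in for the Bot Builder Address used as the data store key.
// Records compare by value, so two keys with the same IDs are equal.
public record StoreKey(string BotId, string ChannelId, string UserId, string ConversationId)
{
    // The bot is the recipient and the user is the sender (correct for user-to-bot messages).
    public static StoreKey FromMessage(string fromId, string recipientId, string channelId, string conversationId) =>
        new StoreKey(recipientId, channelId, fromId, conversationId);

    // For a bot-to-user message the roles are reversed, so swap the IDs back.
    public StoreKey SwapBotAndUser() => new StoreKey(UserId, ChannelId, BotId, ConversationId);
}
```

A key built naively from an outgoing message differs from the one the controller stored. After SwapBotAndUser(), the two keys match.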
Once we have the target language, we take the original message.Text, translate it and update message.Text with the translated text. That’s it: we’re now translating all outgoing messages in one central location.
The last thing to look at is RootDialog.cs, located in the Dialogs folder. The MessageReceivedAsync() method does a little more processing of the incoming user text. When you run the bot application in Visual Studio, set a breakpoint on this method and then send something in a language other than English.
When the code stops at your breakpoint, look at the value of activity.Text. You’ll see that your input has already been translated to English. Remember, this happens in the API controller’s Post() method. The code then requests the list of supported languages so we can map the ISO language code to a language name.
Finally, we construct a reply, in English, which is then sent by the bot framework via the context.PostAsync() method. Because we have an implementation of IMessageActivityMapper waiting, this text gets translated to the user’s language before being sent back to the chat client.
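private async Task MessageReceivedAsync(IDialogContext context, IAwaitable<object> result)
{
    var activity = await result as Activity;
    // Language code stored by the API controller; fall back to the default if it's missing.
    if (!context.PrivateConversationData.TryGetValue("ISOLanguageCode", out string lang))
        lang = _languageUtilities.DefaultLanguage;
    // The Languages response maps each code to its names: translation -> {code} -> name.
    var languages = await _languageUtilities.SupportedLanguagesAsync<JObject>();
    var languageName = (string)languages["translation"][lang]["name"];
    // Reply in English; the IMessageActivityMapper translates the reply on its way out.
    var englishText = $"You sent '{activity.Text}', which was originally written in {languageName}";
    await context.PostAsync(englishText);
    context.Wait(MessageReceivedAsync);
}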
This tutorial is a small taste of what the Translator Text API can do for your bot application. There are many other APIs available in the Azure Cognitive Services suite, and I encourage you to learn about them. The full project is available on GitHub. Feel free to grab a copy and play around with it. Don’t forget you need an Azure account to use Cognitive Services.