Building an Amazon Alexa Skill with .NET, C#, and Azure

09/17/2017

As of this post's writing (September 2017), there are more than 20,000 custom skills developed for Alexa and available in the Amazon skill store. That number has grown from 15,000 skills just six months prior to this post (when I built the skill referenced throughout this article). Alexa and its ecosystem continue to evolve at a rapid rate, and the tools and templates available for building Alexa skills keep evolving as well, making the platform very developer friendly. As someone who has worked primarily with Microsoft technologies over the past year, I was looking to expand on some areas of .NET and familiarize myself with voice skills and the Alexa platform, all while limiting the number of unknowns when learning a new technology. As a result, I decided to build and deploy an Alexa Skill with .NET 4.5, C#, and Microsoft Azure.

Amazon provides many out-of-the-box conveniences when building a skill with Node.js, Python, or Java and deploying to AWS, and recent tools provide bootstrap solutions and templates. Using .NET for Alexa requires a bit of extra overhead, and this post walks through the buildout of a skill and its deployment.

The following article outlines the steps for building an Alexa Skill using .NET 4.5, MVC 5, and C#, and for deploying and hosting it on Microsoft Azure. It is broken out into the following main sections:

Part I - Background of the Alexa Skill

Part II - Setting Up a Local Visual Studio Workflow for an Alexa Skill

Part III - Setting Up The Intent Schema and Amazon Developer Portal

Part IV - Processing Intents and Skill Logic

Part V - Skill Deployment to Azure

It is important to note that this article is meant to be a general guide. The application is a demo built to experiment with a few different design patterns and .NET specifics, and to apply those patterns to a new (and fun) technology. Before getting started I highly recommend the following resources, which are the building blocks for understanding Alexa Skills on .NET and form the base for the application referenced below.

Pluralsight: Developing Alexa Skills using .NET

AreYouFreeBusy: .NET Alexa Skill

The complete source code for the application can be seen HERE

The Alexa Skill (Grammar Tool) can be seen and downloaded HERE


Part I - Background of the Alexa Skill

Before walking through the demo app, it is helpful to have a general understanding of the Skill in question. If you have ever taken the SAT or ACT, you may remember the fun of the Verbal section and taking “vocab” quizzes throughout high school. The ‘Grammar Tool’ Alexa skill is intended to help relive those fun days and help you learn new words. The skill lets a user ask for a ‘Word of The Day’, and the application pulls from a database of approximately 200 common SAT Verbal words. The word, its part of speech, and an example are given to the user. The user is then prompted to repeat the word to Alexa, and the skill then gives confirmation to the user that this input is correct and asks if they would like to continue with a new word. It is a basic skill, but something that is useful and also gives an interactive dialogue.

At a high level, Alexa Skills are custom APIs that are configured to receive specific phrases and sayings and then output a formatted JSON response, which the Alexa voice device turns into verbal output. As developers, we set up the API and workflows that enable a request/response cycle that engages the user and conforms to the Alexa API specifications. Now that we have an idea of what we will be walking through, let’s get started.


Part II - Setting Up a Local Visual Studio Workflow for an Alexa Skill

To get started, we create a new web application. This project uses .NET 4.5.2 and MVC 5. We start from the ASP.NET Web API template with individual user account authentication enabled (select ASP.NET Web Application (.NET Framework)). This gives an out-of-the-box solution for sign-in/sign-out, which this application uses in a front-end web UI for an administrator.

With the MVC application set up, we now add an empty Web API controller to handle Alexa requests.

AlexaRequests will come in as a POST to our endpoint. This controller will receive defined AlexaRequest inputs and will return defined AlexaResponse objects. The controller method looks like:

// AlexaSkillProject.WebApp/Controllers/AlexaController.cs

[HttpPost, Route("api/v1/alexa/grammartool")]
public dynamic WordOfTheDay(AlexaRequestPayload alexaRequestInput)
{
    return _alexaRequestService.ProcessAlexaRequest(alexaRequestInput); 
}

Using C# also gives the benefit of using strongly typed models for AlexaRequest and AlexaResponse. These classes are defined in a new class library (AlexaSkillProject.Domain) that is pulled in by the Web Application. These classes model the JSON input and output sent and received by Alexa. The specific fields will be covered in the next section.

We now have a web application set up, an API route and controller method in place, and a class library with domain models for the request input and response output in our local environment. Hitting this API in development can be expedited with Swagger, which lists the API routes and lets us send custom JSON to them. It can be installed by adding the Swagger NuGet package (Swashbuckle). You will see that the App_Start/SwaggerConfig.cs file is added automatically, and when visiting your localhost site you can now append /swagger to the URL to see a list of routes. You can then hit each route and set breakpoints within Visual Studio. We will see more of the custom JSON input and modeling in the Intent Schema setup section.

The last step, for both your local environment and the move to production, is security and authorization. Because this skill is hosted outside of AWS and does not use the Alexa templates, the steps below must be taken to secure the Alexa Skill and its API endpoints. These steps all relate to meeting the requirements Amazon outlines here for hosting a web service outside of AWS:

https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/developing-an-alexa-skill-as-a-web-service

This is an important section, and for this application I drew on pieces from the earlier mentioned resources from Pluralsight and AreYouFreeBusy. In the application, I added a class library, AlexaSkillProject.Services, to hold shared classes and functionality that the application pulls in. "Services" here refers to classes and their names (not services in a microservice sense, as this application is compiled and deployed as a monolith). One class in particular handles request validation and lives in the AlexaSkillProject.Services/AlexaRequestValidation directory. The interface IAlexaRequestValidationService has a method responsible for validating each request.

We can see that it is implemented in the AlexaRequestValidationService.cs class and is called in the initial entry point covered later:

/// <summary>
/// This class verifies that each call to the Alexa app has valid data for the timestamp and Alexa app id.
/// It ensures that no request is older than 150 seconds and that requests are for this application specifically.
/// </summary>
public class AlexaRequestValidationService : IAlexaRequestValidationService
    {

        public SpeechletRequestValidationResult ValidateAlexaRequest(AlexaRequestPayload alexaRequest)
        {
            SpeechletRequestValidationResult validationResult = SpeechletRequestValidationResult.OK;

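            // ConfigurationManager replaces the obsolete ConfigurationSettings; a missing "Mode" key no longer throws
            if (!string.Equals(ConfigurationManager.AppSettings["Mode"], "Debug"))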
            {
                // check timestamp
                if (!VerifyRequestTimestamp(alexaRequest, DateTime.UtcNow))
                {
                    validationResult = SpeechletRequestValidationResult.InvalidTimestamp;
                    throw new Exception(validationResult.ToString());
                }

                // check app id
                if (!VerifyApplicationIdHeader(alexaRequest))
                {
                    validationResult = SpeechletRequestValidationResult.InvalidAppId;
                    throw new Exception(validationResult.ToString());
                }

            }

            return validationResult;

        }


        private bool VerifyRequestTimestamp(AlexaRequestPayload alexaRequest, DateTime referenceTimeUtc)
        {
            // verify timestamp is within tolerance
            var diff = referenceTimeUtc - alexaRequest.Request.Timestamp;
            return (Math.Abs((decimal)diff.TotalSeconds) <= AlexaSdk.TIMESTAMP_TOLERANCE_SEC);
        }

        private bool VerifyApplicationIdHeader(AlexaRequestPayload alexaRequest)
        {
            string alexaApplicationId = ConfigurationManager.AppSettings["AlexaApplicationId"];

            return alexaRequest.Session.Application.ApplicationId.Equals(alexaApplicationId);   
        }
    }

// AlexaRequestService.cs

// ...

SpeechletRequestValidationResult validationResult = _alexaRequestValidationService.ValidateAlexaRequest(alexaRequestPayload);

// ...

The implementation verifies that each AlexaRequest carries a valid timestamp and is addressed to this specific application (both requirements are outlined in Amazon’s documentation).
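
The two checks can also be sketched as pure functions, which makes them easy to exercise without a configuration file. This is a minimal sketch, not the project's code: the class name RequestChecksSketch and its method names are hypothetical, and the 150 second tolerance is taken from the comment in the listing above (the project reads it from AlexaSdk.TIMESTAMP_TOLERANCE_SEC).

```csharp
using System;

// Hypothetical helper for illustration; the real logic lives in AlexaRequestValidationService.
public static class RequestChecksSketch
{
    // Assumed value, based on the "150 seconds" mentioned in the article's code comment.
    public const int TimestampToleranceSeconds = 150;

    // True when the request timestamp is within the tolerance of the reference time (either direction).
    public static bool IsTimestampFresh(DateTime requestTimestampUtc, DateTime referenceTimeUtc)
    {
        double differenceSeconds = Math.Abs((referenceTimeUtc - requestTimestampUtc).TotalSeconds);
        return differenceSeconds <= TimestampToleranceSeconds;
    }

    // True when the application id in the request matches the configured one exactly.
    public static bool IsExpectedApplication(string requestApplicationId, string configuredApplicationId)
    {
        return !string.IsNullOrEmpty(configuredApplicationId)
            && string.Equals(requestApplicationId, configuredApplicationId, StringComparison.Ordinal);
    }
}
```

Passing the reference time in as a parameter, as the original VerifyRequestTimestamp does, keeps the check deterministic in tests.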

There are also two more Validation and Verification classes:

The AlexaRequestValidationHandler class extends DelegatingHandler and intercepts the message pipeline as outlined here: https://stackoverflow.com/questions/11970313/delegatinghandler-for-response-in-webapi

The override method then proceeds to verify the request headers and the request signature.

The AlexaRequestSignatureVerifierService is a static class referenced in this override method and makes use of the methods and helpers from AreYouFreeBusy. All of these together meet the documentation requirements here: https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/developing-an-alexa-skill-as-a-web-service.

One thing to note is that while this is all valid in production, I wanted a way to use the Swagger route to submit and debug requests in my local environment. Since these requests will not carry a valid Alexa request header or certificate, I added a configuration variable to ensure that requests in my local environment are not routed through the above methods.
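
To make the shape of such a handler concrete, here is a rough sketch of a DelegatingHandler with a debug bypass. It is not the project's AlexaRequestValidationHandler: the class name SignatureHeaderGuardSketch and the skipChecks flag are hypothetical, and the sketch only checks that the Signature and SignatureCertChainUrl headers Amazon sends are present. Real verification also has to download and validate the certificate chain and check the signature against the request body, which is omitted here.

```csharp
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical handler for illustration only; it does NOT perform signature verification.
public class SignatureHeaderGuardSketch : DelegatingHandler
{
    private readonly bool _skipChecks;

    // skipChecks stands in for the article's configuration variable (e.g. Mode = Debug).
    public SignatureHeaderGuardSketch(bool skipChecks)
    {
        _skipChecks = skipChecks;
    }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        if (!_skipChecks && !HasAlexaSignatureHeaders(request))
        {
            // Reject early; the controller is never reached.
            var rejected = new HttpResponseMessage(HttpStatusCode.BadRequest) { RequestMessage = request };
            return Task.FromResult(rejected);
        }

        // Hand the request on to the rest of the pipeline.
        return base.SendAsync(request, cancellationToken);
    }

    // Presence check only: both headers must exist on an Alexa request.
    public static bool HasAlexaSignatureHeaders(HttpRequestMessage request)
    {
        return request.Headers.Contains("Signature")
            && request.Headers.Contains("SignatureCertChainUrl");
    }
}
```

In Web API a handler like this would typically be added to the configuration's MessageHandlers collection so that it runs before routing reaches the controller.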

And now, with the handler and override in place, the development environment is fully set up and we can begin shaping the JSON input and output into a unique voice application.


Part III - Setting Up The Intent Schema and Amazon Developer Portal

Amazon Alexa provides the functionality to shape voice requests to a skill into a JSON payload. To respond to a user, a JSON payload must also be provided. The earlier mentioned classes of AlexaRequest and AlexaResponse provide the ability to build and deserialize these requests (using C#) into appropriate output. The logic that defines how incoming requests are built is defined by us as developers in the Intent Schema. There are many great resources on building an intent schema HERE from Amazon. They also recently rolled out a more streamlined GUI to help build the schema as well.

In this application, the schema is maintained under source control in the AlexaSkillProject.WebApp/SpeechAssets directory. This schema is saved and outlined in the Amazon developer portal. The idea is that we define different ‘intents’ and add in optional slots. Each intent is then mapped with different ‘Utterances’, which are the phrases a user would say to the device. In this Grammar Tool application, the user will start by saying “What is the word of the day?” This is mapped to the ‘WordOfTheDayIntent’. Once the user receives the word they will eventually be prompted to say it back to Alexa in which case they would say something like ‘The word is {WordVariable}.’ Because this is a predefined response, this intent matches to the ‘SayWordIntent’, which also has a slot that corresponds to the list of possible words Alexa would give the user.

With the intents in place, we now turn back to the AlexaRequestPayload domain object. Here we can see the IntentAttributes, which consist of the name and slots, as well as the GetSlots() method. These are mapped automatically by the Newtonsoft.Json NuGet package, and the object attributes can then be used in the program, for example in AlexaSkillProject.Services/PersistenceAndMapping/AlexaRequestMapper.cs, which maps each incoming request into an AlexaRequest object (which is saved into a SQL Server database and mapped with Entity Framework code first).
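
As a rough picture of what such an intent model can look like, the sketch below mirrors the slot layout Alexa sends (an object keyed by slot name, each entry holding a name and a value) and flattens it into key/value pairs. The class names IntentSketch and SlotSketch and the GetSlotValueOrNull helper are hypothetical; the project's own IntentAttributes class may be shaped differently.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for one entry of the "slots" object in an IntentRequest.
public class SlotSketch
{
    public string Name { get; set; }
    public string Value { get; set; }
}

// Hypothetical stand-in for the intent portion of the request payload.
public class IntentSketch
{
    public string Name { get; set; }
    public Dictionary<string, SlotSketch> Slots { get; set; }

    // Flattens the slots into (slot name, spoken value) pairs; empty when no slots were sent.
    public List<KeyValuePair<string, string>> GetSlots()
    {
        var flattened = new List<KeyValuePair<string, string>>();
        if (Slots == null)
        {
            return flattened;
        }

        foreach (var entry in Slots)
        {
            string value = entry.Value == null ? null : entry.Value.Value;
            flattened.Add(new KeyValuePair<string, string>(entry.Key, value));
        }
        return flattened;
    }

    // Convenience lookup for a single slot; returns null when the slot is absent.
    public string GetSlotValueOrNull(string slotName)
    {
        SlotSketch slot;
        if (Slots != null && Slots.TryGetValue(slotName, out slot) && slot != null)
        {
            return slot.Value;
        }
        return null;
    }
}
```

With a model along these lines, a handler for the ‘SayWordIntent’ could read the spoken word with a single lookup on the "Word" slot.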

With our development environment set up with Swagger, our Alexa security handler and validation in place, and our strongly typed classes ready to receive and deserialize the JSON input, we can do live debugging using the Alexa Skill developer console in combination with Swagger to simulate a live device.

Using Visual Studio, we can run the web application in debug mode, browse to localhost:PORT/swagger, and click through to the appropriate API. Flipping back to the Alexa Developer Portal, we can type in example requests and see the JSON input. We can copy that input directly into the Swagger input screen and submit it. With a breakpoint set in our route, we can now see the incoming request and all of its attributes.


Part IV - Processing Intents and Skill Logic

With the full development environment set up, and some sample intents and requests modeled out, we can dive into the processing and handling of each request. In this application, we can see that the Alexa route calls the entry point service interface IAlexaRequestService, seen here: AlexaSkillProject.WebApp/Controllers/AlexaController.cs

This service and its implementation live in the class library AlexaSkillProject.Services at AlexaSkillProject.Services/AlexaEntryPoint/AlexaRequestService.cs. This class serves as the entry point for each request and coordinates its processing. As seen in the class, the request is first validated, mapped into an object (AlexaRequest and AlexaMember), and then persisted to SQL Server. To handle each request, we use the strategy pattern and a factory to build an appropriate strategy, which in turn processes the request into an AlexaResponse object complete with all details needed for verbal output and a UI card seen in the Alexa phone app.

using AlexaSkillProject.Domain;
using System;

namespace AlexaSkillProject.Services
{
    /// <summary>
    /// This class is the main entry point and serves as the wrapper service for each AlexaRequest
    /// AlexaRequests come through the Web API AlexaController (in the WebApp project)
    /// </summary>
    public class AlexaRequestService : IAlexaRequestService
    {
        private readonly IAlexaRequestMapper _alexaRequestMapper;
        private readonly IAlexaRequestPersistenceService _alexaRequestPersistenceService;
        private readonly IAlexaRequestHandlerStrategyFactory _alexaRequestHandlerStrategyFactory;
        private readonly IAlexaRequestValidationService _alexaRequestValidationService;


        public AlexaRequestService(
            IAlexaRequestMapper alexaRequestMapper, 
            IAlexaRequestPersistenceService alexaRequestPersistenceService,
            IAlexaRequestHandlerStrategyFactory alexaRequestHandlerStrategyFactory,
            IAlexaRequestValidationService alexaRequestValidationService
        )
        {
            _alexaRequestMapper = alexaRequestMapper;
            _alexaRequestPersistenceService = alexaRequestPersistenceService;
            _alexaRequestHandlerStrategyFactory = alexaRequestHandlerStrategyFactory;
            _alexaRequestValidationService = alexaRequestValidationService;
        }

        public AlexaResponse ProcessAlexaRequest(AlexaRequestPayload alexaRequestPayload)
        {
            // validate request time stamp and app id
            // note that there is custom validation in the AlexaRequestValidationHandler
            SpeechletRequestValidationResult validationResult = _alexaRequestValidationService.ValidateAlexaRequest(alexaRequestPayload);

            if (validationResult == SpeechletRequestValidationResult.OK)
            {
                try
                {
                    // transform request
                    AlexaRequest alexaRequest = _alexaRequestMapper.MapAlexaRequest(alexaRequestPayload);

                    // persist request and member
                    _alexaRequestPersistenceService.PersistAlexaRequestAndMember(alexaRequest);

                    // create a request handler strategy from the alexarequest
                    IAlexaRequestHandlerStrategy alexaRequestHandlerStrategy = _alexaRequestHandlerStrategyFactory.CreateAlexaRequestHandlerStrategy(alexaRequestPayload);

                    // use the handlerstrategy to process the request and generate a response
                    AlexaResponse alexaResponse = alexaRequestHandlerStrategy.HandleAlexaRequest(alexaRequestPayload);

                    // return response
                    return alexaResponse;
                }
                catch (Exception exception)
                {
                    // todo: log the error
                    return new AlexaResponse("There was an error " + exception.Message);
                }
            }

            return null;
        }

    }
}

As a side note, this application uses Entity Framework code first with SQL Server, and also implements a UnitOfWork pattern on top of an AbstractGenericRepository. This decision came from my desire to experiment with different patterns. Since Entity Framework already provides a unit of work through its DbContext, the AlexaSkill.Repository could be simplified to use the out-of-the-box Entity Framework DbContext.

Visiting AlexaRequestHandlerStrategyFactory.cs, we can see that the factory is responsible for returning an IAlexaRequestHandlerStrategy based on the incoming AlexaRequestPayload’s request type and, for intent requests, its intent name. In this application, Unity is used as an Inversion of Control container and is used throughout for Dependency Injection. The Unity container is set up, and all interfaces and their implementations are registered with it, within the WebApp at AlexaSkillProject.WebApp/UnityConfig.cs. For the Strategy pattern, each strategy class is registered with the container under its own name, and each strategy provides a SupportedRequestType or SupportedRequestIntentName, which is used when selecting the strategy for the request.

// AlexaSkillProject.WebApp/UnityConfig.cs

container.RegisterType<IAlexaRequestHandlerStrategy, AnotherWordIntentHandlerStrategy>("AnotherWordIntentHandlerStrategy");
container.RegisterType<IAlexaRequestHandlerStrategy, CancelIntentHandlerStrategy>("CancelIntentHandlerStrategy");
container.RegisterType<IAlexaRequestHandlerStrategy, StopIntentHandlerStrategy>("StopIntentHandlerStrategy");
container.RegisterType<IAlexaRequestHandlerStrategy, HelloWorldIntentHandlerStrategy>("HelloWorldIntentHandlerStrategy");
container.RegisterType<IAlexaRequestHandlerStrategy, HelpIntentHandlerStrategy>("HelpIntentHandlerStrategy");
container.RegisterType<IAlexaRequestHandlerStrategy, LaunchRequestHandlerStrategy>("LaunchRequestHandlerStrategy");
container.RegisterType<IAlexaRequestHandlerStrategy, SayWordIntentHandlerStrategy>("SayWordIntentHandlerStrategy");
container.RegisterType<IAlexaRequestHandlerStrategy, SessionEndedRequestHandlerStrategy>("SessionEndedRequestHandlerStrategy");
container.RegisterType<IAlexaRequestHandlerStrategy, WordOfTheDayIntentHandlerStrategy>("WordOfTheDayIntentHandlerStrategy");

container.RegisterType<IEnumerable<IAlexaRequestHandlerStrategy>, IAlexaRequestHandlerStrategy[]>();
container.RegisterType<IAlexaRequestHandlerStrategyFactory, AlexaRequestHandlerStrategyFactory>();
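
// AlexaRequestHandlerStrategyFactory.cs

using AlexaSkillProject.Domain;
using System;
using System.Collections.Generic;
using System.Linq;

namespace AlexaSkillProject.Services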
{
    /// <summary>
    /// This class returns the correct handler strategy to process the request
    /// Available strategies are initialized in the UnityConfig
    /// </summary>
    public class AlexaRequestHandlerStrategyFactory : IAlexaRequestHandlerStrategyFactory
    {
        private readonly IEnumerable<IAlexaRequestHandlerStrategy> _availableStrategies;

        public AlexaRequestHandlerStrategyFactory(IEnumerable<IAlexaRequestHandlerStrategy> availableStrategies)
        {
            _availableStrategies = availableStrategies;
        }

        public IAlexaRequestHandlerStrategy CreateAlexaRequestHandlerStrategy(AlexaRequestPayload alexaRequest)
        {

            switch (alexaRequest.Request.Type)
            {
                case "LaunchRequest":
                case "SessionEndedRequest":
                    IAlexaRequestHandlerStrategy strategy = _availableStrategies
                        .FirstOrDefault(s => s.SupportedRequestType == alexaRequest.Request.Type);
                    return strategy;
                case "IntentRequest":
                    IAlexaRequestHandlerStrategy intentStrategy = _availableStrategies
                        .FirstOrDefault(s => s.SupportedRequestIntentName == alexaRequest.Request.Intent.Name);
                    return intentStrategy;

                default:
                    throw new NotImplementedException();
            }

        }
    }
}

Two of the most common requests made in this application are the ‘WordOfTheDayIntent’ and the ‘SayWordIntent’. These happen in succession: first when the user asks Alexa for the word of the day, and then when the user repeats that word back to Alexa. The ‘WordOfTheDayIntent’ and the ‘AnotherWordIntent’ are nearly identical, and as a result both handlers derive from the AbstractWordIntentHandlerStrategy. Looking at this class, we can see the implementation of HandleAlexaRequest. Here a word is retrieved from the database (through the WordService) and returned to the user. Because the user will be repeating that word, we want to save the word given to the user and then cache the payload.

Each request from a user in a given session will have the same sessionId, and as a result we can use this as the cache key, and we can cache the entire request with the saved word given. The rest of the method and the protected methods deal with the return of the actual Alexa response (in both card and voice format). Note that the implementation of the ICacheService uses the .NET MemoryCache which works fine for this demo application.
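
The idea of keying cached state on the session id can be sketched without any Alexa types. The example below is only an illustration: it swaps the .NET MemoryCache for a plain ConcurrentDictionary so that it stays self-contained (which also means entries never expire), and the names SessionWordCacheSketch and CachedWordSketch are hypothetical rather than taken from the project's ICacheService.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical cache entry holding what the skill last told the user.
public class CachedWordSketch
{
    public string LastWord { get; set; }
    public string LastWordDefinition { get; set; }
}

// Hypothetical session-keyed cache; a ConcurrentDictionary stands in for MemoryCache.
public class SessionWordCacheSketch
{
    private readonly ConcurrentDictionary<string, CachedWordSketch> _bySessionId =
        new ConcurrentDictionary<string, CachedWordSketch>(StringComparer.Ordinal);

    // Stores (or overwrites) the word given to the user during this session.
    public void Store(string sessionId, string word, string definition)
    {
        if (string.IsNullOrEmpty(sessionId))
        {
            throw new ArgumentException("sessionId is required", "sessionId");
        }
        _bySessionId[sessionId] = new CachedWordSketch { LastWord = word, LastWordDefinition = definition };
    }

    // Returns the cached entry for the session, or null when nothing was stored.
    public CachedWordSketch GetOrNull(string sessionId)
    {
        CachedWordSketch cached;
        if (sessionId != null && _bySessionId.TryGetValue(sessionId, out cached))
        {
            return cached;
        }
        return null;
    }

    // Drops the entry, for example when the session ends.
    public void Remove(string sessionId)
    {
        CachedWordSketch ignored;
        if (sessionId != null)
        {
            _bySessionId.TryRemove(sessionId, out ignored);
        }
    }
}
```

An in-process cache like this only works while all requests for a session reach the same application instance, which is presumably why it is described as fine for a demo.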

// AlexaSkillProject.Services/AlexaRequestHandlerStrategies/Strategies/AbstractWordIntentHandlerStrategy.cs

public AlexaResponse HandleAlexaRequest(AlexaRequestPayload alexaRequest)
{
    // ...

    AlexaResponse alexaResponse = BuildAlexaResponse(alexaRequest, wordResponseDictionary);

    // assign word to request for caching the request between intents
    alexaRequest.Session.Attributes = new AlexaRequestPayload.SessionCustomAttributes();
    alexaRequest.Session.Attributes.LastWord = word.WordName; // use the word from the db as it is saved in intent slots
    alexaRequest.Session.Attributes.LastWordDefinition = wordResponseDictionary[WordEnum.WordDefinition];

    // cache the request and its above added session attributes
    _cacheService.CacheAlexaRequest(alexaRequest);

    return alexaResponse;
}

Now turning to the ‘SayWordIntent’, we see that the HandleAlexaRequest makes use of the slots and slot keys we defined in the earlier sections with respect to the intent schema. We take the word spoken by the user, retrieve the word we gave them on their initial request from the cache, and then compare them and say something back to the user.

// AlexaSkillProject.Services/AlexaRequestHandlerStrategies/Strategies/SayWordIntentHandlerStrategy.cs

public AlexaResponse HandleAlexaRequest(AlexaRequestPayload alexaRequest)
{
    // get the word said back to alexa
    string wordSaid = null;
    var slots = alexaRequest.Request.Intent.GetSlots();
    foreach (var slot in slots)
    {
        if (slot.Key.Equals("Word"))
        {
            wordSaid = slot.Value;
        }
    }
    // ...
}
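
The comparison step itself is elided in the listing above. One plausible shape for it, not taken from the project source, is a tolerant string comparison, since the value recognized from speech may differ from the stored word in casing or surrounding whitespace. The class and method names here are hypothetical.

```csharp
using System;

// Hypothetical helper; shows one way the spoken word could be compared with the cached one.
public static class WordComparisonSketch
{
    // True when both words are present and equal, ignoring case and surrounding whitespace.
    public static bool IsSameWord(string expectedWord, string wordSaid)
    {
        if (string.IsNullOrWhiteSpace(expectedWord) || string.IsNullOrWhiteSpace(wordSaid))
        {
            return false;
        }

        return string.Equals(expectedWord.Trim(), wordSaid.Trim(), StringComparison.OrdinalIgnoreCase);
    }
}
```

Treating a missing slot value as a mismatch lets the handler fall through to a "please try again" style response instead of throwing.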

The rest of the intents and handlers follow the pattern outlined above, and can be seen in the AlexaRequestHandlerStrategies directory.


Part V - Skill Deployment to Azure

At this point, we have set up our Alexa Developer account, created the intent schema, and built out a local web application complete with Swagger for testing API calls. It is now time to deploy the web application and the database, which will be hosted on Azure.

For this application, we take advantage of the Azure integration with Visual Studio to define publish profiles and transformations. Using Azure publish and web publish profiles, we can push the web application and run the SQL Server migrations on an Azure SQL Database. When creating the web application, the Web.config (along with Web.Debug.config and Web.Release.config) comes out of the box and can be used for any transformations needed before publishing into the Azure environment. The resources below outline the publish process in detail:

https://docs.microsoft.com/en-us/azure/app-service-web/app-service-web-get-started-dotnet
https://docs.microsoft.com/en-us/azure/app-service-web/app-service-web-tutorial-dotnet-sqldatabase


Conclusion

That wraps it up. Thanks again to the great resources made available from Amazon, Pluralsight, and AreYouFreeBusy. And be sure to download the Grammar Tool Alexa Skill and see the accompanying website.

For another great skill built with the above template, you can also visit Meeting Greeting.