Building an Asynchronous Alexa Progressive Response with C# and .NET

01/29/2018

The End Result and Motivation

There is nothing quite as powerful as casting a magical spell. Unfortunately, most (all?) of us will never experience this in our lifetime. However, with the latest Wizard Adventures Alexa Skill, you too can have the feeling of holding your wand, pointing it at an Alexa device, casting your incantation, and getting immediate feedback that your spell hit its target.

The Wizard Adventures skill lets you learn magical spells in a classroom setting with Alexa. After learning spells, you unlock further levels and can join Alexa on interactive adventures in which you apply your newly learned skills.

One detail I wanted to get right in this skill was making the ‘casting’ of a spell feel as realistic as possible. That means audio feedback has to arrive quickly enough that the casting action is perceived to occur in near real time. To achieve this, I made use of the new Alexa Progressive Response API. With this implementation, a user casts a spell by saying a recognizable one-word incantation to Alexa. Alexa responds in close to real time with a whooshing noise that simulates the casting of the spell (the directive), and then speaks the full, logic-generated response, which is built asynchronously while the progressive response is being sent.

Before going into the details below, feel free to give it a try by saying “Alexa, open Wizard Adventures.” Eventually you will be prompted to learn a new spell or go on an adventure, at which point you will say a spell to Alexa and hear the directive audio, followed by a verbal response from Alexa.


Getting Started

An Alexa Progressive Response effectively acts as a two-part response: to the user it can be indistinguishable from a single response, while it gives more resource-intensive tasks time to run behind the scenes. Amazon provides many great resources and examples of the progressive response in action. The example here shows a car ride service that runs complex logic to reserve a ride while the user still hears a response immediately.
https://developer.amazon.com/docs/custom-skills/send-the-user-a-progressive-response.html#when-to-send-progressive-responses

Amazon also provides a Java-based example of using the progressive response here (line 546): Java Progressive Response Example

I built my latest Alexa Skill on .NET Core and, as the title of this post suggests, needed a C# mechanism similar to the Java library example to implement this functionality.


Details

There are three parts to the details of this post:

  • Part I walks through the domain model of an AlexaRequestPayload, which is the C# representation of the JSON payload sent when Alexa gets a user request. This model has special attributes needed for making the Progressive Response possible.

  • Part II walks through building the API call (a POST request) to use the Progressive Response, integrating the domain model from Part I, and building a new AlexaDirectiveRequest object which will be sent to the Progressive Response API.

  • Part III ties the AlexaDirectiveRequest and Progressive Response API call together in the context of an Alexa Skill and custom intent.


Part I - Modeling the incoming C# AlexaRequestPayload to Support an Authorization Token

The first thing needed to make a progressive response is an AlexaRequest JSON Payload. Amazon offers comprehensive documentation outlining the JSON payload that is generated when receiving an Alexa request. Modeled here is the Alexa Request Body in C#.

using Newtonsoft.Json;
using System;
using System.Collections.Generic;

namespace AlexaNetCoreDistributed.Models.AlexaModels
{
    /// <summary>
    /// This Payload is standard from Amazon from a user interacting with Alexa
    /// </summary>
    [JsonObject]
    public class AlexaRequestPayload
    {
        [JsonProperty("version")]
        public string Version { get; set; }

        [JsonProperty("session")]
        public SessionAttributes Session { get; set; }

        [JsonProperty("request")]
        public RequestAttributes Request { get; set; }

        [JsonProperty("context")]
        public ContextAttributes Context { get; set; }

        [JsonObject("context")]
        public class ContextAttributes
        {
            [JsonProperty("System")]
            public SystemAttributes System { get; set; }

            [JsonObject("System")]
            public class SystemAttributes
            {
                [JsonProperty("apiAccessToken")]
                public string ApiAccessToken { get; set; }
            }
        }

        [JsonObject("attributes")]
        public class SessionCustomAttributes
        {
            // All of the below are just examples used in my specific Alexa Skill - this section is optional
            public SessionCustomAttributes()
            {
                MemberId = default(int);
                FirstName = default(string);
                LastGivenSpellID = default(int);
                LastGivenAdventureID = default(int);
                LastRequestType = default(string);
                NumberCorrectInARow = default(int);
                MemberScore = default(int);
            }

            [JsonProperty("memberId")]
            public int? MemberId { get; set; }

            [JsonProperty("firstName")]
            public string FirstName { get; set; }

            /// <summary>
            /// The LastSpell is saved with a request payload for caching between intents from a user
            /// </summary>
            [JsonProperty("lastGivenSpellId")]
            public int? LastGivenSpellID { get; set; }

            /// <summary>
            /// The LastAdventure is saved with a request payload for caching between intents from a user
            /// </summary>
            [JsonProperty("lastGivenAdventureId")]
            public int? LastGivenAdventureID { get; set; }

            /// <summary>
            /// The LastRequestType is saved with a request payload for caching between intents from a user
            /// </summary>
            [JsonProperty("lastRequestType")]
            public string LastRequestType { get; set; }

            [JsonProperty("numberCorrectInARow")]
            public int? NumberCorrectInARow { get; set; }

            [JsonProperty("memberScore")]
            public int? MemberScore { get; set; }
        }

        [JsonObject("session")]
        public class SessionAttributes
        {
            [JsonProperty("sessionId")]
            public string SessionId { get; set; }

            [JsonProperty("application")]
            public ApplicationAttributes Application { get; set; }

            [JsonProperty("attributes")]
            public SessionCustomAttributes Attributes { get; set; }

            [JsonProperty("user")]
            public UserAttributes User { get; set; }

            [JsonProperty("new")]
            public bool New { get; set; }

            [JsonObject("application")]
            public class ApplicationAttributes
            {
                [JsonProperty("applicationId")]
                public string ApplicationId { get; set; }
            }

            [JsonObject("user")]
            public class UserAttributes
            {
                [JsonProperty("userId")]
                public string UserId { get; set; }

                [JsonProperty("accessToken")]
                public string AccessToken { get; set; }
            }
        }

        [JsonObject("request")]
        public class RequestAttributes
        {
            private string _timestampEpoch;
            private double _timestamp;

            [JsonProperty("type")]
            public string Type { get; set; }

            [JsonProperty("requestId")]
            public string RequestId { get; set; }

            [JsonIgnore]
            public string TimestampEpoch
            {
                get
                {
                    return _timestampEpoch;
                }
                set
                {
                    _timestampEpoch = value;

                    if (Double.TryParse(value, out _timestamp) && _timestamp > 0)
                        Timestamp = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc).AddMilliseconds(_timestamp);
                    else
                    {
                        var timeStamp = DateTime.MinValue;
                        if (DateTime.TryParse(_timestampEpoch, out timeStamp))
                            Timestamp = timeStamp.ToUniversalTime();
                    }
                }
            }

            [JsonProperty("timestamp")]
            public DateTime Timestamp { get; set; }

            [JsonProperty("intent")]
            public IntentAttributes Intent { get; set; }

            public RequestAttributes()
            {
                Intent = new IntentAttributes();
            }

            [JsonProperty("locale")]
            public string Locale { get; set; }

            [JsonObject("intent")]
            public class IntentAttributes
            {
                [JsonProperty("name")]
                public string Name { get; set; }

                [JsonProperty("slots")]
                public dynamic Slots { get; set; }

                public List<KeyValuePair<string, string>> GetSlots()
                {
                    var output = new List<KeyValuePair<string, string>>();
                    if (Slots == null) return output;

                    foreach (var slot in Slots.Children())
                    {
                        if (slot.First.value != null)
                            output.Add(new KeyValuePair<string, string>(slot.First.name.ToString(), slot.First.value.ToString()));
                    }

                    return output;
                }
            }
        }


    }
}

What is important to note is the ContextAttributes.SystemAttributes.ApiAccessToken. This token is unique for every incoming request, and will be used as verification by the Amazon Progressive Response API when sending a directive.

To model the entire request in C#, we include this nested property and define the full AlexaRequestPayload as shown above.
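
To make the role of the token concrete, here is a minimal sketch of pulling the apiAccessToken and requestId out of an incoming payload. It assumes the Newtonsoft.Json package is installed. The Payload class and its nested types are hypothetical, trimmed stand-ins for the full AlexaRequestPayload above, and the sample JSON values are made up for illustration.

```csharp
using System;
using Newtonsoft.Json;

public static class TokenDemo
{
    // Hypothetical, trimmed stand-in for the full AlexaRequestPayload model.
    public class Payload
    {
        [JsonProperty("request")] public RequestPart Request { get; set; }
        [JsonProperty("context")] public ContextPart Context { get; set; }
    }

    public class RequestPart
    {
        [JsonProperty("requestId")] public string RequestId { get; set; }
    }

    public class ContextPart
    {
        // Alexa sends this object under the capitalized key "System".
        [JsonProperty("System")] public SystemPart SystemInfo { get; set; }
    }

    public class SystemPart
    {
        [JsonProperty("apiAccessToken")] public string ApiAccessToken { get; set; }
    }

    // Made-up sample values; a real request carries many more fields.
    public const string SampleJson = @"{
        ""request"": { ""requestId"": ""amzn1.echo-api.request.example"" },
        ""context"": { ""System"": { ""apiAccessToken"": ""example-token"" } }
    }";

    // Returns the two values the directive call needs, failing loudly if the token is absent.
    public static (string RequestId, string Token) Extract(string json)
    {
        Payload payload = JsonConvert.DeserializeObject<Payload>(json);
        string token = payload?.Context?.SystemInfo?.ApiAccessToken;
        if (string.IsNullOrEmpty(token))
            throw new InvalidOperationException("Payload has no context.System.apiAccessToken.");
        return (payload.Request?.RequestId, token);
    }

    public static void Main()
    {
        var (requestId, token) = Extract(SampleJson);
        Console.WriteLine(requestId + " / " + token);
    }
}
```

The null-conditional chain is a design choice for this sketch: it turns a missing context or System object into one clear error instead of a NullReferenceException.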


Part II - Building the Directive Service

With our new domain model and newly added apiAccessToken field, we are ready to receive an AlexaRequestPayload, extract the necessary components, build a Directive, and send it off to the Alexa Progressive Response API.

The Amazon Directive documentation shows an example of a Directive Request as JSON. Note that the Authorization header of the Directive Request is set to "Bearer" followed by the ApiAccessToken received in the original incoming payload.

To model this out in C#, we need four things. First, we need a serializable domain model of our new AlexaDirectiveRequest. Second, we need a mechanism to parse out the ApiAccessToken. Third, we need to form the remainder of the content to pass into the directive. Lastly, we need a method to send the request to the Progressive Response API.

Our new AlexaDirectiveRequest looks like the below:

[JsonObject]
public class AlexaDirectiveRequest
{
    private const string DIRECTIVE_TYPE = "VoicePlayer.Speak";

    [JsonProperty("header")]
    public HeaderAttributes Header { get; set; }

    [JsonProperty("directive")]
    public DirectiveAttributes Directive { get; set; }

    [JsonObject("directive")]
    public class DirectiveAttributes
    {
        [JsonProperty("type")]
        public string Type { get; set; }

        [JsonProperty("speech")]
        public string Speech { get; set; }
    }

    [JsonObject("header")]
    public class HeaderAttributes
    {
        [JsonProperty("requestId")]
        public string RequestId { get; set; }
    }

    public AlexaDirectiveRequest(string requestId, string speech)
    {
        Header = new HeaderAttributes();
        Header.RequestId = requestId;
        Directive = new DirectiveAttributes();
        Directive.Type = DIRECTIVE_TYPE;
        Directive.Speech = speech;
    }

}
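
As a quick check of the wire format, the sketch below serializes the same shape with an anonymous object and prints the resulting JSON body. It assumes the Newtonsoft.Json package is installed. DirectiveJsonDemo and BuildBody are hypothetical names used only for illustration, and the requestId and speech values are made up.

```csharp
using System;
using Newtonsoft.Json;

public static class DirectiveJsonDemo
{
    // Builds the POST body with the same property names as AlexaDirectiveRequest.
    public static string BuildBody(string requestId, string speech)
    {
        var body = new
        {
            header = new { requestId },
            directive = new { type = "VoicePlayer.Speak", speech }
        };
        return JsonConvert.SerializeObject(body);
    }

    public static void Main()
    {
        Console.WriteLine(BuildBody("amzn1.echo-api.request.example", "Casting your spell."));
        // prints {"header":{"requestId":"amzn1.echo-api.request.example"},"directive":{"type":"VoicePlayer.Speak","speech":"Casting your spell."}}
    }
}
```

The requestId in the header must be the one from the incoming request, since that is how the directive is tied to the request currently being processed.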

And then for the remainder of the logic to parse the ApiAccessToken and build out this new model:

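using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using AlexaNetCoreDistributed.Models.AlexaModels;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;
// plus the project namespaces for IAlexaRequestHandlerStrategy, IAudioService, EnumUtility and AudioTypes

namespace AlexaNetCoreDistributed.Services.AlexaDirectives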
{
    // we want to expose the method to send a response
    public interface IAlexaDirectiveService
    {
        Task<HttpResponseMessage> SendProgressiveResponseAsync(
            IAlexaRequestHandlerStrategy handlerStrategy, AlexaRequestPayload alexaRequestPayload);
    }

    public class AlexaDirectiveService : IAlexaDirectiveService
    {
        // these are constants per Amazon documentation
        private const string DIRECTIVE_URL = "https://api.amazonalexa.com/v1/directives";
        private const string BEARER = "Bearer";

        private readonly ILogger _logger;
        // optional pattern of building content for the directive request - could be a simple 'hello world' string as well
        private readonly IAlexaDirectiveContentFactory _alexaDirectiveContentFactory;

        public AlexaDirectiveService(
            ILogger<AlexaDirectiveService> logger, 
            IAlexaDirectiveContentFactory alexaDirectiveContentFactory)
        {
            _logger = logger;
            _alexaDirectiveContentFactory = alexaDirectiveContentFactory;
        }

        public Task<HttpResponseMessage> SendProgressiveResponseAsync(
            IAlexaRequestHandlerStrategy handlerStrategy, AlexaRequestPayload alexaRequestPayload)
        {
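            // set up the request client
            // NOTE: a new HttpClient per call keeps this example simple; under load,
            // a shared instance is generally preferable to avoid exhausting sockets
            HttpClient client = new HttpClient();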

            // add auth token - NOTE we have a helper method to parse the ApiAccessToken
            string apiHeaderToken = ParseHeaderAuthToken(alexaRequestPayload);
            client.DefaultRequestHeaders.Authorization =
                      new AuthenticationHeaderValue(BEARER, apiHeaderToken);

            // create content
            string directiveContent = _alexaDirectiveContentFactory.BuildDirectiveContent(handlerStrategy);

            // create the post body
            AlexaDirectiveRequest alexaDirectiveRequest = new AlexaDirectiveRequest(
                alexaRequestPayload.Request.RequestId, directiveContent);

            // async post the request
            Task<HttpResponseMessage> httpResponseTask = client.PostAsync(
                DIRECTIVE_URL, new JsonContent(alexaDirectiveRequest));

            return httpResponseTask;

        }

        public string ParseHeaderAuthToken(AlexaRequestPayload alexaRequestPayload)
        {
            try
            {
                return alexaRequestPayload.Context.System.ApiAccessToken;
            }
            catch (Exception e)
            {
                _logger.LogError(string.Format(
                    "Unable to parse auth header for directive request: {0}", e.Message));
                throw;
            }
        }


    }

    public interface IAlexaDirectiveContentFactory
    {
        string BuildDirectiveContent(IAlexaRequestHandlerStrategy handlerStrategy);
    }

    public class AlexaDirectiveContentFactory : IAlexaDirectiveContentFactory
    {
        private readonly ILogger _logger;
        private readonly IAudioService _audioService;

        public AlexaDirectiveContentFactory(
            ILogger<AlexaDirectiveContentFactory> logger,
            IAudioService audioService)
        {
            _logger = logger;
            _audioService = audioService;
        }

        // this is custom to my skill and adds an audio of a 'whooshing noise'
        // the content could be anything
        public string BuildDirectiveContent(IAlexaRequestHandlerStrategy handlerStrategy)
        {
            switch (handlerStrategy.SupportedRequestIntentName)
            {
                case "LaunchRequest":
                    return "launch request directive";
                case "SaySpellIntent":
                    string launchFile = EnumUtility.GetDescriptionFromEnumValue(AudioTypes.CastSpellAudio);
                    string audioPlay = _audioService.BuildAudioTagFromFileName(launchFile);
                    return string.Format("<speak>{0}</speak>", audioPlay);
                default:
                    _logger.LogError(
                        "This Request Handler Strategy Not implemented for Directive.");
                    throw new NotImplementedException(
                        "This Request Handler Strategy Directive Not implemented.");
            }
        }
    }

    // helper to serialize the request and send to the Amazon Progressive API
    public class JsonContent : StringContent
    {
        public JsonContent(object obj) :
            base(JsonConvert.SerializeObject(obj), Encoding.UTF8, "application/json")
        { }
    }
}
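
One possible variation, sketched here under stated assumptions rather than taken from the skill itself: setting the Authorization header on DefaultRequestHeaders ties the token to the client, so a shared HttpClient would need the token attached per request instead. The helper below builds an HttpRequestMessage that carries its own bearer token. DirectiveRequestBuilder and Build are hypothetical names, and the token and body values are placeholders.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

public static class DirectiveRequestBuilder
{
    // Builds a POST whose Authorization header belongs to this message only,
    // so a single HttpClient can be reused safely across concurrent requests.
    public static HttpRequestMessage Build(string url, string apiAccessToken, string jsonBody)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, url)
        {
            Content = new StringContent(jsonBody, Encoding.UTF8, "application/json")
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiAccessToken);
        return request;
    }

    public static void Main()
    {
        HttpRequestMessage request = Build(
            "https://api.amazonalexa.com/v1/directives", "example-token", "{}");
        Console.WriteLine(request.Method + " " + request.RequestUri);
        Console.WriteLine(request.Headers.Authorization.Scheme);
    }
}
```

A shared client would then send the message with SendAsync(request) in place of PostAsync, and the returned Task<HttpResponseMessage> can be handed back to the caller exactly as before.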

Part III - Integrating the Directive Service and Model into a full Alexa Response

Now that we have a service to build a directive and asynchronously send it to the Amazon Progressive Response API, we need to integrate it into our intent handler. In this Alexa Skill I have an intent handler strategy which maps a SaySpellIntent in the IntentSchema to this handler. For more information on this mapping of intents to handlers using the strategy pattern, please see the post HERE.

The interface for this intent handler is shown below and is set up to process an incoming AlexaRequestPayload.

public interface IAlexaRequestHandlerStrategy
{
    AlexaResponsePayload HandleAlexaRequest(AlexaRequestPayload alexaRequest);
    string SupportedRequestType { get; }
    string SupportedRequestIntentName { get; }
}

public class SaySpellIntentHandlerStrategy : IAlexaRequestHandlerStrategy
{
    // invoked when a user says a spell to Alexa.  We want to process the incoming request,
    // send a progressive response (a 'whooshing' noise), and then return the rest of the Response
}

In this intent handler strategy, we integrate the new Directive Service so that a sound effect is immediately returned to the user as an asynchronous Progressive Response, and meanwhile, the logic involved in processing the rest of the intent continues behind the scenes. In the case of this skill, that means verifying the spell said matches the last spell given to the user (using ElastiCache), updating the Alexa Member score (accessing the RDS Database), and building a response payload. The code for this implementation looks like the below:

public async Task<AlexaResponsePayload> HandleAlexaRequestAsync(AlexaRequestPayload alexaRequestPayload)
{
    // send the directive to the Progressive Response API
    Task<HttpResponseMessage> directiveResponseTask = 
        _alexaDirectiveService.SendProgressiveResponseAsync(this, alexaRequestPayload);

    AlexaResponsePayload alexaResponsePayload = default(AlexaResponsePayload);

    // ... do work and logic required to build out the above AlexaResponsePayload which will get returned

    // await the Task<HttpResponseMessage> returned above
    HttpResponseMessage directiveResponse = await directiveResponseTask;

    // we could do any sort of logic on the response codes from Amazon Progressive Response API
    // https://developer.amazon.com/docs/custom-skills/send-the-user-a-progressive-response.html#directive-response
    // log the request id rather than the ApiAccessToken, which is a bearer credential
    _logger.LogInformation(string.Format(
        "Sent the progressive response for request: {0}", alexaRequestPayload.Request.RequestId));
    _logger.LogInformation(string.Format(
        "We received the http request response: {0}", directiveResponse.StatusCode.ToString()));

    return alexaResponsePayload;

}

In this code, we send the request asynchronously and assign the returned Task to a local variable rather than awaiting it immediately. This lets the method continue building the response while the HTTP call is in flight. At the end of the method, we await the Task, which yields an HttpResponseMessage. If needed, we could branch on the status code of that response. In this case we simply return the AlexaResponsePayload that was built up during processing.
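
The start-now, await-later pattern can be seen in isolation in the following minimal sketch. The two methods are hypothetical stand-ins that simulate the directive call and the response-building work with Task.Delay; no Alexa APIs are involved, and the delay values are arbitrary.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class OverlapDemo
{
    // Simulates the HTTP call to the Progressive Response API.
    static async Task<string> SendProgressiveResponseAsync()
    {
        await Task.Delay(300);
        return "NoContent";
    }

    // Simulates the cache and database work that builds the full response.
    static async Task<string> BuildFullResponseAsync()
    {
        await Task.Delay(300);
        return "Your spell hit its target.";
    }

    public static async Task<(string Status, string Response, long ElapsedMs)> RunAsync()
    {
        var stopwatch = Stopwatch.StartNew();

        // Start the directive call but do not await it yet.
        Task<string> directiveTask = SendProgressiveResponseAsync();

        // Do the remaining work while the call is in flight.
        string response = await BuildFullResponseAsync();

        // Only now wait for the directive call to finish.
        string status = await directiveTask;

        return (status, response, stopwatch.ElapsedMilliseconds);
    }

    public static async Task Main()
    {
        var result = await RunAsync();
        Console.WriteLine(result.Status + " / " + result.Response + " / " + result.ElapsedMs + " ms");
    }
}
```

Because both operations overlap, the elapsed time should be close to the longer of the two delays rather than their sum. Note that if the directive task faults, the final await rethrows its exception, so a skill that should still answer when the progressive response fails would need to catch it there.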

The last thing required to integrate the DirectiveService and asynchronous processing is to wrap the async method in a synchronous call so that this class implements the synchronous interface. For the purposes of this skill, this is just a call that returns the result of the asynchronous method, as shown below. Note that .Result blocks the calling thread until the task completes.

public AlexaResponsePayload HandleAlexaRequest(AlexaRequestPayload alexaRequestPayload)
{
    return HandleAlexaRequestAsync(alexaRequestPayload).Result;
}
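
One detail worth knowing about this wrapper is how failures surface. The sketch below, which uses hypothetical names and a deliberately failing task, compares .Result with GetAwaiter().GetResult(): both block the calling thread, but they report exceptions differently.

```csharp
using System;
using System.Threading.Tasks;

public static class SyncOverAsyncDemo
{
    // Stand-in for an async handler whose directive call fails.
    static async Task<string> FailAsync()
    {
        await Task.Yield();
        throw new InvalidOperationException("directive call failed");
    }

    // .Result wraps the failure in an AggregateException.
    public static string ExceptionFromResult()
    {
        try { return FailAsync().Result; }
        catch (Exception e) { return e.GetType().Name; }
    }

    // GetAwaiter().GetResult() rethrows the original exception.
    public static string ExceptionFromGetResult()
    {
        try { return FailAsync().GetAwaiter().GetResult(); }
        catch (Exception e) { return e.GetType().Name; }
    }

    public static void Main()
    {
        Console.WriteLine(ExceptionFromResult());    // prints AggregateException
        Console.WriteLine(ExceptionFromGetResult()); // prints InvalidOperationException
    }
}
```

If calling code catches specific exception types, GetAwaiter().GetResult() may be the more convenient of the two. Where the hosting framework allows it, making the interface itself asynchronous avoids the blocking call altogether.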

Conclusion

Be sure to check out the Wizard Adventures Alexa Skill here to see the Progressive Response API in action.