Herbi Web Keys background and technical details

A few months ago I was introduced to a gentleman who wanted to work with e-mail and web browsing on his Windows 7 PC. He’d done these things some time ago, but since then some features of the software he’d used had changed, and he’d benefit from some help in learning the new functionality. As well as getting familiar with his versions of Windows, Windows Live Mail and Internet Explorer (IE), he’d also need to become familiar with the current version of his screen reader, Window-Eyes, as he is blind. So over the weeks that followed, we ran through the steps for reading and composing e-mail and browsing the web.

 

During this period, I was also conscious of how someone with MND/ALS, such as this gentleman, might find it a challenge to press all the keys required in order to browse the web. So this got me thinking about whether there’s anything I could build which might be useful in this situation. In particular, is there a way to perform specific actions by pressing only a single key, while minimizing the amount of hand movement? Perhaps one approach is to see what can be done through use of the Number Pad keys alone.

 

The screen reader used has a very useful feature whereby the default key combinations for certain actions can be replaced with other combinations (or a single key press) which might be preferable to the user. So we could have changed the trigger for a certain set of actions to be NumPad keys. But in this case, we were interested in having a key press perform a custom action that went beyond controlling the screen reader. For example, say we have a key that should invoke the IE Favorites list. It’s possible that when that key’s pressed, IE is not in the foreground, or not even running at all. So in reaction to that key, I want to start IE if it’s not running, bring it into the foreground, and then invoke the Favorites list.

 

With this in mind, I set out to build a simple tool that would allow web browsing with the Window-Eyes screen reader, using only single key presses on the NumPad. (If this seemed to have potential, I could enhance this to allow reading of e-mails too.) This is what I ended up with:

 

The Herbi Web Keys program

 

 

The app itself is a regular WinForms app, and having got some UI in place, the next thing I did was add a low-level keyboard hook. (Clicking the buttons in the app doesn’t actually do anything, because we’re not interested in that input mode.) If I detected a key press from a NumPad key, I’d post a message to my main UI, and eat the key press. I set one key to effectively turn the app on or off, in case it’d be useful to temporarily render the app inert while it’s running. The app has a fair bit of interop with the Win32 API, so http://www.pinvoke.net was very helpful to me as I built the app.
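
As a rough illustration of the hook described above (a sketch, not the app’s actual source), the code below installs a WH_KEYBOARD_LL hook, swallows NumPad key events and reports key-down events to a subscriber. The names NumPadHook, IsNumPadKey and NumPadKeyPressed are hypothetical, and raising an event stands in for posting a message to the main UI. It assumes NumLock is on (so the keys arrive as the virtual key codes 0x60 to 0x6F) and that the installing thread runs a message loop, as a WinForms UI thread does.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration only.
public static class NumPadHook
{
    private const int WH_KEYBOARD_LL = 13;
    private const int WM_KEYDOWN = 0x0100;
    private const int VK_NUMPAD0 = 0x60; // First key of the NumPad range.
    private const int VK_DIVIDE = 0x6F;  // Last key of the NumPad range.

    private delegate IntPtr LowLevelKeyboardProc(int nCode, IntPtr wParam, IntPtr lParam);

    [DllImport("user32.dll", SetLastError = true)]
    private static extern IntPtr SetWindowsHookEx(int idHook, LowLevelKeyboardProc lpfn, IntPtr hMod, uint dwThreadId);

    [DllImport("user32.dll", SetLastError = true)]
    private static extern bool UnhookWindowsHookEx(IntPtr hhk);

    [DllImport("user32.dll")]
    private static extern IntPtr CallNextHookEx(IntPtr hhk, int nCode, IntPtr wParam, IntPtr lParam);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
    private static extern IntPtr GetModuleHandle(string lpModuleName);

    // Hold the delegate in a static field so it is not garbage collected while hooked.
    private static readonly LowLevelKeyboardProc s_proc = HookCallback;
    private static IntPtr s_hook = IntPtr.Zero;

    // Raised with the virtual key code when a NumPad key goes down.
    public static event Action<int> NumPadKeyPressed;

    // True for VK_NUMPAD0..VK_NUMPAD9 and the NumPad operator keys (0x60..0x6F).
    public static bool IsNumPadKey(int vkCode)
    {
        return vkCode >= VK_NUMPAD0 && vkCode <= VK_DIVIDE;
    }

    public static void Install()
    {
        if (s_hook == IntPtr.Zero)
        {
            s_hook = SetWindowsHookEx(WH_KEYBOARD_LL, s_proc, GetModuleHandle(null), 0);
        }
    }

    public static void Uninstall()
    {
        if (s_hook != IntPtr.Zero)
        {
            UnhookWindowsHookEx(s_hook);
            s_hook = IntPtr.Zero;
        }
    }

    private static IntPtr HookCallback(int nCode, IntPtr wParam, IntPtr lParam)
    {
        if (nCode >= 0)
        {
            // vkCode is the first field of the KBDLLHOOKSTRUCT that lParam points to.
            int vkCode = Marshal.ReadInt32(lParam);
            if (IsNumPadKey(vkCode))
            {
                if (wParam == (IntPtr)WM_KEYDOWN)
                {
                    Action<int> handler = NumPadKeyPressed;
                    if (handler != null)
                    {
                        handler(vkCode);
                    }
                }

                // A non-zero return eats the key event, both down and up.
                return (IntPtr)1;
            }
        }

        return CallNextHookEx(s_hook, nCode, wParam, lParam);
    }
}
```

A handler attached to NumPadKeyPressed should return quickly, since Windows expects low-level hook procedures to finish promptly; handing the key code off to the UI and doing the real work there keeps the hook responsive.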

 

By the way, here are a couple of quick notes on keyboard simulation. When I wanted to simulate an ‘l’ key press, I set the virtual key code of the key I wanted to simulate to be ‘l’. This was a bad idea, because the ASCII value of the ‘l’ character (0x6C) matches the virtual key code of VK_SEPARATOR, and so that’s what I was actually simulating. Instead I should use the ASCII value for ‘L’ (0x4C). I also wanted to run the spyxx.exe tool to see what key codes were being generated by some keys. That tool won’t work unless everything of interest has the same bitness (that is, all 32-bit or all 64-bit). On my system it so happened that I couldn’t see the key code I was interested in unless I pointed my spyxx.exe to the Notepad run from c:\windows\syswow64.
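
To make the ‘l’ versus ‘L’ pitfall concrete, here is a small sketch of a helper that maps a letter or digit to its virtual key code. The class and method names are hypothetical; the only facts it relies on are that the virtual key codes for A to Z and 0 to 9 equal the ASCII values of the upper-case characters, and that VK_SEPARATOR is 0x6C.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class VirtualKeys
{
    // Same value as the ASCII code for lower-case 'l'.
    public const int VK_SEPARATOR = 0x6C;

    // Returns the virtual key code for a letter or digit key.
    public static ushort FromLetterOrDigit(char c)
    {
        if (c >= 'a' && c <= 'z')
        {
            // Lower-case ASCII values are not virtual key codes for letters,
            // so map to the upper-case value first.
            return (ushort)(c - 'a' + 'A');
        }

        if ((c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
        {
            return (ushort)c;
        }

        throw new ArgumentOutOfRangeException("c", "Only A-Z, a-z and 0-9 map directly to virtual key codes.");
    }
}
```

With this, VirtualKeys.FromLetterOrDigit('l') gives 0x4C, the code for the L key, whereas casting ‘l’ directly to an integer gives 0x6C, which is VK_SEPARATOR.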

 

I also used one key to allow a description of the key to be spoken rather than acting on the key press. This would help the user get familiar with the layout of the keys. The app uses the very useful System.Speech.Synthesis to output speech itself when it needs to.

 

When the main UI receives the message describing what key’s been pressed, it first takes some preparatory action like making sure the IE window is in the foreground. (Having said that, I’m being rather relaxed about “making sure” here. I have a couple of Thread.Sleep() calls in the app, where I assume that if I trigger some action, that action will really happen before long. I might update this at some point, to add a little verification and avoid the assumptions.)
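
One possible shape for that verification, sketched here rather than taken from the app, is to poll for the expected state with a timeout instead of sleeping for a fixed period. The names ForegroundWaiter, WaitUntil and BringToForeground are hypothetical, and the 25 ms polling interval is an arbitrary choice.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical helper for illustration only.
public static class ForegroundWaiter
{
    [DllImport("user32.dll")]
    private static extern IntPtr GetForegroundWindow();

    [DllImport("user32.dll")]
    private static extern bool SetForegroundWindow(IntPtr hWnd);

    // Polls the condition until it is true or the timeout elapses.
    // Returns true if the condition was met in time.
    public static bool WaitUntil(Func<bool> condition, int timeoutMs, int pollMs)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        while (true)
        {
            if (condition())
            {
                return true;
            }

            if (stopwatch.ElapsedMilliseconds >= timeoutMs)
            {
                return false;
            }

            Thread.Sleep(pollMs);
        }
    }

    // Asks Windows to bring the window forward, then verifies that it
    // really became the foreground window within the timeout.
    public static bool BringToForeground(IntPtr hwnd, int timeoutMs)
    {
        SetForegroundWindow(hwnd);
        return WaitUntil(delegate { return GetForegroundWindow() == hwnd; }, timeoutMs, 25);
    }
}
```

Windows can decline a SetForegroundWindow request, so checking the result afterwards lets the caller retry or announce a failure instead of sending keystrokes to the wrong window.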

 

The bulk of what the app does next is to simulate key presses with SendInput(). For example, I can control IE by simulating Alt+C to show the Favorites list. And I can control Window-Eyes by simulating ‘l’, ‘h’ and ‘p’ to move to the next link, header or paragraph on the web page.
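
For readers who haven’t used SendInput() from C#, the following is a minimal sketch of simulating a modifier-plus-key chord such as Alt+C. It is not the app’s own code: KeySimulator, BuildChord and SendChord are hypothetical names, and the struct declarations follow the commonly used P/Invoke layout, with MOUSEINPUT included in the union so that the INPUT size is correct on both 32-bit and 64-bit systems.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration only.
public static class KeySimulator
{
    public const ushort VK_MENU = 0x12; // The Alt key.
    private const uint INPUT_KEYBOARD = 1;
    private const uint KEYEVENTF_KEYUP = 0x0002;

    [StructLayout(LayoutKind.Sequential)]
    public struct MOUSEINPUT
    {
        public int dx;
        public int dy;
        public uint mouseData;
        public uint dwFlags;
        public uint time;
        public IntPtr dwExtraInfo;
    }

    [StructLayout(LayoutKind.Sequential)]
    public struct KEYBDINPUT
    {
        public ushort wVk;
        public ushort wScan;
        public uint dwFlags;
        public uint time;
        public IntPtr dwExtraInfo;
    }

    // MOUSEINPUT is the largest member, so it sets the size of the union.
    [StructLayout(LayoutKind.Explicit)]
    public struct InputUnion
    {
        [FieldOffset(0)] public MOUSEINPUT mi;
        [FieldOffset(0)] public KEYBDINPUT ki;
    }

    [StructLayout(LayoutKind.Sequential)]
    public struct INPUT
    {
        public uint type;
        public InputUnion U;
    }

    [DllImport("user32.dll", SetLastError = true)]
    private static extern uint SendInput(uint nInputs, INPUT[] pInputs, int cbSize);

    private static INPUT KeyEvent(ushort vk, bool keyUp)
    {
        INPUT input = new INPUT();
        input.type = INPUT_KEYBOARD;
        input.U.ki.wVk = vk;
        input.U.ki.dwFlags = keyUp ? KEYEVENTF_KEYUP : 0;
        return input;
    }

    // Builds modifier down, key down, key up, modifier up.
    public static INPUT[] BuildChord(ushort modifierVk, ushort keyVk)
    {
        return new INPUT[]
        {
            KeyEvent(modifierVk, false),
            KeyEvent(keyVk, false),
            KeyEvent(keyVk, true),
            KeyEvent(modifierVk, true)
        };
    }

    // Returns true if Windows accepted every event in the chord.
    public static bool SendChord(ushort modifierVk, ushort keyVk)
    {
        INPUT[] inputs = BuildChord(modifierVk, keyVk);
        uint sent = SendInput((uint)inputs.Length, inputs, Marshal.SizeOf(typeof(INPUT)));
        return sent == inputs.Length;
    }
}
```

Calling KeySimulator.SendChord(KeySimulator.VK_MENU, 0x43) would then simulate Alt+C, where 0x43 is the virtual key code for the C key. The input goes to whichever window has keyboard focus, which is why the app first brings IE to the foreground.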

 

But in addition to all this, I did add some use of UI Automation (UIA) where I wanted to interact directly with UI elements shown on the screen, rather than relying on simulated keyboard input. In order to do this I first went to my C# sample out at http://code.msdn.microsoft.com/Windows-7-UI-Automation-0625f55e/sourcecode?fileId=21468&pathId=534397530. That showed me the pre-build event command line of:

 

    "%PROGRAMFILES%\Microsoft SDKs\Windows\v7.0A\bin\tlbimp.exe" %windir%\system32\UIAutomationCore.dll /out:..\interop.UIAutomationCore.dll

 

This generates a managed wrapper around the Windows UIA API which I can then reference as “interop.UIAutomationCore” in my project’s list of references. Then by adding the following in my main app source file:

 

    using interop.UIAutomationCore;
    …
    private IUIAutomation m_uiautomation;
    …
    m_uiautomation = new CUIAutomation();

 

 

I have my UIA object and I’m good to go.

 

I used UIA in the app for two things. The first is to detect whether the Favorites list is visible. The app has a key to toggle the display of the Favorites list, so I need to know whether the list is already visible in order to know what action to take. (It’s always possible there’s some keyboard shortcut which will always toggle the display of the list, but I don’t know of it if there is.)

 

I needed to find a way to determine if the Favorites list is visible or not. So I pointed the Inspect SDK tool to the Favorites list. The image below shows the results.

 

 

The Inspect SDK tool showing the UIA properties of the Internet Explorer Favorites list UI.

 

 

 

I found that when the Favorites list is visible, a UIA Tree control appears in the UIA tree and it has a name of “Favorites”. That element does not exist in the UIA tree when the Favorites list is not visible. So in order to determine whether the Favorites list is visible, all I need to do is try to find that element.

 

The code below shows how I did that. A very important aspect of this is that I do not look for an element whose name is “Favorites”. The element’s name is probably localized for worldwide use, and I’ll only find “Favorites” on a US-English system. Instead, if the element has an AutomationId I should always base my search on that rather than on the name. The AutomationId does not get localized, and Inspect shows me the AutomationId of the element I’m interested in is “100”. I’ve found that element has the same id in IE7, IE9 and IE10, so I expect it’s had this id for a long time, and my code will be robust regardless of what version of IE is being used.

 

 

// UIA-related values taken from "C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Include\UIAutomationClient.h".
// Defining them in the app rather than pulling the values from interop.UIAutomationCore made some build step a
// little simpler when I first did this, (but I don't remember the details of that.)
private const int c_propertyIdControlType = 30003;  // UIA_ControlTypePropertyId
private const int c_propertyIdName = 30005;         // UIA_NamePropertyId
private const int c_propertyIdAutomationId = 30011; // UIA_AutomationIdPropertyId
private const int c_propertyIdTreeType = 50023;     // UIA_TreeControlTypeId (a control type id, not a property id)

// Detect whether the IE Favorites list is visible.
private bool IsFavoritesListVisible()
{
    bool fShowingFavorites = false;

    // Find the "IEFrame" window. We've already taken action to try to make sure IE is running.
    IntPtr hwnd = Win32Interop.FindWindow(c_strBrowserWindowClass, null);
    if (hwnd != IntPtr.Zero)
    {
        // Get the UIA element that represents the IE window.
        IUIAutomationElement elementBrowser = m_uiautomation.ElementFromHandle(hwnd);
        if (elementBrowser != null)
        {
            // Find an element whose control type is UIA_TreeControlTypeId.
            IUIAutomationCondition conditionControlType = m_uiautomation.CreatePropertyCondition(c_propertyIdControlType, c_propertyIdTreeType);

            // Don't look for an element name "Favorites", as I expect that won't work on anything but US-English systems.
            // IUIAutomationCondition conditionName = m_uiautomation.CreatePropertyCondition(c_propertyIdName, "Favorites");

            // Find an element whose AutomationId is "100".
            IUIAutomationCondition conditionAutomationId = m_uiautomation.CreatePropertyCondition(c_propertyIdAutomationId, "100");

            // Combine the control type condition and AutomationId condition into a single condition.
            IUIAutomationCondition condition = m_uiautomation.CreateAndCondition(conditionControlType, conditionAutomationId);

            // No cached properties or patterns are going to be accessed after we've tried to find the Favorites list.
            IUIAutomationCacheRequest cacheRequest = m_uiautomation.CreateCacheRequest();

            // Now find the first element beneath the browser element that meets the condition.
            IUIAutomationElement elementFavorites = elementBrowser.FindFirstBuildCache(TreeScope.TreeScope_Descendants, condition, cacheRequest);
            if (elementFavorites != null)
            {
                // We've found the Favorites list.
                fShowingFavorites = true;
            }
        }
    }

    return fShowingFavorites;
}

 

 

The other way I originally leveraged UIA in the app is to invoke a button in the UI. As it happens, I did this in such a way that I broke the rules I’ve just mentioned on not basing searches for elements on fixed US-English strings. When I first started on the app, I wrote some quick code to have a key in the app invoke IE’s Back button. As Inspect reported, (as shown in the image below), the button doesn’t have an AutomationId, so I found the button from its name. This was fine for my needs at the time, but it meant the app can’t be leveraged outside English-speaking countries, and that’s not sufficient for me. I don’t want the limitations of my app to be the reason why it can’t be used anywhere in the world.

 

The Inspect SDK tool showing the UIA properties of the Internet Explorer Back button.

 

 

 

 

So I replaced my original code with a simulated keyboard shortcut of Backspace, which triggers a move to the previous page in IE. That avoided the bad practice of searching for the English accessible name of a button. For anyone interested, this is what the original code for invoking a button looked like.

 

private const int c_patternIdInvoke = 10000;      // UIA_InvokePatternId
private const int c_propertyIdButtonType = 50000; // UIA_ButtonControlTypeId

private void InvokeButton(string buttonName)
{
    // Find the "IEFrame" window. We've already taken action to try to make sure IE is running.
    IntPtr hwnd = Win32Interop.FindWindow(c_strBrowserWindowClass, null);
    if (hwnd != IntPtr.Zero)
    {
        // Get the UIA element that represents the IE window.
        IUIAutomationElement elementBrowser = m_uiautomation.ElementFromHandle(hwnd);
        if (elementBrowser != null)
        {
            // Create a cache request to get the Invoke pattern for the element. This means
            // we don't incur a cross-proc call to get the pattern later.
            IUIAutomationCacheRequest cacheRequest = m_uiautomation.CreateCacheRequest();
            cacheRequest.AddPattern(c_patternIdInvoke);

            // Search for a button whose name has been supplied to us. We could add other
            // conditions here if we want to. For example, only look for elements with an
            // IsEnabled property of true. We're not interested in getting a Back button
            // that's not enabled. Or create a condition which means we're only interested
            // in elements in the Control View of the UIA tree. That would mean we'll avoid
            // searching through elements which only appear in the Raw View of the tree.
            // Being careful about what conditions are set up before a call to find an
            // element can be a great way of optimizing performance.
            IUIAutomationCondition conditionControlType = m_uiautomation.CreatePropertyCondition(c_propertyIdControlType, c_propertyIdButtonType);
            IUIAutomationCondition conditionName = m_uiautomation.CreatePropertyCondition(c_propertyIdName, buttonName);

            // Combine the control type condition and name condition into a single condition.
            IUIAutomationCondition condition = m_uiautomation.CreateAndCondition(conditionControlType, conditionName);

            IUIAutomationElement elementButton = elementBrowser.FindFirstBuildCache(TreeScope.TreeScope_Descendants, condition, cacheRequest);
            if (elementButton != null)
            {
                // Get the Invoke pattern which we requested to be cached when the element was found.
                IUIAutomationInvokePattern pattern = (IUIAutomationInvokePattern)elementButton.GetCachedPattern(c_patternIdInvoke);

                // Now invoke the button. This will incur a cross-proc call.
                pattern.Invoke();
            }
        }
    }
}

 

I could then call that code with:

 

    InvokeButton("Back");

 

 

So there we have it - a simple app which, through a mix of keyboard simulation and UIA calls, provides a means to control other features in order to browse the web with a screen reader, through single key presses on keys which are in close proximity to each other.

 

 

The Visual Studio 2010 project for the app can be found here.

 

 
