Monthly Archives: April 2010

Opening a Microsoft Office documents with Silverlight

Situation

From our Silverlight application we must be able to view documents. Documents are stored in an existing document management system and are accessible through a WCF REST service. It sounds very easy, but it was quite hard to understand all the stuff going on behind the Internet Explorer and the Microsoft Office.

Simplest solution

Download the content via a WebClient instance and save it with help of the Silverlight SaveFileDialog to the users local disk. Unfortunately this simple solution provides not the expected user experience, because Silverlight is not able to set the file name for the SaveFileDialog and after saving the user must manually navigate to the local folder and open the document by hand. The user experience should like be “Click & View” and not “Click, Safe, Search, Open & View”.

Expected solution

Open a popup from the Silverlight application with the URL pointing to the document.

Via HtmlPage:

Uri docUri = new Uri("http://mydomain/docs/test.docx");
HtmlPage.PopupWindow(docUri, "windwow1", null);

Or via a HyperlinkButton:

<HyperlinkButton Content="open document"
                NavigateUri="http://mydomain/docs/test.docx"
                TargetName="_blank" />

The service part

We deliver documents through a simple WCF REST service from our middle layer infrastructure. We use a custom authentication system that relies on HTTP session cookies for client identification. When the user isn’t authenticated then the service will redirect the request to the logon page (status code 302). The REST service is a self-hosted windows service (existing middle layer). The service is available with the URI pattern: http://<myhost>/docs/<docname&gt;

As example: http://mydomain/docs/test.docx

Here the simplified version of our REST service.

public Stream GetContent(string docId)
{

    WebOperationContext context = WebOperationContext.Current;

    // read session cookie
    string cookies = context.IncomingRequest.Headers[HttpRequestHeader.Cookie];
    string sessionId = "";
    if (cookies != null)
    {
        // parse cookie values
        var cookieItems = from c in cookies.Split(';')
                            let cc = c.Split('=')
                            where cc.Length == 2
                            select new { Name = cc[0].Trim().ToLower(), Value = cc[1].Trim() };

        var items = cookieItems.ToDictionary(c => c.Name);

        if (items.ContainsKey("sessionid"))
            sessionId = items["sessionid"].Value;
    }

    // check session is valid
    bool sessionValid = false;
    sessionValid = sessionId == "1234"; // todo

    if (!sessionValid)
    {
        // redirect uri
        string redirectUri = "http://mydomain/logon.html";
        // ...set the location the logon site should naviagate to after the successful logon
        redirectUri += "?redirect=" + context.IncomingRequest.UriTemplateMatch.RequestUri.ToString();
        // redirect to the logon page 
        context.OutgoingResponse.Location = redirectUri;
        context.OutgoingResponse.StatusCode = HttpStatusCode.Redirect;
        // response has header only
        return null;
    }
    else
    {

        try
        {

            // check if document exists 
            bool docExists = true; // todo

            if (docExists)
            {
                // todo
                FileInfo requestedFile = new FileInfo("todo");

                // set mime type (firefox, chrome and safari requires them)
                context.OutgoingResponse.ContentType = GetMimeType(requestedFile.Extension);
                return new FileStream(requestedFile.FullName, FileMode.Open);
            }
            else
            {
                // doc not found
                context.OutgoingResponse.StatusCode = HttpStatusCode.NotFound;
                context.OutgoingResponse.StatusDescription = "Document not found";
                // response has header only
                return null;
            }

        }
        catch (Exception ex)
        {
            // general error
            context.OutgoingResponse.StatusCode = HttpStatusCode.InternalServerError;
            // response has header only
            return null;
        }
    }
}

At this point everything works fine with Firefox, Google Chrome and Safari on Mac.

Internet Explorer is different

Internet Explorer acts a little bit different as expected. When opening an office document (Word, Excel, PowerPoint) from a web page in Internet Explorer the Fiddler call stack lock like this:

winword:  HEAD    http://mydomain/docs/test.docx HTTP/1.1     405 NotAllowed
winword:  OPTIONS http://mydomain/docs/ HTTP/1.1              405 NotAllowed
winword:  GET     http://mydomain/docs/test.docx HTTP/1.1     302 Redirect

As you can see the iexplore process isn’t involved. What’s going on? IE detects the mime type from the file name extension, if it is preserved in the URL string. Here some examples:

 
http://mydomain/docs/test.docx IE detects mime type as docx
http://mydomain/service.svc/test.docx IE cannot detect mime type (it isn’t svc)
http://mydomain/resource.ashx?file=test.docx IE cannot detect mime type (it isn’t ash)

What happened with iexplore? Office 2007 and 2010 Beta are designed to make a more collaborative workspace. Therefore, several changes have been made to how Office works with web content. These changes provide better authoring features for the following Web servers that support Office:

  • Microsoft Windows SharePoint Services
  • Microsoft SharePoint Portal Server
  • Microsoft Exchange Web Store

IE detects if the web resource is an Office format by analyzing the URL. If so IE will start the corresponding Office program with the URL as process start parameter like the following:

“C:\Program Files\Microsoft Office\Office14\WINWORD.EXE” /n http://mydomain/docs/test.docx”

Office is now downloading the content from the web:

image

The request will end with the status code 302 and a redirect location to our logon page, because session cookies are not shared between Internet Explorer and winword process. Well, Office Word will follow this redirection and tries to display our Silverlight logon page. This isn’t possible because the SL plug-in isn’t available for the Office suite and Word shows the following content.

image 

When we change our service so that it doesn’t make a redirection and returning the status code 404 (Not Found), then word is telling me that it wasn’t able to open the doc.

image

First improvement

Changing the service to get another URL pattern for our documents so that IE cannot detect the mime type through the URL.

Something like this.

What’s happening now. Let’s look at Fiddler’s call stack.

iexplore: GET     http://mydomain/docs/test.docx HTTP/1.1     200 OK
winword:  HEAD    http://mydomain/docs/test.docx HTTP/1.1     405 NotAllowed
winword:  OPTIONS http://mydomain/docs/ HTTP/1.1              405 NotAllowed
winword:  GET     http://mydomain/docs/test.docx HTTP/1.1     302 Redirect

Now IE is downloading the content as expected. But then winword comes into the play again. What’s going on in this situation?

IE cannot detect the mime type from the URL so it will download the web content as any other browser. With the request to our service the browser is sending our session cookie to the server too. Based on the mime type from the response IE decides to open Microsoft Word. IE does that the same way as before: starting the winword process with a startup argument including the URL instead of the downloaded local file.

“C:\Program Files\Microsoft Office\Office14\WINWORD.EXE” /n http://mydomain/docs/test.docx”

Same situation, Office looks like this again:

image 

Why the Office suite is doing that? Because Office lets you edit and author documents on a Web site if the server supports Web authoring and collaboration (Sharepoint, Exchange, etc). First, Office tries to communicate with the Web server with a series of HEAD and OPTIONS requests to discover the possibilities of the webserver (“Microsoft Office Protocol Discovery”, “Microsoft Office Existence Discovery” and “Microsoft Office Core Storage Infrastructure”). Then Office tries to directly bind to the resource with a GET request to the web resource.

Second improvement

Our WCF REST service doesn’t handle the HEAD and OPTIONS request from the “Microsoft Office Protocol Discovery” and “Microsoft Office Existence Discovery”, but it redirects the GET request from the “Microsoft Office Core Storage Infrastructure” to the logon page while winword can’t deliver the session cookie.

What we can do is to detect the caller via the User-Agent header. There are three main User-Agent’s used from the Office suite.

  • Microsoft Office Protocol Discovery
  • Microsoft Office Core Storage Infrastructure
  • Microsoft Office Existence Discovery

In case of one of these 3 user-agents our service returns the status code 401 (Unauthorized) instead of the 302 (Redirect).

Then winword ignores the 401 and opens the document from the cached document which was previously downloaded from IE.

Here the code snippet to do that:

// check user agent for office product suite
bool isOfficeSuite = false;
if (!string.IsNullOrEmpty(WebOperationContext.Current.IncomingRequest.UserAgent))
{
    string[] officeUserAgents = { "Microsoft Office Protocol Discovery",
                                    "Microsoft Office Existence Discovery",
                                    "Microsoft Office Core Storage Infrastructure" };

    string requestUserAgent = WebOperationContext.Current.IncomingRequest.UserAgent.ToLower();

    var q = from userAgent in officeUserAgents
            where requestUserAgent.ToLower().Replace(" ", "")
                               .Contains(userAgent.ToLower().Replace(" ", ""))
            select userAgent;

    isOfficeSuite = q.Count() > 0;
}

if (isOfficeSuite)
{
    // don't redirect when office program want getting the document via 
    // "Microsoft Office Protocol Discovery" or 
    // "Microsoft Office Core Storage Infrastructure" requests.
    // Excel/word whould redirect to the logon page an display the html!
    WebOperationContext.Current.OutgoingResponse.StatusCode = System.Net.HttpStatusCode.Unauthorized;
}
else
{
    // redirect to the logon page
    WebOperationContext.Current.OutgoingResponse.StatusCode = System.Net.HttpStatusCode.Redirect;
    // TODO
    WebOperationContext.Current.OutgoingResponse.Location = "http://rnd.glauxsoft.ch/evidencenovaweb/";
}

// response has header only
return null;

 

Unfortunately this works well with Office 2007, but first tests with Office 2010 Beta are different again. Excel and PowerPoint 2010 doesn’t ignore the 401 status code and telling us “Could not open the document …”. Maybe this is an error in the beta.

Third improvement

However, this kind of communication doesn’t make me happy. What can we do that IE downloads the content and then simply start the process associated to the mime type as that other browsers still do?

The typical workaround I found is to use the Content-Disposition attachment header in the GET response when returning the file. This header will tell the web browser to treat the file as a download (read-only), so the file will open in Office from the web browser cache location instead of a URL. With that setting, the Office application will treat the file as local, and will therefore not make calls back to the web server.

Content-disposition is an extension to the MIME protocol that instructs a MIME user agent on how it should display an attached file. When Internet Explorer receives the header, it raises a File Download dialog box whose file name box is automatically populated with the file name that is specified in the header

In our service we set this header for all well-known Office formats, because other mime types should still be opened inline within the browser such as PDF, TXT or JPG, GIF, etc.

Here the code snippet to doing that

// well-know office formats
string[] officeMimeTypes =  {  ".doc",".dot",".docx",".dotx",".docm",".dotm",
        ".xls",".xlt",".xla",".xlsx",".xltx",".xlsm",
        ".xltm",".xlam",".xlsb",".ppt",".pot",".pps",
        ".ppa",".pptx",".potx",".ppsx",".ppam",".pptm",
        ".potm",".ppsm"};

// add content-disposition header.This header will tell the web browser 
// to treat the file as a download (read-only), so the file will open 
// in Office from the web browser cache location instead of a URL. 
if (officeMimeTypes.Contains(requestedFile.Extension.ToLower()))
{
    WebOperationContext.Current.OutgoingResponse.Headers.Add("Content-disposition",
                                            "attachment;filename=" + requestedFile.Name);
}

// set mime type (firefox, chrome and safari requires them always)
WebOperationContext.Current.OutgoingResponse.ContentType = GetMimeType(requestedFile.Extension);
return new FileStream(requestedFile.FullName, FileMode.Open);

 

Summary

Internet Explorer handles Microsoft Office formats other than expected. The main reason is to make a more collaborative workspace when working with SharePoint and Exchange. This different behavior let you run into troubles when you have your own service providing the documents.

In brief you should consider the following to get around these troubles:

  • Don’t let the IE detect the mime type from the URL
  • Don’t redirect to the logon page when the User-Agent is “Microsoft Office Core Storage Infrastructure”
  • Set the Content-disposition header for all well-known Office formats when returning the file.

Resources

http://support.microsoft.com/kb/260519/en-us

http://support.microsoft.com/default.aspx?scid=kb;EN-US;899927

http://blogs.msdn.com/vsofficedeveloper/pages/Office-Existence-Discovery-Protocol.aspx

Advertisements
Tagged , ,