Gmail API Response Parsing Guide

The Gmail API is a tough API to consume.

At SigParser, our clients often need to extract the email bodies from the Google API response before sending them to our APIs for extracting signatures or splitting the bodies.

In most cases you only need the subject, body, date, recipients and attachments. Google doesn’t make this information easy to get: for the most part the API just represents the MIME email format as JSON, so you end up parsing the response the way you would parse a MIME message.

Example Gmail Response

The following is an example of a Gmail email API response with a bunch of headers removed.

{
  "id": "169eefc8138e68ca",
  "threadId": "169eefc8138e68ca",
  "labelIds": [
    "UNREAD",
    "IMPORTANT",
    "CATEGORY_PERSONAL",
    "INBOX"
  ],
  "snippet": "John Smith CEO of BigCo Cell - 619-344-3322 Office - 619-345-2333 San Diego, CA Need to reach me? Use this link to book a meeting.",
  "historyId": "270427",
  "internalDate": "1554492714000",
  "payload": {
    "partId": "",
    "mimeType": "text/plain",
    "filename": "",
    "headers": [
      {
        "name": "Delivered-To",
        "value": "test@dragnettech.com"
      },
      {
        "name": "Return-Path",
        "value": "\u003coutlook.tester@salesforceemail.com\u003e"
      },
      {
        "name": "From",
        "value": "Outlook Tester \u003coutlook.tester@salesforceemail.com\u003e"
      },
      {
        "name": "To",
        "value": "\"test@dragnettech.com\" \u003ctest@dragnettech.com\u003e"
      },
      {
        "name": "Subject",
        "value": "Plain text sample email"
      },
      {
        "name": "Thread-Topic",
        "value": "Plain text sample email"
      },
      {
        "name": "Thread-Index",
        "value": "AdTr5jkL493BeKJkSt2Icrw+4R5TWw=="
      },
      {
        "name": "Date",
        "value": "Fri, 5 Apr 2019 19:31:54 +0000"
      },
      {
        "name": "Message-ID",
        "value": "\u003cBYAPR04MB4726F849EA8F81776BF95A9D8A510@BYAPR04MB4726.namprd04.prod.outlook.com\u003e"
      },
      {
        "name": "Accept-Language",
        "value": "en-US"
      },
      {
        "name": "Content-Language",
        "value": "en-US"
      },
      {
        "name": "authentication-results",
        "value": "spf=none (sender IP is ) smtp.mailfrom=outlook.tester@salesforceemail.com;"
      },
      {
        "name": "Content-Type",
        "value": "text/plain; charset=\"us-ascii\""
      },
      {
        "name": "Content-Transfer-Encoding",
        "value": "quoted-printable"
      }
    ],
    "body": {
      "size": 146,
      "data": "DQoNCkpvaG4gU21pdGgNCkNFTyBvZiBCaWdDbw0KQ2VsbCAtIDYxOS0zNDQtMzMyMg0KT2ZmaWNlIC0gNjE5LTM0NS0yMzMzDQpTYW4gRGllZ28sIENBDQoNCk5lZWQgdG8gcmVhY2ggbWU_IFVzZSB0aGlzIGxpbmsgdG8gYm9vayBhIG1lZXRpbmcuIA0KDQo="
    }
  },
  "sizeEstimate": 6978
}


Parsing the Fields

Here are the important fields:

  • Subject - Payload header “Subject”
  • Email Date - “internalDate” (milliseconds since the Unix epoch, sent as a string in the JSON)
  • Folders
    • Gmail doesn’t have a folder concept in the same way as other email clients.
    • Use labelIds to determine which folders the email is in.
  • From - Payload header “From”, decoded using recipient parsing
  • To, CC, BCC - Payload headers of the same names, decoded using recipient parsing
  • Bodies - Find using the body parsing section below.
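
As a minimal sketch of the two simplest lookups in the list above, the snippet below reads a header by name and converts internalDate to a timestamp. GmailHeader and GmailFields are hypothetical names used only for illustration; with the Gmail NuGet package you would pass the payload's own header objects instead, and the client may already expose internalDate as a number rather than a string.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical stand-in for the Gmail client's header type (a name/value pair).
public class GmailHeader
{
    public string Name { get; set; }
    public string Value { get; set; }
}

public static class GmailFields
{
    // Header names are case-insensitive. The sample response mixes "Subject"
    // with "authentication-results", so compare without regard to case.
    public static string GetHeader(IEnumerable<GmailHeader> headers, string name) =>
        headers?.FirstOrDefault(h => string.Equals(h.Name, name, StringComparison.OrdinalIgnoreCase))?.Value;

    // internalDate in the JSON is a string holding milliseconds since the Unix epoch.
    public static DateTimeOffset ParseInternalDate(string internalDate) =>
        DateTimeOffset.FromUnixTimeMilliseconds(long.Parse(internalDate, CultureInfo.InvariantCulture));
}
```

For the sample response, GmailFields.ParseInternalDate("1554492714000") gives 5 Apr 2019 19:31:54 UTC, which matches the message's Date header.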

Recipient Parsing Code

The recipient headers often look like this in the raw JSON, where \u003c and \u003e are the JSON escapes for < and >:

Test User \u003ctest@sigparser.com\u003e, Chris Landry \u003cperson@sigparser.com\u003e

You need to parse this with a MIME parser.

Warning: You could try writing your own regex to capture the addresses, but we didn’t find any that worked in all cases.
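
To illustrate one way a hand-rolled approach goes wrong, here is a deliberately naive splitter that treats every comma as a recipient separator. NaiveRecipientSplit and the header value in the note below are made up for this illustration; the point is only that a quoted display name may itself contain a comma.

```csharp
using System;
using System.Linq;

public static class NaiveRecipientSplit
{
    // Deliberately naive: assumes a comma always separates two recipients.
    public static string[] SplitOnCommas(string headerValue) =>
        headerValue.Split(',')
                   .Select(piece => piece.Trim())
                   .Where(piece => piece.Length > 0)
                   .ToArray();
}
```

Given the two-recipient header "Landry, Chris" <person@sigparser.com>, Test User <test@sigparser.com>, this returns three pieces, because the comma inside the quoted name is treated as a separator. A MIME-aware parser keeps the quoted name together.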

For C#, we suggest using MimeKit.

Install-Package MimeKit -Version 2.1.5.1

Then make a function like this to parse the field value for From, To, CC and BCC using MimeKit.

// Requires: using System; using System.Linq;
// MailAddress here is a simple model with settable Address and DisplayName
// properties, not System.Net.Mail.MailAddress (whose properties are read-only).
public static MailAddress[] ConvertGmailHeaderFieldToPeople(string field)
{
    if (String.IsNullOrWhiteSpace(field))
        return new MailAddress[0];

    // TryParse returns false on a malformed header instead of throwing.
    if (!MimeKit.InternetAddressList.TryParse(field, out MimeKit.InternetAddressList addresses))
        return new MailAddress[0];

    return addresses.Select(s =>
        {
            if (s is MimeKit.MailboxAddress mailbox)
                return new MailAddress { Address = mailbox.Address, DisplayName = mailbox.Name };

            // Group addresses are skipped rather than expanded into their members.
            if (s is MimeKit.GroupAddress)
                return null;

            throw new NotImplementedException("Could not find SigParser code handler for address type " + s.GetType().FullName);
        })
        .Where(a => a != null)
        .ToArray();
}

Extracting Body Contents And Attachments

The MIME structure the Gmail API returns is complex to handle. The structure you see above only applies to a plain text email. Here are all the different permutations of JSON responses we’ve tested this solution with.

  • Plain text only
  • Plain text with HTML
  • Plain text with HTML and Attachments
  • Plain text with Attachments
  • HTML with Attachments
  • There are probably others we haven’t run into

The only way we’ve found to handle it is with recursion as you’ll see below.
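
To make the shape of the problem concrete before the real code, here is a minimal sketch of the same recursive idea on a trimmed-down tree. MimeNode and MimeTree are hypothetical stand-ins, not types from the Gmail client; the sketch only collects the leaf parts in order and does no decoding.

```csharp
using System.Collections.Generic;

// Hypothetical, trimmed-down stand-in for the Gmail client's message part.
public class MimeNode
{
    public string MimeType { get; set; }
    public string Filename { get; set; }
    public List<MimeNode> Parts { get; set; }
}

public static class MimeTree
{
    // Depth-first walk that collects every leaf (non-container) part in document order.
    public static List<MimeNode> Leaves(MimeNode node, List<MimeNode> found = null)
    {
        found ??= new List<MimeNode>();
        if (node == null)
            return found;

        if (node.Parts == null || node.Parts.Count == 0)
        {
            // A leaf: text/plain, text/html, or an attachment.
            found.Add(node);
        }
        else
        {
            // A multipart/* container: its content lives in the children.
            foreach (var child in node.Parts)
                Leaves(child, found);
        }

        return found;
    }
}
```

For a message laid out as multipart/mixed containing a multipart/alternative (text/plain and text/html) plus a PDF, which is a common layout for HTML email with an attachment, Leaves returns the text/plain part, the text/html part and the PDF, in that order.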

This example assumes you’re using the NuGet package for the Gmail API and you’ve got an object representing an email.

message is a simplified model we use to store the email body and a collection of attachments.

email is the Gmail API object representing the email response JSON.

You’ll call ExtractMessagePart like this.

Email.EmailMessageModel message = new Email.EmailMessageModel();
ExtractMessagePart(email.Payload, ref message);

part is the payload or nested payloads of a message from the Gmail API.

ExtractMessagePart is a recursive method.

public static void ExtractMessagePart(MessagePart part, ref EmailMessageModel message)
{
    if (part == null)
        return;

    // Header names are case-insensitive, so compare them without regard to case.
    string Header(string name) =>
        part.Headers?.FirstOrDefault(h => string.Equals(h.Name, name, StringComparison.OrdinalIgnoreCase))?.Value;

    var contentDisposition = Header("Content-Disposition");
    var contentId = Header("Content-ID");

    // "inline" is often followed by parameters (e.g. inline; filename="logo.png"),
    // so test the prefix rather than the whole value.
    bool isAttachment = contentDisposition != null
        && (contentDisposition.StartsWith("attachment", StringComparison.OrdinalIgnoreCase)
            || contentDisposition.StartsWith("inline", StringComparison.OrdinalIgnoreCase));

    if (isAttachment)
    {
        message.Attachments.Add(new DragnetTech.EventProcessors.Email.EmailMessageModel.Attachment
        {
            AttachmentId = part.Body?.AttachmentId,
            Filename = part.Filename,
            ContentID = contentId != null ? Utils.UnescapeUnicodeCharacters(contentId) : null,
            Size = part.Body?.Size ?? 0,
            ExchangeID = part.Body?.AttachmentId,
            Data = part.Body?.Data,
            ContentType = Header("Content-Type")
        });
    }
    else if (part.MimeType == "text/plain" || part.MimeType == "text/html")
    {
        // A later text part overwrites an earlier one, so when a message carries
        // both plain text and HTML, whichever comes last in the tree wins.
        message.Body = DecodeSection(Header("Content-Transfer-Encoding"), part.Body?.Data);
        message.IsHtml = part.MimeType == "text/html";
    }

    // Multipart containers carry their children in Parts; recurse into each one.
    if (part.Parts != null)
    {
        foreach (var np in part.Parts)
        {
            ExtractMessagePart(np, ref message);
        }
    }
}

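public static string DecodeSection(string contentTransferEncoding, string base64Content)
{
    // contentTransferEncoding is not used: in the sample response above the header
    // says quoted-printable, yet body.data is plain base64url of the decoded text.
    if (base64Content == null)
    {
        return "";
    }

    byte[] decoded = DragnetTech.EventProcessors.Google.GoogleProcessor.FromBase64ForUrlString(base64Content);

    // Assumes the decoded bytes are UTF-8 (or ASCII, which is a subset of it).
    return Encoding.UTF8.GetString(decoded);
}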


public static byte[] FromBase64ForUrlString(string base64ForUrlInput)
{
    //https://stackoverflow.com/questions/24464866/having-trouble-reading-the-text-html-message-part?rq=1
    int padChars = (base64ForUrlInput.Length % 4) == 0 ? 0 : (4 - (base64ForUrlInput.Length % 4));
    StringBuilder result = new StringBuilder(base64ForUrlInput, base64ForUrlInput.Length + padChars);
    result.Append(String.Empty.PadRight(padChars, '='));
    result.Replace('-', '+');
    result.Replace('_', '/');
    return Convert.FromBase64String(result.ToString());
}
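
DecodeSection above always reads the bytes as UTF-8. If the decoded bytes are actually in the charset named by the part's Content-Type header (an assumption worth checking against your own mailboxes), a variant along these lines may help. GmailBodyDecoder and its method names are hypothetical; the sketch uses only the base class library, and on .NET Core legacy code pages such as windows-1252 additionally need the System.Text.Encoding.CodePages provider registered.

```csharp
using System;
using System.Net.Mime;
using System.Text;

public static class GmailBodyDecoder
{
    // base64url to bytes: restore '+' and '/', then re-pad to a multiple of 4.
    public static byte[] FromBase64Url(string input)
    {
        string s = input.Replace('-', '+').Replace('_', '/');
        int remainder = s.Length % 4;
        if (remainder != 0)
            s = s.PadRight(s.Length + (4 - remainder), '=');
        return Convert.FromBase64String(s);
    }

    // Picks the charset out of a Content-Type header value, falling back to UTF-8.
    public static Encoding CharsetOf(string contentTypeHeader)
    {
        if (string.IsNullOrWhiteSpace(contentTypeHeader))
            return Encoding.UTF8;
        try
        {
            string charset = new ContentType(contentTypeHeader).CharSet;
            return string.IsNullOrEmpty(charset) ? Encoding.UTF8 : Encoding.GetEncoding(charset);
        }
        catch (Exception e) when (e is FormatException || e is ArgumentException || e is NotSupportedException)
        {
            // Malformed header, or a charset this runtime does not know.
            return Encoding.UTF8;
        }
    }

    // Decodes body.data using the charset declared in the Content-Type header.
    public static string DecodeBody(string data, string contentTypeHeader) =>
        data == null ? "" : CharsetOf(contentTypeHeader).GetString(FromBase64Url(data));
}
```

As a quick check, "bWU_IFVzZQ" is the unpadded base64url form of the text "me? Use" from the sample body, so DecodeBody("bWU_IFVzZQ", "text/plain; charset=\"us-ascii\"") returns "me? Use". The input exercises both the '_' replacement and the padding step.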

Next Steps

If you want to split email bodies into parts, remove signatures from them or capture the contact details they contain, you should try out SigParser.

Get Your API Key

Try the SigParser API. Sign up and get an API key with 1,500 free emails per month. Upgrade or downgrade at any time. Our API is entirely serverless and stateless.