Parse Raw Email (MIME) Guide

So you have some email content that looks something like this…

MIME-Version: 1.0
References: <CABxEEohuqZBoVpsyY4pOFMYixhU2bzfxgs9tRLbUoV2NJMqCJw@mail.gmail.com> <CAL5Lp9Xyo0mEQ6-c1yAQ+SuKXrT4Xu5y-7BnvnGS4RMjZOBJ=g@mail.gmail.com>
In-Reply-To: <CAL5Lp9Xyo0mEQ6-c1yAQ+SuKXrT4Xu5y-7BnvnGS4RMjZOBJ=g@mail.gmail.com>
From: Chris <c@sigparser.com>
Date: Wed, 9 Jan 2019 08:36:15 -0800
Message-ID: <CABxEEoizOPyCLkq4+FBGNaw7KC2TJDfTZF5dp8xD9aFjDQoL+Q@mail.gmail.com>
Subject: Re: food for thought
To: Paul <p@sigparser.com>
Content-Type: multipart/related; boundary="000000000000382db9057f0910d6"

--000000000000382db9057f0910d6
Content-Type: multipart/alternative; boundary="000000000000382db0057f0910d5"

--000000000000382db0057f0910d5
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Ok.  Just a thought.  Got it.

--000000000000382db0057f0910d5
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div><div dir=3D"auto">Ok.=C2=A0 Just a thought.=C2=A0 Got it. =C2=A0</div>=
</div><div><br><div class=3D"gmail_quote"><div dir=3D"ltr">On Wed, Jan 9, 2=

You want to decode it into something useful. But how? We’ll go through how to do that in this guide.

You can see the MIME data in Gmail by opening any email, clicking the three dots on the right and clicking Show Original.


MIME Explained

MIME stands for Multipurpose Internet Mail Extensions. It defines the standard format that email clients use behind the scenes when sending and receiving email. MIME was created before JSON and XML were popular, which is why the format looks so unusual. You can read the specification (RFC 2045 through RFC 2049), but it can be a bit challenging, so instead we’ll go over the basics.

MIME supports features such as embedded attachments, multiple body types (plain text and HTML) in the same email, a declared content encoding for each part, and additional header fields that newer email clients can use. The Wikipedia page on MIME has more details.
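To make the “multiple body types” idea concrete, here is a minimal sketch using Python’s standard-library email package (Python is just our choice for illustration). It builds a message with a plain text body plus an HTML alternative, reusing the addresses and subject from the example above; the HTML body is made up.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "Chris <c@sigparser.com>"
msg["To"] = "Paul <p@sigparser.com>"
msg["Subject"] = "Re: food for thought"

# The first body becomes a text/plain part
msg.set_content("Ok.  Just a thought.  Got it.")

# Adding an HTML alternative turns the message into multipart/alternative
# with the text/plain part first and the text/html part second
msg.add_alternative('<div dir="auto">Ok.&nbsp; Just a thought.</div>', subtype="html")

print(msg.get_content_type())
print([part.get_content_type() for part in msg.iter_parts()])
print(msg.as_string())
```

When the message is serialized with as_string(), the library generates the boundary string that separates the parts, similar to the long hex-looking boundaries in the example at the top.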

In the example above, you can see how the boundary strings divide the message into sections, and how each section declares its own Content-Type and Content-Transfer-Encoding fields.
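One way to see that nesting is to let a parser walk the parts for you. The sketch below assumes Python’s standard-library email package and a trimmed copy of the example message (bodies shortened, transfer encodings dropped); content_tree is a helper name made up for illustration.

```python
from email import policy
from email.parser import Parser

# Trimmed copy of the example: same boundaries and nesting, shorter bodies
raw = """MIME-Version: 1.0
Subject: Re: food for thought
Content-Type: multipart/related; boundary="000000000000382db9057f0910d6"

--000000000000382db9057f0910d6
Content-Type: multipart/alternative; boundary="000000000000382db0057f0910d5"

--000000000000382db0057f0910d5
Content-Type: text/plain; charset="UTF-8"

Ok.  Just a thought.  Got it.

--000000000000382db0057f0910d5
Content-Type: text/html; charset="UTF-8"

<div dir="auto">Ok.  Just a thought.  Got it.</div>

--000000000000382db0057f0910d5--

--000000000000382db9057f0910d6--
"""

def content_tree(part, depth=0):
    """Return one line per MIME part, indented by its nesting depth."""
    lines = ["  " * depth + part.get_content_type()]
    # multipart/* parts contain child parts separated by their boundary string
    if part.is_multipart():
        for child in part.iter_parts():
            lines.extend(content_tree(child, depth + 1))
    return lines

msg = Parser(policy=policy.default).parsestr(raw)
print("\n".join(content_tree(msg)))
```

For this message it prints multipart/related at the top, multipart/alternative inside it, and the text/plain and text/html parts inside that.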

Near the top you can also see the In-Reply-To header, but you won’t find In-Reply-To defined anywhere in the MIME spec. Instead, it is listed in the Registration of Mail and MIME Header Fields spec, which catalogs the various header fields an email message can have.
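Threading headers like References and In-Reply-To are read the same way as any other header. This sketch assumes Python’s standard-library email module and copies the threading headers from the example; splitting References with a regular expression is our own simplification, not a full header parser.

```python
import email
import re

# Threading headers copied from the example message at the top of this guide
raw = (
    "References: <CABxEEohuqZBoVpsyY4pOFMYixhU2bzfxgs9tRLbUoV2NJMqCJw@mail.gmail.com>"
    " <CAL5Lp9Xyo0mEQ6-c1yAQ+SuKXrT4Xu5y-7BnvnGS4RMjZOBJ=g@mail.gmail.com>\n"
    "In-Reply-To: <CAL5Lp9Xyo0mEQ6-c1yAQ+SuKXrT4Xu5y-7BnvnGS4RMjZOBJ=g@mail.gmail.com>\n"
    "Subject: Re: food for thought\n"
    "\n"
    "Ok.  Just a thought.  Got it.\n"
)

msg = email.message_from_string(raw)

# References lists Message-IDs of earlier messages in the thread, oldest first.
# Each ID is wrapped in angle brackets, so a simple regex can split them apart.
references = re.findall(r"<[^>]+>", msg["References"] or "")

# In-Reply-To names the direct parent message
in_reply_to = msg["In-Reply-To"]

print(references)
print(in_reply_to)
```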

Other fields, such as DKIM-Signature, help verify that the message really came from the domain it claims to be sent from.

In the end, you should avoid writing your own parser and use an existing library instead.

Manually Decoding The Content

We’ll show you how to capture the HTML and plain text bodies from MIME format and convert them to a usable form without any code.

In Gmail, for example, click the “…” menu on an email and choose “Show Original” to see the MIME data.

HTML Section

Find the section header whose Content-Type is text/html.

--9f823aebd27c8d7e34c2ad1ba241f25b1140e7d3745bb216e425308b6182
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset=UTF-8
Mime-Version: 1.0

Everything below that header, up to the next boundary line, is the HTML. In this case it is encoded as quoted-printable, which means we need to decode it or it won’t render correctly. To decode it, use a tool like Webatic: paste the quoted-printable text into the Encoded box and then click Decode.

The result will be a usable set of HTML.
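If you would rather decode it in code than with an online tool, Python’s standard-library quopri module does the same job. This is a minimal sketch; decode_quoted_printable is a hypothetical helper name, and it assumes the part’s charset is UTF-8 as in the example.

```python
import quopri

def decode_quoted_printable(encoded: str, charset: str = "utf-8") -> str:
    """Decode a quoted-printable body (such as a text/html part) into text."""
    # decodestring turns =XX escapes back into bytes and removes
    # soft line breaks (a "=" at the very end of a line)
    raw_bytes = quopri.decodestring(encoded.encode("ascii"))
    # The bytes are in the charset declared in the part's Content-Type header
    return raw_bytes.decode(charset)

# First line of the HTML part from the example, plus a continuation line
sample = '<div dir=3D"auto">Ok.=C2=A0 Just a thought.=C2=A0 Got it. =C2=A0</div>=\n</div>'
print(decode_quoted_printable(sample))
```

Note that =3D decodes to "=", and =C2=A0 is the two-byte UTF-8 encoding of a non-breaking space, which is why the charset step matters.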

Plain Text Section

Not all emails will have HTML, and sometimes you’ll get a plain text version as well, which is an approximation of the HTML content.

--9f823aebd27c8d7e34c2ad1ba241f25b1140e7d3745bb216e425308b6182
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=UTF-8
Mime-Version: 1.0

Again, this is encoded as quoted-printable, so decode it the same way as the HTML: paste it into the Encoded box of a tool like Webatic and click Decode.
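In practice, a MIME library can find each body part and undo both the transfer encoding and the charset for you. The sketch below assumes Python’s standard-library email package with policy.default (which is what enables get_body() and get_content()); the short message is a made-up example.

```python
from email import policy
from email.parser import BytesParser

# Made-up multipart/alternative message with quoted-printable text and HTML parts
raw = b"""MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Ok.=C2=A0 Just a thought.
--XYZ
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"auto">Ok.=C2=A0 Just a thought.</div>
--XYZ--
"""

# policy.default makes the parser return EmailMessage objects
msg = BytesParser(policy=policy.default).parsebytes(raw)

# get_body() picks the matching body part; get_content() undoes the
# Content-Transfer-Encoding and decodes the bytes with the part's charset
plain_text = msg.get_body(preferencelist=("plain",)).get_content()
html_text = msg.get_body(preferencelist=("html",)).get_content()

print(plain_text)
print(html_text)
```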

Content-Transfer-Encoding: base64

What if the Content-Transfer-Encoding is base64 for either text/plain or text/html? In that case you have an extra step to do.

It will look like this.

--_000_BYAPR13MB2294DD720555473A1F0CF57BD23E0BYAPR13MB2294namp_
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64

VGhhbmsgeW91LCBQYXVsLg0KDQpGcm9tOiBQYXVsIE1lbmRvemEgPHBtZW5kb3phQHNpZ3BhcnNl
ci5jb20+DQpTZW50OiBGcmlkYXksIEFwcmlsIDI2LCAyMDE5IDY6NDIgQU0NClRvOiBEYW1vbiBT

  1. Copy all the base64 encoded text. In this case, it is the nonsensical-looking text below “Content-Transfer-Encoding”.
  2. Paste it into the base64decode textbox and click Decode.
  3. The result textbox should contain the decoded email text.
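The same steps can be done with Python’s standard-library base64 module. This sketch uses the two base64 lines shown above and assumes the UTF-8 charset declared in that part’s Content-Type header; decode_base64_body is a hypothetical helper name.

```python
import base64

# The two base64 lines from the text/plain part shown above
encoded = """VGhhbmsgeW91LCBQYXVsLg0KDQpGcm9tOiBQYXVsIE1lbmRvemEgPHBtZW5kb3phQHNpZ3BhcnNl
ci5jb20+DQpTZW50OiBGcmlkYXksIEFwcmlsIDI2LCAyMDE5IDY6NDIgQU0NClRvOiBEYW1vbiBT"""

def decode_base64_body(text: str, charset: str = "utf-8") -> str:
    """Decode a base64 body (such as a text/plain part) into text."""
    # b64decode skips the line breaks between lines, since
    # non-alphabet characters are discarded by default (validate=False)
    raw_bytes = base64.b64decode(text)
    # Turn the bytes into text using the part's declared charset
    return raw_bytes.decode(charset)

decoded = decode_base64_body(encoded)
print(decoded)
```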

MIME Parsing Tools

Online Services

Code Frameworks

Code Examples

PHP Parse Email

Example from the GitHub page for php-mime-mail-parser

<?php
// Include the library first (installed with Composer)
require_once __DIR__.'/vendor/autoload.php';

$path = 'path/to/mail.txt';
$Parser = new PhpMimeMailParser\Parser();

// There are four methods available to indicate which MIME mail to parse.
// You only need to use ONE of the following four (the others are commented out):

// 1. Specify a file path to the MIME mail.
$Parser->setPath($path);

// 2. Specify a PHP file resource (stream) to the MIME mail.
// $Parser->setStream(fopen($path, "r"));

// 3. Specify the raw MIME mail text.
// $Parser->setText(file_get_contents($path));

// 4. Specify a stream to work with a mail server (mail piped to stdin).
// $Parser->setStream(fopen("php://stdin", "r"));

// Once we've indicated where to find the mail, we can parse out the data
$to = $Parser->getHeader('to');             // "test" <test@example.com>, "test2" <test2@example.com>
$addressesTo = $Parser->getAddresses('to'); // Returns an array: [["display"=>"test", "address"=>"test@example.com", "is_group"=>false], ["display"=>"test2", "address"=>"test2@example.com", "is_group"=>false]]

$from = $Parser->getHeader('from');             // "test" <test@example.com>
$addressesFrom = $Parser->getAddresses('from'); // Returns an array: [["display"=>"test", "address"=>"test@example.com", "is_group"=>false]]

$subject = $Parser->getHeader('subject');

$text = $Parser->getMessageBody('text');

$html = $Parser->getMessageBody('html');
// ...

Python Parse Email

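import email

# emailtext holds the raw MIME source of the message as a string
emailtext = "From: example@example.com\nTo: example2@something.com\nSubject: Hi\n\nHello!\n"

msg = email.message_from_string(emailtext)
msg['from']  # 'example@example.com'
msg['to']    # 'example2@something.com'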
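The same module can also split address headers into display names and addresses, similar to getAddresses() in the PHP example. This sketch assumes policy.default, under which address headers expose an addresses attribute; the message text is made up for illustration.

```python
from email import policy
from email.parser import Parser

raw = (
    'From: "test" <test@example.com>\n'
    'To: "test" <test@example.com>, "test2" <test2@example.com>\n'
    "Subject: Hello\n"
    "\n"
    "Body text\n"
)

# policy.default parses address headers into Address objects
msg = Parser(policy=policy.default).parsestr(raw)

# Each Address has a display_name and an addr_spec (the bare email address)
to_addresses = [(a.display_name, a.addr_spec) for a in msg["to"].addresses]
print(to_addresses)
```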

JavaScript Parse Email

Example from emailjs-mime-parser

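npm install --save emailjs-mime-parser

import parse from 'emailjs-mime-parser'

// parse(String) -> MimeNode
const rawEmail = 'Subject: Hello\r\n\r\nHello world' // raw MIME source as a string
const root = parse(rawEmail)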

C#/.NET MIME Parser

Example from MimeKit

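using MimeKit;
using var stream = System.IO.File.OpenRead ("message.eml"); // hypothetical path to a raw .eml file
var parser = new MimeParser (stream, MimeFormat.Entity);
var message = parser.ParseMessage ();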

Get Your API Key

Try the SigParser API. Sign up and get an API key with 1,500 free emails per month. Upgrade or downgrade at any time. Our API is entirely serverless and stateless.