Extract email bodies, remove reply chains and signatures

Provide EML files or MIME encoded emails and get structured JSON back.


At SigParser we’re in the business of capturing data from unstructured emails. When we started building SigParser we tried all the open source solutions for parsing emails. None of them were accurate enough. So we built the most accurate email body parser in existence.

Say you have an email body like this…

Great talking with you. Let's catchup soon.

Thanks,
Mark Anderson
VP of Engineering
888-222-4444

On Fri, Nov 19, 2018 at 12:03 PM, Paul Johnson <paul@example.com> wrote:

> Let's talk at 11.
> Thanks
> Paul Johnson

And you want the first message only…

Great talking with you. Let's catchup soon.

Or maybe the second message body…

Let's talk at 11.

How do you do that easily? We’ll cover various programming solutions below.

Why is this hard?

We spent years building email parsers. There are a lot of issues that need to be solved when writing your own email parser:

  • Signature identification
  • Various formats for headers
    • On Fri, Nov 19th…
    • On 10/9/2018
    • Headers that wrap across lines
    • From:, To:, Date: style headers
  • Reply chains indicated by > or multiple >>>
  • Some lines look like signatures but aren’t
  • Corrupted email headers
  • Reply headers are commonly split across lines in plain text emails
  • Multi-language support is required even if no one speaks another language on your team
  • Header formats change over time
  • Email clients change over time
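To make a few of these issues concrete, here is a deliberately naive sketch in Python. It is not how SigParser works; it only handles the single "On ... wrote:" header style and a tiny, hypothetical list of sign-off words (`SIGN_OFFS`), and all function names are made up for illustration.

```python
import re

# Naive pattern for ONE reply-header style only: "On <date> ... wrote:".
# Real mail uses many other formats, languages, and wrapped headers.
REPLY_HEADER = re.compile(r"^On .+ wrote:\s*$", re.MULTILINE)

# Hypothetical, tiny list of sign-off words used to guess where a signature starts.
SIGN_OFFS = ("thanks", "regards", "cheers", "best")


def strip_quotes(text: str) -> str:
    """Remove leading '>' quote markers from each line."""
    return "\n".join(re.sub(r"^(>\s?)+", "", line) for line in text.splitlines())


def strip_signature(text: str) -> str:
    """Cut the text at the first line that looks like a sign-off."""
    kept = []
    for line in text.splitlines():
        if line.strip().rstrip(",").lower() in SIGN_OFFS:
            break
        kept.append(line)
    return "\n".join(kept).strip()


def split_messages(body: str) -> list[str]:
    """Split on reply headers, then clean each part."""
    parts = REPLY_HEADER.split(body)
    return [strip_signature(strip_quotes(part)) for part in parts]


body = """Great talking with you. Let's catchup soon.

Thanks,
Mark Anderson
VP of Engineering
888-222-4444

On Fri, Nov 19, 2018 at 12:03 PM, Paul Johnson <paul@example.com> wrote:

> Let's talk at 11.
> Thanks
> Paul Johnson
"""

messages = split_messages(body)
print(messages)
# ["Great talking with you. Let's catchup soon.", "Let's talk at 11."]
```

This works on the sample email, but it breaks as soon as the header wraps across two lines, uses a From:/Sent: block, is written in another language, or the message body itself contains a line such as "Thanks".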

Still don’t believe us? Look at our change logs. We’re constantly finding new edge cases.

Because of this, we suggest not coding your own signature parsing algorithm. It is non-trivial. There are also a number of half-baked open source efforts out there. We’ve tried them all. Most of our users tried those first before switching to SigParser.

SigParser’s Cross Platform Email Parsing Tools

Our simple email parsing tools provide a consistent JSON result.

  • Clean email bodies of signatures and reply chains
  • Get email bodies for forwarded emails
  • Capture nested email chains in a single MIME message or .eml file
  • REST API option - POST https://api.sigparser.com/api/Mime/ParseString
  • Windows, Linux and AWS Lambda deployment options
    • .eml, .msg, or JSON format inputs
  • Frequent updates as email clients and patterns change
  • Usage based and unlimited plans available

The output structure will look like this.

{
  "CleanedBodyPlain": "Another response in the chain.\r\n\r\n",
  "CleanedBodyHtml": "<div dir=\"ltr\"><div dir=\"ltr\"><div>Another response in the chain. </div><div><br clear=\"all\"></div></div></div>",
  "IsSpammyLookingEmailMessage": false,
  "IsSpammyLookingSender": false,
  "EmailTypes": [
    "NormalEmail"
  ],
  "Emails": [
    {
      "CleanedBodyPlain": "Another response in the chain.\r\n\r\n",
      "CleanedBodyHtml": "<div dir=\"ltr\"><div dir=\"ltr\"><div>Another response in the chain. </div><div><br clear=\"all\"></div></div></div>",
      "Subject": null,
      "Date": "2020-05-11T16:41:16+00:00",
      "FromEmailAddress": "paul@example.com",
      "FromName": "Paul Mendoza",
      "To": [
        {
          "Name": "Outlook Tester",
          "EmailAddress": "outlook.tester@salesforceemail.com"
        }
      ],
      "Cc": []
    },
    {
      "CleanedBodyPlain": "This is a reply from the test account.\r\n\r\n",
      "CleanedBodyHtml": null,
      "Subject": null,
      "Date": "2020-05-11T09:40:00",
      "FromEmailAddress": "outlook.tester@salesforceemail.com",
      "FromName": "Outlook Tester",
      "To": [],
      "Cc": []
    },
    {
      "CleanedBodyPlain": null,
      "CleanedBodyHtml": null,
      "Subject": "One more test email at 3:25 PM",
      "Date": "2020-04-12T15:25:00",
      "FromEmailAddress": "paul@example.com",
      "FromName": "Paul Mendoza",
      "To": [
        {
          "Name": "Outlook Tester",
          "EmailAddress": "outlook.tester@salesforceemail.com"
        }
      ],
      "Cc": []
    }
  ],
  "Subject": "Re: One more test email at 3:25 PM",
  "Date": "2020-05-11T16:41:16+00:00",
  "Headers": {
    "mime-version": "1.0",
    "date": "Mon, 11 May 2020 09:41:16 -0700",
    "references": "<CAL5Lp9VcCVNqeiw0Rry7BHQaTct46qv3BnUvR5-HNqWZO-Xxiw@mail.gmail.com>\r\n\t<BY5PR04MB6819EFA89CDABDFCB9D67D2F8AA10@BY5PR04MB6819.namprd04.prod.outlook.com>",
    "in-reply-to": "<BY5PR04MB6819EFA89CDABDFCB9D67D2F8AA10@BY5PR04MB6819.namprd04.prod.outlook.com>",
    "message-id": "<CAL5Lp9X0RjYNOo68Y_boL8OOw32gU-SWxLW3WjgYj93eTfUsyQ@mail.gmail.com>",
    "subject": "Re: One more test email at 3:25 PM",
    "from": "Paul Mendoza <paul@example.com>",
    "to": "Outlook Tester <outlook.tester@salesforceemail.com>",
    "content-type": "multipart/alternative; boundary=\"00000000000001bd4705a5620460\""
  },
  "FullPlainTextBody": "Another response in the chain.\n\n*Paul Mendoza*, Founder\nMobile 760-917-3753\nSigParser\npaul@example.com\nSchedule a meeting with me here <https://www.meetingbird.com/m/xxxxxx>\n\nListen to podcasts? I was recently on the *FutureTech Podcast*\n<https://www.futuretechpodcast.com/podcasts/digging-up-the-data-your-company-has-needs-and-cant-access-paul-mendoza-sigparser/>\ntalking about SigParser and use cases other customers are using it for.\n\n\nOn Mon, May 11, 2020 at 9:40 AM Outlook Tester <\noutlook.tester@salesforceemail.com> wrote:\n\n> This is a reply from the test account.\n>\n>\n>\n> *From:* Paul Mendoza <paul@example.com>\n> *Sent:* Sunday, April 12, 2020 3:25 PM\n> *To:* Outlook Tester <outlook.tester@salesforceemail.com>\n> *Subject:* One more test email at 3:25 PM\n>\n>\n>\n>\n> *Paul Mendoza, *Founder\n>\n> Mobile 760-917-3753\n>\n> SigParser\n>\n> paul@example.com\n>\n> Schedule a meeting with me here <https://www.meetingbird.com/m/xxxxxx>\n>\n> Listen to podcasts? I was recently on the *FutureTech Podcast*\n> <https://www.futuretechpodcast.com/podcasts/digging-up-the-data-your-company-has-needs-and-cant-access-paul-mendoza-sigparser/>\n> talking about SigParser and use cases other customers are using it for.\n>\n",
  "FullHtmlBody": "<div dir=\"ltr\"><div dir=\"ltr\"><div>Another response in the chain. </div><div><br clear=\"all\"><div><div dir=\"ltr\" class=\"gmail_signature\" data-smartmail=\"gmail_signature\"><div dir=\"ltr\"><div><div dir=\"ltr\"><div><div dir=\"ltr\"><div><div dir=\"ltr\"><div dir=\"ltr\"><div dir=\"ltr\"><div dir=\"ltr\"><font color=\"#3d85c6\" face=\"tahoma, sans-serif\" style=\"font-size:12.8px\"><b>Paul Mendoza</b></font><font color=\"#3d85c6\" face=\"tahoma, sans-serif\" style=\"font-size:12.8px;font-weight:bold\">, </font><span style=\"font-size:12.8px;color:rgb(61,133,198);font-family:tahoma,sans-serif\">Founder</span><div style=\"font-size:12.8px\"><div><font color=\"#666666\" size=\"2\" face=\"arial narrow, sans-serif\">Mobile 760-917-3753</font></div><div><font color=\"#666666\" size=\"2\" face=\"arial narrow, sans-serif\">SigParser</font></div><div><a href=\"mailto:paul@example.com\" style=\"font-family:tahoma,sans-serif;font-size:12.8px;color:rgb(17,85,204)\" target=\"_blank\">paul@example.com</a><br></div><div><a href=\"https://www.meetingbird.com/m/xxxxxx\" target=\"_blank\">Schedule a meeting with me here</a></div><div><img src=\"https://drive.google.com/a/sigparser.com/uc?id=1GUhMvrGnJMCfkge1HMqyKFQCLSJNXcw-&amp;export=download\" width=\"200\" height=\"90\"><br></div></div>Listen to podcasts? I was recently on the <a href=\"https://www.futuretechpodcast.com/podcasts/digging-up-the-data-your-company-has-needs-and-cant-access-paul-mendoza-sigparser/\" target=\"_blank\"><b>FutureTech Podcast</b></a> talking about SigParser and use cases other customers are using it for. 
</div></div></div></div></div></div></div></div></div></div></div></div><br></div></div><br><div class=\"gmail_quote\"><div dir=\"ltr\" class=\"gmail_attr\">On Mon, May 11, 2020 at 9:40 AM Outlook Tester &lt;<a href=\"mailto:outlook.tester@salesforceemail.com\">outlook.tester@salesforceemail.com</a>&gt; wrote:<br></div><blockquote class=\"gmail_quote\" style=\"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex\">\n\n\n\n\n\n<div lang=\"EN-US\">\n<div class=\"gmail-m_-2662285044572695259WordSection1\">\n<p class=\"MsoNormal\">This is a reply from the test account.<u></u><u></u></p>\n<p class=\"MsoNormal\"><u></u> <u></u></p>\n<div style=\"border-right:none;border-bottom:none;border-left:none;border-top:1pt solid rgb(225,225,225);padding:3pt 0in 0in\">\n<p class=\"MsoNormal\"><b>From:</b> Paul Mendoza &lt;<a href=\"mailto:paul@example.com\" target=\"_blank\">paul@example.com</a>&gt; <br>\n<b>Sent:</b> Sunday, April 12, 2020 3:25 PM<br>\n<b>To:</b> Outlook Tester &lt;<a href=\"mailto:outlook.tester@salesforceemail.com\" target=\"_blank\">outlook.tester@salesforceemail.com</a>&gt;<br>\n<b>Subject:</b> One more test email at 3:25 PM<u></u><u></u></p>\n</div>\n<p class=\"MsoNormal\"><u></u> <u></u></p>\n<div>\n<p class=\"MsoNormal\"><br clear=\"all\">\n<u></u><u></u></p>\n<div>\n<div>\n<div>\n<div>\n<div>\n<div>\n<div>\n<div>\n<div>\n<div>\n<div>\n<div>\n<p class=\"MsoNormal\"><b><span style=\"font-size:9.5pt;font-family:Tahoma,sans-serif;color:rgb(61,133,198)\">Paul Mendoza, </span></b><span style=\"font-size:9.5pt;font-family:Tahoma,sans-serif;color:rgb(61,133,198)\">Founder</span><u></u><u></u></p>\n<div>\n<div>\n<p class=\"MsoNormal\"><span style=\"font-size:10pt;font-family:&quot;Arial Narrow&quot;,sans-serif;color:rgb(102,102,102)\">Mobile 760-917-3753</span><span style=\"font-size:9.5pt\"><u></u><u></u></span></p>\n</div>\n<div>\n<p class=\"MsoNormal\"><span style=\"font-size:10pt;font-family:&quot;Arial 
Narrow&quot;,sans-serif;color:rgb(102,102,102)\">SigParser</span><span style=\"font-size:9.5pt\"><u></u><u></u></span></p>\n</div>\n<div>\n<p class=\"MsoNormal\"><span style=\"font-size:9.5pt\"><a href=\"mailto:paul@example.com\" target=\"_blank\"><span style=\"font-family:Tahoma,sans-serif;color:rgb(17,85,204)\">paul@example.com</span></a><u></u><u></u></span></p>\n</div>\n<div>\n<p class=\"MsoNormal\"><span style=\"font-size:9.5pt\"><a href=\"https://www.meetingbird.com/m/xxxxxx\" target=\"_blank\">Schedule a meeting with me here</a><u></u><u></u></span></p>\n</div>\n<div>\n<p class=\"MsoNormal\"><span style=\"font-size:9.5pt\"><img border=\"0\" width=\"200\" height=\"90\" style=\"width: 2.0833in; height: 0.9375in;\" id=\"gmail-m_-2662285044572695259_x0000_i1025\" src=\"https://ci6.googleusercontent.com/proxy/TTpjUlFcjmphqTPKcbTFGb7TsHUk5MzP3P1Wt2uZYLjMzlO0UPeF7MAgaUwFk4hqlFafCMhmzlmkc3FUbGH4ijNXkqx9DAsv-_3CFnCTmZaZhMlONJqrrR-oGfWMfwqGpDgk301HHsijRMhsymfOCkhNKg=s0-d-e1-ft#https://drive.google.com/a/sigparser.com/uc?id=1GUhMvrGnJMCfkge1HMqyKFQCLSJNXcw-&amp;export=download\"></span><span style=\"font-size:9.5pt\"><u></u><u></u></span></p>\n</div>\n</div>\n<p class=\"MsoNormal\">Listen to podcasts? I was recently on the <a href=\"https://www.futuretechpodcast.com/podcasts/digging-up-the-data-your-company-has-needs-and-cant-access-paul-mendoza-sigparser/\" target=\"_blank\">\n<b>FutureTech Podcast</b></a> talking about SigParser and use cases other customers are using it for.\n<u></u><u></u></p>\n</div>\n</div>\n</div>\n</div>\n</div>\n</div>\n</div>\n</div>\n</div>\n</div>\n</div>\n</div>\n</div>\n</div>\n</div>\n\n</blockquote></div></div>\n"
}
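As a rough sketch of how a caller might consume this structure, the Python below loads a trimmed copy of the response and walks the "Emails" array. Only field names that appear in the sample output are used; the helper name `summarize_chain` is hypothetical.

```python
import json

# Trimmed copy of the response shown above; only a few fields are kept.
response_text = """
{
  "CleanedBodyPlain": "Another response in the chain.\\r\\n\\r\\n",
  "Subject": "Re: One more test email at 3:25 PM",
  "Emails": [
    {"CleanedBodyPlain": "Another response in the chain.\\r\\n\\r\\n",
     "FromEmailAddress": "paul@example.com",
     "Date": "2020-05-11T16:41:16+00:00"},
    {"CleanedBodyPlain": "This is a reply from the test account.\\r\\n\\r\\n",
     "FromEmailAddress": "outlook.tester@salesforceemail.com",
     "Date": "2020-05-11T09:40:00"},
    {"CleanedBodyPlain": null,
     "FromEmailAddress": "paul@example.com",
     "Date": "2020-04-12T15:25:00"}
  ]
}
"""


def summarize_chain(result: dict) -> list[tuple[str, str]]:
    """Return (sender, cleaned text) for each message, in the order returned."""
    rows = []
    for email in result.get("Emails", []):
        # CleanedBodyPlain can be null, as in the last entry above.
        text = (email.get("CleanedBodyPlain") or "").strip()
        rows.append((email["FromEmailAddress"], text))
    return rows


result = json.loads(response_text)
chain = summarize_chain(result)
for sender, text in chain:
    print(f"{sender}: {text or '<no cleaned body>'}")
```

Note that the top-level "CleanedBodyPlain" matches the first entry of "Emails" in the sample, and that nested messages may have null bodies or subjects, so callers should guard against missing values.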

Learn More About SigParser

Learn more about how SigParser's API automatically parses email bodies and other email content. Try our API for free with no commitment required.

Command Line (Linux or Windows)

Consume SigParser from any shell. Give it a JSON file of the email, an EML file, or an MSG file, and it returns a structured JSON response with the fields listed above. You can also tell it to write its output to a directory.

Command Line Editor Screenshot

Command Line Editor Help Screenshot

SigParser API called with Python

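Here is an example of calling our assembly from Python. You’ll need to write the email JSON out to the input.json file first.

import os

# Run the SigParser CLI on input.json and read whatever it prints to stdout.
stream = os.popen('SigParserEmailUtils cleanedemail --filename input.json')
output = stream.read()
stream.close()
print(output)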
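The step of writing input.json can be sketched as follows. This is an illustration under assumptions, not documented behavior: it assumes the CLI accepts the same four fields as the Lambda test email shown later in this article, that `SigParserEmailUtils` is on the PATH, and that it prints only JSON to stdout. The helper names are hypothetical.

```python
import json
import subprocess
from pathlib import Path


def write_input(path: Path, email: dict) -> None:
    """Serialize the email fields to the JSON file the CLI will read."""
    path.write_text(json.dumps(email, indent=2), encoding="utf-8")


def run_cleanedemail(path: Path) -> dict:
    """Run the CLI with the same arguments as the example above and parse stdout."""
    completed = subprocess.run(
        ["SigParserEmailUtils", "cleanedemail", "--filename", str(path)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    # ASSUMPTION: stdout contains nothing but the JSON result.
    return json.loads(completed.stdout)


# ASSUMPTION: input fields mirror the Lambda test email in this article.
email = {
    "FromEmailAddress": "mary.johnson@fake.com",
    "FromName": "Mary Johnson",
    "TextBody": "Hi John,\n\nLet's get coffee tomorrow.\n\nThanks\nMary Johnson",
    "HtmlBody": None,
}

input_path = Path("input.json")
write_input(input_path, email)

# Uncomment once SigParserEmailUtils is installed and on PATH:
# result = run_cleanedemail(input_path)
# print(result["CleanedBodyPlain"])
```

Passing the arguments as a list to `subprocess.run` avoids shell quoting problems if the file name contains spaces, and `check=True` surfaces CLI failures as exceptions instead of empty output.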

Lambda Deployment Option

AWS Lambda is a great service for deploying SigParser’s email parsing tools. Each email gets its own dedicated RAM and CPU, Lambdas are kept warm for around 5 minutes (which reduces the startup time per email), and they scale really well.

Deploying Your Lambda

To configure, create a .NET Core 2.1 (C#/PowerShell) Lambda function. The name doesn’t matter.

Lambda Setup Example 1

In the Function code section, set the Handler to SigParser.EmailParsing.Lambda::SigParser.EmailParsing.Lambda.Function::GetCleanedEmailAsync

Upload the SigParser.EmailParsing.Utils.Lambda.zip file.

Lambda Setup Example 2

Set the SigParserLicenseKey environment variable to your Cryptolens license key. Contact us to get one.

Lambda Setup Example 3

Set the Memory to 2048MB of RAM. SigParser needs quite a bit of RAM to run all the machine learning systems quickly.

Lambda Setup Example 4

Click Save, then click Test and use the test email below. It should return a JSON result. The first invocation can be slow, but after that it tends to be fast.

{
  "FromEmailAddress": "mary.johnson@fake.com",
  "FromName": "Mary Johnson",
  "TextBody": null,
  "HtmlBody": "<p>Hi John,<\\/p>\\r\\n\\r\\n<p>Let\\'s get coffee tomorrow.<\\/p>\\r\\n\\r\\n<p>Thanks Mary Johnson<\\/p>"
}

Invoke Lambda Function
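Outside the console, the same test can be sketched in Python with boto3, the AWS SDK. This is a sketch under assumptions: the function name `sigparser-email-parser` is hypothetical (the name doesn’t matter, as noted above), AWS credentials and region are expected to be configured already, and the event uses the same four fields as the test email.

```python
import json


def build_payload(from_email, from_name, text_body=None, html_body=None) -> bytes:
    """Encode the same four fields as the test email above as UTF-8 JSON."""
    event = {
        "FromEmailAddress": from_email,
        "FromName": from_name,
        "TextBody": text_body,
        "HtmlBody": html_body,
    }
    return json.dumps(event).encode("utf-8")


def invoke_parser(function_name: str, payload: bytes) -> dict:
    """Synchronously invoke the Lambda and decode its JSON response."""
    # boto3 is third-party; it is imported here so build_payload works without it.
    import boto3

    client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the result
        Payload=payload,
    )
    return json.loads(response["Payload"].read())


payload = build_payload(
    "mary.johnson@fake.com",
    "Mary Johnson",
    html_body="<p>Hi John,</p>\r\n\r\n<p>Let's get coffee tomorrow.</p>",
)

# Uncomment after deploying; "sigparser-email-parser" is a hypothetical name:
# result = invoke_parser("sigparser-email-parser", payload)
# print(result)
```

Building the payload with `json.dumps` rather than by hand keeps the escaping of quotes and line breaks in the HTML body correct.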

RAM Usage Explained

SigParser needs 2048MB of RAM per email to execute safely without running out of memory. The average email written by a real person needs 962MB of RAM. The 99th percentile needs 1605MB.

The SigParser email parser is incredibly CPU intensive. In AWS, the more RAM you give a Lambda, the more CPU speed AWS gives that Lambda. So allocating lots of RAM isn’t wasteful, since the function executes faster.

Mailgun vs SigParser Parsing Libraries

We often get compared to Mailgun’s open source email parsing library, but the two libraries differ substantially in what they do and how well they perform.

| Feature | SigParser | Mailgun |
|---|---|---|
| Accuracy (estimated accuracy for signature line identification) | 99.9% | 92% |
| Strip signatures off emails | Yes | Yes |
| Supported languages (how many languages can it split emails for?) | English, German, Spanish, French, Portuguese, Russian, Dutch, Norwegian, Korean, Chinese, Turkish, Swedish, Czech | English |
| Forward extraction (capture forwarded messages) | Yes | No |
| ML knowledge (how much machine learning knowledge do you need?) | None | Some. You'll need to find your own training data too, since the 200 email samples they provide aren't a very robust set. |
| Deliverables (what do you get?) | Linux assembly, Windows assembly, Lambda zip file, NuGet package | Python source code |
