Backing up your Exchange (EWS) using Python

On April 10th, 2021 in Uncategorized

I had some issues with a mailbox quota a few days ago. The mailbox runs on Microsoft Exchange and has some terrible limitations:

  • The e-mail quota is only 1 GB;
  • It uses Microsoft Exchange and doesn’t support IMAP, POP3 or even SMTP;
  • The e-mail provider doesn’t allow me to create filters to auto-forward. Yes, it uses Outlook Web App, but even when I set up the filter, it doesn’t work.

The inbox has 2769 e-mails. Imagine saving each one of these e-mails OR forwarding them one by one. Working as fast as I can, it takes me at least 10 seconds to forward 1 e-mail. There’s also a catch: each e-mail that I forward fills my “sent mail” box. For someone who doesn’t have any space left, think about how many times I would have to stop forwarding to clean out that other folder. Even without counting this cleanup, it would take me almost 8 hours of non-stop work (2769 × 10 seconds ≈ 7.7 hours).

The Exchange Web Services client library

I decided to google a solution and, although I couldn’t find many, I found the exchangelib GitHub repository. And wow, that was smooth.

This module provides an well-performing, well-behaving, platform-independent and simple interface for communicating with a Microsoft Exchange 2007-2016 Server or Office365 using Exchange Web Services (EWS). It currently implements autodiscover, and functions for searching, creating, updating, deleting, exporting and uploading calendar, mailbox, task, contact and distribution list items.

The Exchange Web Services client library page.

The page is extremely well documented and very promising, so I decided to give it a shot.

Installing exchangelib

Installing it took me less time than forwarding 1 e-mail:

pip install exchangelib

The page also recommends installing more stuff: “exchangelib uses the lxml package, and pykerberos to support Kerberos authentication. To be able to install these, you may need to install some additional operating system packages.” So if you are running Ubuntu, one more command:

apt-get install libxml2-dev libxslt-dev libkrb5-dev build-essential libssl-dev libffi-dev python-dev

And that’s all, we can begin working with exchangelib.

The teaser

The GitHub page has a “teaser” that, for a beginner like me, was very easy to understand:

#!/usr/bin/python
from exchangelib import Credentials, Account

credentials = Credentials('john@example.com', 'topsecret')
account = Account('john@example.com', credentials=credentials, autodiscover=True)

for item in account.inbox.all().order_by('-datetime_received')[:100]:
    print(item.subject, item.sender, item.datetime_received)

And well, it didn’t work…

Timezone offset does not match system offset: 0 != 32400. Please, check your config files.. Fallback to UTC

As I understood it, the problem was related to the TZ (time zone) setting on my Ubuntu machine, so I googled it and found this page on exchangelib issues. I googled a little more and reached this other page that explains how TZ works on Linux.

I followed its instructions and, using nano, created a $HOME/.bash_profile with my settings:

TZ='Asia/Tokyo'; export TZ

Then I logged out and logged in again. A few seconds later, my first script was working.

If you need to find your time zone, just check the bottom of that page (“Linux / UNIX: TZ Environment Variable”) for further instructions. You can also find a list of tz database time zones on Wikipedia.
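To double-check which offset Python derives from a given TZ value, a small sketch like the one below can help. It is only an illustration: the helper name utc_offset_seconds is made up for this post, it relies on time.tzset(), which exists on Unix only, and it reports the standard (non-DST) offset.

```python
import os
import time


def utc_offset_seconds(tz_name):
    """Return the standard (non-DST) offset of tz_name in seconds east of UTC."""
    previous = os.environ.get("TZ")
    os.environ["TZ"] = tz_name
    time.tzset()  # make the process re-read TZ (Unix only)
    try:
        # time.timezone counts seconds WEST of UTC, so flip the sign.
        return -time.timezone
    finally:
        # Restore whatever TZ the process had before the check.
        if previous is None:
            os.environ.pop("TZ", None)
        else:
            os.environ["TZ"] = previous
        time.tzset()


if __name__ == "__main__":
    print(utc_offset_seconds("Asia/Tokyo"))
```

On a machine with the tz database installed, Asia/Tokyo should give 32400 (9 hours), the same number that appears in the warning above. A result of 0 would suggest the zone name is not being picked up.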

Documentation

The GitHub page has more information about how the library works, but I also found some useful information elsewhere.

My 2 goals

Now that I had found the main tool, I needed to start crafting the other accessories. I had 2 goals:

  1. To download and back up the e-mails
  2. To forward all those e-mails to my personal account. I also want to try to make this automatic from now on.

Although it sounds quite simple, it would take some time to create a useful and bug-free script. I’m just a beginner in software development and, until now, I had only written some minor web scraping Python scripts.

Download and backup

There aren’t many solutions online. Actually, I found just 1 and, thank God, it worked after some minor changes.

MAROONMED solution

His post title is self-explanatory: “Download all emails from Exchange or Office 365 with Python and exchangelib”. It uses exchangelib and Python’s mailbox module to export the desired folders to an “mbox” file. You can also check his GitHub.
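For anyone who hasn’t met the format before: an mbox file is a single file that holds many messages, and the standard-library mailbox module can read it back. The sketch below is not part of MAROONMED’s script. It is just one way to check what ended up in a backup, and the helper name list_subjects is made up for illustration.

```python
import mailbox


def list_subjects(mbox_path):
    """Return the Subject header of every message in an existing mbox file."""
    # create=False raises mailbox.NoSuchMailboxError for a wrong path
    # instead of silently creating an empty file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [message["subject"] for message in box]
    finally:
        box.close()
```

Calling it on the file produced by a backup run (for example a hypothetical inbox.mbox) gives one subject per saved e-mail, so the length of the list can be compared with the total the script reports.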

Here is his script:

#!/usr/bin/env python3
import mailbox
import os
import sys
import traceback

from exchangelib import Account, Configuration, Credentials, DELEGATE

USERNAME = ''
PASSWORD = ''
SERVER = 'outlook.office365.com'
ID_FILE = '.read_ids'


def create_mailbox_message(e_msg):
    m = mailbox.mboxMessage(e_msg.mime_content)
    if e_msg.is_read:
        m.set_flags('S')
    return m


def get_read_ids():
    if os.path.exists(ID_FILE):
        with open(ID_FILE, 'r') as f:
            return set([s for s in f.read().splitlines() if s])
    else:
        return set()


def set_read_ids(ids):
    with open(ID_FILE, 'w') as f:
        for i in ids:
            if i:
                f.write(i)
                f.write(os.linesep)


if __name__ == '__main__':
    if len(sys.argv) != 3:
        print("Usage: {} folder_name mbox_file".format(sys.argv[0]))
        sys.exit()

    credentials = Credentials(USERNAME, PASSWORD)
    config = Configuration(server=SERVER, credentials=credentials)
    account = Account(primary_smtp_address=USERNAME, config=config, autodiscover=False, access_type=DELEGATE)

    mbox = mailbox.mbox(sys.argv[2])
    mbox.lock()

    read_ids_local = get_read_ids()
    folder = getattr(account, sys.argv[1], None)
    item_ids_remote = list(folder.all().order_by('-datetime_received').values_list('item_id', 'changekey'))
    total_items_remote = len(item_ids_remote)
    new_ids = [x for x in item_ids_remote if x[0] not in read_ids_local]
    read_ids = set()

    print("Total items in folder {}: {}".format(sys.argv[1], total_items_remote))

    for i, item in enumerate(account.fetch(new_ids), 1):
        try:
            msg = create_mailbox_message(item)
            mbox.add(msg)
            mbox.flush()
        except Exception as e:
            traceback.print_exc()
            print("[ERROR] {} {}".format(item.datetime_received, item.subject))
        else:
            if item.item_id:
                read_ids.add(item.item_id)
            print("[{}/{}] {} {}".format(i, len(new_ids), str(item.datetime_received), item.subject))

    mbox.unlock()
    set_read_ids(read_ids_local | read_ids)

And now mine, changed. I’ll explain the minor changes later.

#!/usr/bin/env python3
import mailbox
import os
import sys
import traceback
import getpass

from exchangelib import Account, Configuration, Credentials, DELEGATE

USERNAME = 'username'
PASSWORD = 'password'
EMAIL = 'email@email.com'
ID_FILE = '.read_ids'


def create_mailbox_message(e_msg):
    m = mailbox.mboxMessage(e_msg.mime_content)
    if e_msg.is_read:
        m.set_flags('S')
    return m


def get_read_ids():
    if os.path.exists(ID_FILE):
        with open(ID_FILE, 'r') as f:
            return set([s for s in f.read().splitlines() if s])
    else:
        return set()


def set_read_ids(ids):
    with open(ID_FILE, 'w') as f:
        for i in ids:
            if i:
                f.write(i)
                f.write(os.linesep)


if __name__ == '__main__':
    if len(sys.argv) != 3:
        print("Usage: {} folder_name mbox_file".format(sys.argv[0]))
        sys.exit()

    credentials = Credentials(USERNAME, PASSWORD)
    # config = Configuration(server=SERVER, credentials=credentials)
    # account = Account(primary_smtp_address=USERNAME, config=config, autodiscover=False, access_type=DELEGATE)
    account = Account(EMAIL, credentials=credentials, autodiscover=True, access_type=DELEGATE)

    mbox = mailbox.mbox(sys.argv[2])
    mbox.lock()

    read_ids_local = get_read_ids()
    folder = getattr(account, sys.argv[1], None)
    # item_ids_remote = list(folder.all().order_by('-datetime_received').values_list('item_id', 'changekey'))
    item_ids_remote = list(folder.all().order_by('-datetime_received').values_list('id', 'changekey'))
    total_items_remote = len(item_ids_remote)
    new_ids = [x for x in item_ids_remote if x[0] not in read_ids_local]
    read_ids = set()

    print("Total items in folder {}: {}".format(sys.argv[1], total_items_remote))

    for i, item in enumerate(account.fetch(new_ids), 1):
        try:
            msg = create_mailbox_message(item)
            mbox.add(msg)
            mbox.flush()
        except Exception as e:
            traceback.print_exc()
            print("[ERROR] {} {}".format(item.datetime_received, item.subject))
        else:
            # if item.item_id:
            if item.id:
                # read_ids.add(item.item_id)
                read_ids.add(item.id)
            print("[{}/{}] {} {}".format(i, len(new_ids), str(item.datetime_received), item.subject))

    mbox.unlock()
    set_read_ids(read_ids_local | read_ids)

My minor modifications
  1. I didn’t know my server address (yeah, really), but I could make the basic script work using only my credentials by setting “autodiscover” to “True”.
  2. When I ran the script for the first time, I got an error caused by “values_list('item_id', 'changekey')”. At first I thought it was an issue with “values_list()” itself, which is how I found the other documentation page explaining the difference between “values()” and “values_list()”. I even tried a different approach, loading everything without values_list, but that is a terrible idea when your folder has more than 2000 e-mails. In the end, the fix was to replace “item_id” with “id”.
  3. The last lines of the file also had to be changed, replacing “item_id” with “id”.
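The part of the script these changes touch, asking the server only for ids and skipping the ones already saved, can be pulled out into a small helper. This is just a sketch of that idea, not a replacement for the script: the name list_new_item_ids is made up, and it assumes folder behaves like an exchangelib folder (for example account.inbox) with the same all(), order_by() and values_list() calls used above.

```python
def list_new_item_ids(folder, known_ids):
    """Return (id, changekey) pairs for items whose id is not in known_ids.

    folder is expected to behave like an exchangelib folder; known_ids is
    the set of id strings saved by earlier runs (the .read_ids file).
    """
    # values_list() requests only these two fields, so the full messages
    # are not downloaded at this stage.
    pairs = folder.all().order_by("-datetime_received").values_list("id", "changekey")
    return [
        (item_id, changekey)
        for item_id, changekey in pairs
        if item_id not in known_ids
    ]
```

The returned pairs are what the script passes to account.fetch(), so only the e-mails that are not yet in the mbox file get downloaded in full.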

Forwarding the e-mails

For this next step I couldn’t find any ready-made solution, so I had to write it myself. I used MAROONMED’s solution as a foundation and began writing. For now it’s working, but it lacks refinement, so I’ll present it another time.
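In the meantime, here is a rough idea of what the core of such a step could look like. This is not the script mentioned above, only a minimal sketch. It assumes each item is an exchangelib message exposing forward(subject=..., body=..., to_recipients=...), and the names forward_items, recipient and note are made up for illustration.

```python
def forward_items(items, recipient, note="Forwarded automatically"):
    """Forward each item to recipient and return how many were forwarded."""
    count = 0
    for item in items:
        # Assumption: item is an exchangelib message with a forward() method.
        item.forward(
            subject="Fwd: {}".format(item.subject),
            body=note,
            to_recipients=[recipient],
        )
        count += 1
    return count
```

Something like forward_items(account.inbox.all()[:10], 'me@example.com') would be a cautious first test on a handful of e-mails. Keep in mind the catch from the beginning of this post: forwarded copies may also land in the “sent mail” folder and eat into the quota.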

Tags: automations e-mail EWS Exchange python