A few days ago I ran into trouble with the quota of one of my mailboxes. The account runs on Microsoft Exchange and has some terrible limitations:
- The e-mail quota is only 1 GB;
- It doesn’t support IMAP, POP3, or even SMTP;
- The provider doesn’t let me create auto-forwarding filters. Yes, it uses Outlook Web App, but even when I set up such a filter, it doesn’t work.
The inbox has 2769 e-mails. Imagine saving each one of them, or forwarding them one by one. Working as fast as I can, it takes me at least 10 seconds to forward one e-mail. There is also a catch: every e-mail I forward lands in my “Sent” folder. For someone with no space left, think about how many times I would have to stop forwarding just to clean that folder. Even ignoring those clean-ups, it would take 2769 × 10 s = 27,690 s, about 7.7 hours, so roughly 8 hours of non-stop work.
The Exchange Web Services client library
I decided to google for a solution. Although I couldn’t find many, I found the exchangelib GitHub repository. And wow, that was smooth.
This module provides an well-performing, well-behaving, platform-independent and simple interface for communicating with a Microsoft Exchange 2007-2016 Server or Office365 using Exchange Web Services (EWS). It currently implements autodiscover, and functions for searching, creating, updating, deleting, exporting and uploading calendar, mailbox, task, contact and distribution list items.
The Exchange Web Services client library page.
The page is extremely well documented and very promising, so I decided to give it a shot.
Installing exchangelib
Installing it took me less time than forwarding one e-mail:
```bash
pip install exchangelib
```
The page also recommends installing a few extra packages: “exchangelib uses the lxml package, and pykerberos to support Kerberos authentication. To be able to install these, you may need to install some additional operating system packages.” So if you are running Ubuntu, there is one more command:
```bash
sudo apt-get install libxml2-dev libxslt-dev libkrb5-dev build-essential libssl-dev libffi-dev python-dev
```
And that’s all: we can start working with exchangelib.
The teaser
The GitHub page has a “teaser” that, for a beginner like me, was easy to understand:
```python
#!/usr/bin/python
from exchangelib import Credentials, Account

credentials = Credentials('john@example.com', 'topsecret')
account = Account('john@example.com', credentials=credentials, autodiscover=True)

for item in account.inbox.all().order_by('-datetime_received')[:100]:
    print(item.subject, item.sender, item.datetime_received)
```
Well, it didn’t work…
Timezone offset does not match system offset: 0 != 32400. Please, check your config files.. Fallback to UTC
As far as I understood, the problem was the time zone (TZ) setting on my Ubuntu machine. I googled it and found this page in the exchangelib issues. A little more googling led me to this other page, which explains how TZ works on Linux.
I followed its instructions and used nano to create a $HOME/.bash_profile with my settings:
```bash
TZ='Asia/Tokyo'; export TZ
```
Then I logged out and logged back in. A few seconds later, my first script was working.
If you need to find your time zone, check the bottom of that page (“Linux / UNIX: TZ Environment Variable”) for further instructions. Wikipedia also has a list of tz database time zones.
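The warning compares two UTC offsets expressed in seconds. A value of 0 is plain UTC, and 32400 s is 9 h, which is Tokyo’s UTC+9. A quick check like the one below shows which offset a process actually gets from a given TZ value. This is a minimal sketch that assumes a Unix system, because time.tzset() does not exist on Windows. offset_for_tz is a hypothetical helper name. Unlike the .bash_profile line, it changes TZ only for the running Python process.

```python
import os
import time


def offset_for_tz(tz_value: str) -> int:
    """Set TZ for this Python process only and return the resulting UTC offset in seconds."""
    os.environ["TZ"] = tz_value
    time.tzset()  # Unix only: makes the C library re-read the TZ variable
    return time.localtime().tm_gmtoff


if __name__ == "__main__":
    # Zone names like this one are resolved through the system tz database (tzdata)
    print("Asia/Tokyo ->", offset_for_tz("Asia/Tokyo"), "seconds")
```

With the tz database installed, offset_for_tz("Asia/Tokyo") should return 32400. POSIX-style values such as "JST-9" are parsed directly and also give 32400.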
Documentation
The GitHub page has more information about how the library works, but I also found these useful resources:
- The documentation page: very thorough, and quite long;
- The source code documentation;
- The exchangelib PyPI page: it has a lot of examples, and it is where I understood the difference between values() and values_list().
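exchangelib’s query sets are modeled on Django’s ORM, which gives three result shapes:
- values() yields one dict per item;
- values_list() yields one tuple per item;
- values_list(..., flat=True) yields bare values for a single field.

Asking only for the fields you need is also what keeps a folder with thousands of e-mails manageable. The sketch below does not talk to Exchange at all. It mimics those result shapes on made-up in-memory records, so Item, values and values_list here are hypothetical stand-ins, not exchangelib code.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass
class Item:
    """Hypothetical stand-in for an exchangelib message, with only three fields."""
    id: str
    changekey: str
    subject: str


def values(items: Iterable[Item], *fields: str) -> list[dict[str, Any]]:
    # Shape of values(): one dict per item, keyed by field name
    return [{f: getattr(item, f) for f in fields} for item in items]


def values_list(items: Iterable[Item], *fields: str, flat: bool = False) -> list[Any]:
    # Shape of values_list(): one tuple per item, or bare values when flat=True
    if flat:
        if len(fields) != 1:
            raise ValueError("flat=True needs exactly one field")
        return [getattr(item, fields[0]) for item in items]
    return [tuple(getattr(item, f) for f in fields) for item in items]


inbox = [Item("AAA1", "CK1", "Hello"), Item("AAA2", "CK2", "Invoice")]
print(values(inbox, "subject"))               # [{'subject': 'Hello'}, {'subject': 'Invoice'}]
print(values_list(inbox, "id", "changekey"))  # [('AAA1', 'CK1'), ('AAA2', 'CK2')]
print(values_list(inbox, "subject", flat=True))  # ['Hello', 'Invoice']
```

The backup script below passes (id, changekey) tuples like the second result to account.fetch().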
My 2 goals
Now that I had found the main tool, I needed to build the rest. I had two goals:
- To download and back up the e-mails;
- To forward all those e-mails to my personal account. I would also like to make this automatic from now on.
Although the task is quite simple, writing a useful, bug-free script would take some time. I’m just a beginner in software development, and until now I had only written a few small web-scraping Python scripts.
Download and backup
There aren’t many solutions online. Actually, I found just one, and thank God it worked after some minor changes.
MAROONMED solution
The title of his post is self-explanatory: “Download all emails from Exchange or Office 365 with Python and exchangelib”. It uses exchangelib and Python’s mailbox module to export a chosen folder to an mbox file. You can also check his GitHub.
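The core of this approach is simple. exchangelib exposes each message’s raw MIME source as mime_content, and Python’s mailbox module can append those bytes to an mbox file unchanged. The offline sketch below exercises that step using only the standard library. append_raw_message and the sample message are hypothetical.

One detail is worth knowing about mbox flags. Python documents the read flag for mbox messages as 'R', alongside 'O', 'D', 'F' and 'A' for old, deleted, flagged and answered. 'S' is the Maildir “seen” flag, so this sketch uses 'R'.

```python
import mailbox
import os
import tempfile


def append_raw_message(mbox_path: str, mime_bytes: bytes, is_read: bool) -> None:
    """Append one raw MIME message to an mbox file, marking it as read if needed."""
    box = mailbox.mbox(mbox_path)  # creates the file if it does not exist yet
    box.lock()
    try:
        msg = mailbox.mboxMessage(mime_bytes)
        if is_read:
            msg.set_flags("R")  # written to the Status header
        box.add(msg)
    finally:
        box.close()  # flushes pending changes, releases the lock, closes the file


if __name__ == "__main__":
    raw = b"From: alice@example.com\nSubject: Test\n\nHello!\n"
    path = os.path.join(tempfile.mkdtemp(), "backup.mbox")
    append_raw_message(path, raw, is_read=True)
    for message in mailbox.mbox(path):
        print(message["subject"], message.get_flags())
```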
Here is his script:
```python
#!/usr/bin/env python3

import mailbox
import os
import sys
import traceback

from exchangelib import Account, Configuration, Credentials, DELEGATE

USERNAME = ''
PASSWORD = ''
SERVER = 'outlook.office365.com'
ID_FILE = '.read_ids'


def create_mailbox_message(e_msg):
    m = mailbox.mboxMessage(e_msg.mime_content)
    if e_msg.is_read:
        m.set_flags('S')
    return m


def get_read_ids():
    if os.path.exists(ID_FILE):
        with open(ID_FILE, 'r') as f:
            return set([s for s in f.read().splitlines() if s])
    else:
        return set()


def set_read_ids(ids):
    with open(ID_FILE, 'w') as f:
        for i in ids:
            if i:
                f.write(i)
                f.write(os.linesep)


if __name__ == '__main__':
    if len(sys.argv) != 3:
        print("Usage: {} folder_name mbox_file".format(sys.argv[0]))
        sys.exit()

    credentials = Credentials(USERNAME, PASSWORD)
    config = Configuration(server=SERVER, credentials=credentials)
    account = Account(primary_smtp_address=USERNAME, config=config, autodiscover=False, access_type=DELEGATE)
    mbox = mailbox.mbox(sys.argv[2])
    mbox.lock()
    read_ids_local = get_read_ids()
    folder = getattr(account, sys.argv[1], None)
    item_ids_remote = list(folder.all().order_by('-datetime_received').values_list('item_id', 'changekey'))
    total_items_remote = len(item_ids_remote)
    new_ids = [x for x in item_ids_remote if x[0] not in read_ids_local]
    read_ids = set()
    print("Total items in folder {}: {}".format(sys.argv[1], total_items_remote))
    for i, item in enumerate(account.fetch(new_ids), 1):
        try:
            msg = create_mailbox_message(item)
            mbox.add(msg)
            mbox.flush()
        except Exception as e:
            traceback.print_exc()
            print("[ERROR] {} {}".format(item.datetime_received, item.subject))
        else:
            if item.item_id:
                read_ids.add(item.item_id)
            print("[{}/{}] {} {}".format(i, len(new_ids), str(item.datetime_received), item.subject))
    mbox.unlock()
    set_read_ids(read_ids_local | read_ids)
```
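One nice property of this script is that it can resume. It saves the IDs of the messages it managed to write into .read_ids, and the next run skips them. Stripped of the Exchange calls, that bookkeeping looks roughly like the sketch below. load_ids, save_ids and pending are hypothetical names for what get_read_ids(), set_read_ids() and the new_ids list comprehension do.

```python
import os


def load_ids(path: str) -> set[str]:
    """IDs saved by earlier runs; an empty set on the first run."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {line for line in f.read().splitlines() if line}


def save_ids(path: str, ids: set[str]) -> None:
    """Overwrite the ID file with one ID per line."""
    with open(path, "w", encoding="utf-8") as f:
        for item_id in sorted(ids):
            f.write(item_id + "\n")


def pending(remote: list[tuple[str, str]], done: set[str]) -> list[tuple[str, str]]:
    """Keep only (id, changekey) pairs whose id has not been saved yet."""
    return [pair for pair in remote if pair[0] not in done]
```

In the script, the ID file is written only at the very end. A run that is killed halfway therefore records no progress, and the next run would append those messages to the mbox again. Messages that raised an error are never added to the set, so they are simply retried next time.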
And here is my modified version. I’ll explain the minor changes below.
```python
#!/usr/bin/env python3

import mailbox
import os
import sys
import traceback
import getpass

from exchangelib import Account, Configuration, Credentials, DELEGATE

USERNAME = 'username'
PASSWORD = 'password'
EMAIL = 'email@email.com'
ID_FILE = '.read_ids'


def create_mailbox_message(e_msg):
    m = mailbox.mboxMessage(e_msg.mime_content)
    if e_msg.is_read:
        # mbox stores "read" as 'R' in the Status header;
        # 'S' is the Maildir "seen" flag and would end up in X-Status instead
        m.set_flags('R')
    return m


def get_read_ids():
    if os.path.exists(ID_FILE):
        with open(ID_FILE, 'r') as f:
            return set([s for s in f.read().splitlines() if s])
    else:
        return set()


def set_read_ids(ids):
    with open(ID_FILE, 'w') as f:
        for i in ids:
            if i:
                f.write(i)
                f.write(os.linesep)


if __name__ == '__main__':
    if len(sys.argv) != 3:
        print("Usage: {} folder_name mbox_file".format(sys.argv[0]))
        sys.exit()

    credentials = Credentials(USERNAME, PASSWORD)
    # config = Configuration(server=SERVER, credentials=credentials)
    # account = Account(primary_smtp_address=USERNAME, config=config, autodiscover=False, access_type=DELEGATE)
    account = Account(EMAIL, credentials=credentials, autodiscover=True, access_type=DELEGATE)
    mbox = mailbox.mbox(sys.argv[2])
    mbox.lock()
    read_ids_local = get_read_ids()
    folder = getattr(account, sys.argv[1], None)
    # item_ids_remote = list(folder.all().order_by('-datetime_received').values_list('item_id', 'changekey'))
    item_ids_remote = list(folder.all().order_by('-datetime_received').values_list('id', 'changekey'))
    total_items_remote = len(item_ids_remote)
    new_ids = [x for x in item_ids_remote if x[0] not in read_ids_local]
    read_ids = set()
    print("Total items in folder {}: {}".format(sys.argv[1], total_items_remote))
    for i, item in enumerate(account.fetch(new_ids), 1):
        try:
            msg = create_mailbox_message(item)
            mbox.add(msg)
            mbox.flush()
        except Exception:
            traceback.print_exc()
            print("[ERROR] {} {}".format(item.datetime_received, item.subject))
        else:
            # if item.item_id:
            if item.id:
                # read_ids.add(item.item_id)
                read_ids.add(item.id)
            print("[{}/{}] {} {}".format(i, len(new_ids), str(item.datetime_received), item.subject))
    mbox.unlock()
    set_read_ids(read_ids_local | read_ids)
```
My minor modifications
- I didn’t know my server address (yes, really), but I could make the basic script work with just my credentials by setting autodiscover=True.
- When I first ran the script, I got an error from “values_list('item_id', 'changekey')”. At first I thought the problem was values_list() itself, which is how I found the other documentation page explaining the difference between values() and values_list(). I even tried loading every item without values_list(), but that is a terrible idea when your folder has more than 2000 e-mails. In the end the culprit was the field name: the exchangelib version I installed calls it “id”, not “item_id”.
- For the same reason, the lines near the end of the file had to change from “item.item_id” to “item.id”.
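The same script may have to run against more than one exchangelib version. In that case, a tiny accessor can try both attribute names instead of hard-coding one. This sketch assumes, based on the errors described above, that newer releases call the attribute id and older ones item_id. item_id_of is a hypothetical helper, and the field name passed to values_list() would need the same treatment.

```python
from typing import Any, Optional


def item_id_of(item: Any) -> Optional[str]:
    """Return an item's Exchange ID, trying the newer attribute name first.

    Assumption: newer exchangelib releases expose it as `id`,
    older ones as `item_id`.
    """
    for attr in ("id", "item_id"):
        value = getattr(item, attr, None)
        if value:
            return value
    return None


if __name__ == "__main__":
    from types import SimpleNamespace

    # Stand-in objects; real code would pass exchangelib items
    print(item_id_of(SimpleNamespace(id="new-style-id")))
    print(item_id_of(SimpleNamespace(item_id="old-style-id")))
```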
Forwarding the e-mails
For this next step I couldn’t find any ready-made solution, so I had to write it myself, using the MAROONMED solution as a foundation. For now it works, but it lacks polish, so I’ll present it another time.