Backing up your Exchange (EWS) using Python

On April 10th, 2021 in Uncategorized

I had some issues with a mailbox quota a few days ago. The mailbox runs on Microsoft Exchange and has some terrible limitations:

  • The e-mail quota is only 1 GB;
  • It uses Microsoft Exchange and doesn’t support IMAP, POP3 or even SMTP;
  • The e-mail provider doesn’t allow me to create filters to auto-forward. Yes, it uses Outlook Web App, but even when I set up the filter, it doesn’t work.

The inbox has 2769 e-mails. Imagine saving each one of these e-mails OR forwarding them one by one. Working as fast as I can, it takes me at least 10 seconds to forward 1 e-mail. There’s also a catch: each e-mail that I forward fills my “sent mail” box, and for someone who doesn’t have any space left, think about how many times I would have to stop forwarding to clean up that other folder. Even without taking this cleaning process into consideration, it would take me about 8 hours of non-stop work.
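As a rough check on that estimate, here is the arithmetic as a small Python sketch. The 2769 e-mails and the 10 seconds per e-mail are the figures from above; the function name is only for illustration.

```python
def forwarding_hours(emails: int, seconds_per_email: float) -> float:
    """Return the hours needed to forward every e-mail by hand."""
    return emails * seconds_per_email / 3600


hours = forwarding_hours(2769, 10)
print(f"{hours:.1f} hours")  # 27690 seconds, a little under 8 hours
```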

The Exchange Web Services client library

I decided to google a solution and, although I couldn’t find many, I found the exchangelib GitHub repository. And wow, that was smooth.

This module provides an well-performing, well-behaving, platform-independent and simple interface for communicating with a Microsoft Exchange 2007-2016 Server or Office365 using Exchange Web Services (EWS). It currently implements autodiscover, and functions for searching, creating, updating, deleting, exporting and uploading calendar, mailbox, task, contact and distribution list items.

The Exchange Web Services client library page.

The page is extremely well documented and very promising, so I decided to give it a shot.

Installing exchangelib

Installing it took me less time than forwarding 1 e-mail:

pip install exchangelib

The page also recommends installing more stuff: “exchangelib uses the lxml package, and pykerberos to support Kerberos authentication. To be able to install these, you may need to install some additional operating system packages.” So if you are running Ubuntu, one more command:

apt-get install libxml2-dev libxslt-dev libkrb5-dev build-essential libssl-dev libffi-dev python-dev

And that’s all, we can begin working with exchangelib.

The teaser

The GitHub page has a “teaser” that, for a beginner like me, was very understandable:

#!/usr/bin/python
from exchangelib import Credentials, Account

credentials = Credentials('john@example.com', 'topsecret')
account = Account('john@example.com', credentials=credentials, autodiscover=True)

for item in account.inbox.all().order_by('-datetime_received')[:100]:
    print(item.subject, item.sender, item.datetime_received)

And well, it didn’t work…

Timezone offset does not match system offset: 0 != 32400. Please, check your config files.. Fallback to UTC

But as I understood it, the problem was related to my TZ setting on Ubuntu, so I googled it and found this page in the exchangelib issues. I googled a little more and reached this other page that explains how TZ works on Linux.

I followed its instructions and created a $HOME/.bash_profile using nano with my settings:

TZ='Asia/Tokyo'; export TZ

Then I logged out and logged in again. A few seconds later, my first script was working.

If you need to find your time zone, just check the bottom of that page (“Linux / UNIX: TZ Environment Variable”) for further instructions. You can also find a list of tz database time zones on Wikipedia.
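As an aside, the two numbers in the warning above appear to be UTC offsets in seconds: 0 is plain UTC and 32400 is 9 hours ahead, which matches Asia/Tokyo. A minimal sketch of that conversion, where the function name is my own and not part of exchangelib:

```python
def format_utc_offset(seconds: int) -> str:
    """Turn an offset in seconds into a string such as 'UTC+09:00'."""
    sign = "+" if seconds >= 0 else "-"
    hours, remainder = divmod(abs(seconds), 3600)
    minutes = remainder // 60
    return f"UTC{sign}{hours:02d}:{minutes:02d}"


print(format_utc_offset(0))      # UTC+00:00
print(format_utc_offset(32400))  # UTC+09:00
```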

Documentation

The GitHub page has more information about how the lib works, but I also found some useful information:

My 2 goals

Now that I had found the main tool, I needed to start crafting the other accessories. I had 2 goals:

  1. To download and back up the e-mails.
  2. To forward all those e-mails to my personal account. I also want to try to make this automatic from now on.

Although it is quite simple, it would take some time to create a useful and bug-free script. I’m just a beginner in software development and had only written some minor web scraping Python scripts until now.

Download and backup

There aren’t many solutions online. Actually, I found just 1 and, thank God, it worked after some minor changes.

MAROONMED solution

His post title is self-explanatory: “Download all emails from Exchange or Office 365 with Python and exchangelib”. It uses exchangelib and mailbox to export the desired folders to an “mbox” file. You can also check his GitHub.

Here is his script:

#!/usr/bin/env python3
import mailbox
import os
import sys
import traceback

from exchangelib import Account, Configuration, Credentials, DELEGATE

USERNAME = ''
PASSWORD = ''
SERVER = 'outlook.office365.com'
ID_FILE = '.read_ids'


def create_mailbox_message(e_msg):
    m = mailbox.mboxMessage(e_msg.mime_content)
    if e_msg.is_read:
        m.set_flags('S')
    return m


def get_read_ids():
    if os.path.exists(ID_FILE):
        with open(ID_FILE, 'r') as f:
            return set([s for s in f.read().splitlines() if s])
    else:
        return set()


def set_read_ids(ids):
    with open(ID_FILE, 'w') as f:
        for i in ids:
            if i:
                f.write(i)
                f.write(os.linesep)


if __name__ == '__main__':
    if len(sys.argv) != 3:
        print("Usage: {} folder_name mbox_file".format(sys.argv[0]))
        sys.exit()

    credentials = Credentials(USERNAME, PASSWORD)
    config = Configuration(server=SERVER, credentials=credentials)
    account = Account(primary_smtp_address=USERNAME, config=config, autodiscover=False, access_type=DELEGATE)

    mbox = mailbox.mbox(sys.argv[2])
    mbox.lock()

    read_ids_local = get_read_ids()
    folder = getattr(account, sys.argv[1], None)
    item_ids_remote = list(folder.all().order_by('-datetime_received').values_list('item_id', 'changekey'))
    total_items_remote = len(item_ids_remote)
    new_ids = [x for x in item_ids_remote if x[0] not in read_ids_local]
    read_ids = set()

    print("Total items in folder {}: {}".format(sys.argv[1], total_items_remote))

    for i, item in enumerate(account.fetch(new_ids), 1):
        try:
            msg = create_mailbox_message(item)
            mbox.add(msg)
            mbox.flush()
        except Exception as e:
            traceback.print_exc()
            print("[ERROR] {} {}".format(item.datetime_received, item.subject))
        else:
            if item.item_id:
                read_ids.add(item.item_id)
            print("[{}/{}] {} {}".format(i, len(new_ids), str(item.datetime_received), item.subject))

    mbox.unlock()
    set_read_ids(read_ids_local | read_ids)
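What makes re-running this script cheap is the ID bookkeeping: it remembers which item IDs were already written and only fetches the rest. Here is a minimal sketch of that idea in isolation, with made-up stand-in IDs instead of real Exchange ones. The function name is mine and does not appear in the script.

```python
def select_new_items(remote_items, already_saved_ids):
    """Keep only the (id, changekey) pairs whose id is not saved yet."""
    return [pair for pair in remote_items if pair[0] not in already_saved_ids]


# Stand-in values; real Exchange IDs are long opaque strings.
remote = [("id-3", "ck-3"), ("id-2", "ck-2"), ("id-1", "ck-1")]
saved = {"id-1", "id-2"}

new_items = select_new_items(remote, saved)
print(new_items)  # [('id-3', 'ck-3')]

# After a successful run, the saved set grows by the IDs just written.
saved_after_run = saved | {pair[0] for pair in new_items}
```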

And now mine, changed. I’ll explain the minor changes later.

#!/usr/bin/env python3
import mailbox
import os
import sys
import traceback
import getpass

from exchangelib import Account, Configuration, Credentials, DELEGATE

USERNAME = 'username'
PASSWORD = 'password'
EMAIL = 'email@email.com'
ID_FILE = '.read_ids'


def create_mailbox_message(e_msg):
    m = mailbox.mboxMessage(e_msg.mime_content)
    if e_msg.is_read:
        m.set_flags('S')
    return m


def get_read_ids():
    if os.path.exists(ID_FILE):
        with open(ID_FILE, 'r') as f:
            return set([s for s in f.read().splitlines() if s])
    else:
        return set()


def set_read_ids(ids):
    with open(ID_FILE, 'w') as f:
        for i in ids:
            if i:
                f.write(i)
                f.write(os.linesep)


if __name__ == '__main__':
    if len(sys.argv) != 3:
        print("Usage: {} folder_name mbox_file".format(sys.argv[0]))
        sys.exit()

    credentials = Credentials(USERNAME, PASSWORD)
    # config = Configuration(server=SERVER, credentials=credentials)
    # account = Account(primary_smtp_address=USERNAME, config=config, autodiscover=False, access_type=DELEGATE)
    account = Account(EMAIL, credentials=credentials, autodiscover=True, access_type=DELEGATE)

    mbox = mailbox.mbox(sys.argv[2])
    mbox.lock()

    read_ids_local = get_read_ids()
    folder = getattr(account, sys.argv[1], None)
    # item_ids_remote = list(folder.all().order_by('-datetime_received').values_list('item_id', 'changekey'))
    item_ids_remote = list(folder.all().order_by('-datetime_received').values_list('id', 'changekey'))
    total_items_remote = len(item_ids_remote)
    new_ids = [x for x in item_ids_remote if x[0] not in read_ids_local]
    read_ids = set()

    print("Total items in folder {}: {}".format(sys.argv[1], total_items_remote))

    for i, item in enumerate(account.fetch(new_ids), 1):
        try:
            msg = create_mailbox_message(item)
            mbox.add(msg)
            mbox.flush()
        except Exception as e:
            traceback.print_exc()
            print("[ERROR] {} {}".format(item.datetime_received, item.subject))
        else:
            # if item.item_id:
            if item.id:
                # read_ids.add(item.item_id)
                read_ids.add(item.id)
            print("[{}/{}] {} {}".format(i, len(new_ids), str(item.datetime_received), item.subject))

    mbox.unlock()
    set_read_ids(read_ids_local | read_ids)
My minor modifications
  1. I didn’t know my server address (yeah, that’s real), but I could make the basic script work using only my credentials by setting “autodiscover” to “True”.
  2. When I ran the script for the first time, I got an error due to “values_list('item_id', 'changekey')”. At first I thought it was an issue with “values_list()” itself, which is how I found the other documentation page explaining the difference between “values()” and “values_list()”. I even tried loading everything without values_list, but that is a terrible idea when your folder has more than 2000 e-mails. In the end, the fix was to replace “item_id” with “id”.
  3. The last lines at the end of the file also had to be changed, replacing “item_id” with “id”.
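Once the script has run, it is worth checking that the mbox file can actually be read back. The sketch below uses only Python’s standard mailbox module. The helper name and the demo file are my own, and the demo builds a tiny throwaway mbox instead of touching a real backup.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def list_subjects(mbox_path: str) -> list:
    """Return the Subject header of every message stored in an mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [message["Subject"] or "(no subject)" for message in box]
    finally:
        box.close()


# Stand-in data: build a one-message mbox in a temporary directory.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "inbox.mbox")

demo_message = EmailMessage()
demo_message["From"] = "john@example.com"
demo_message["To"] = "me@example.com"
demo_message["Subject"] = "Hello from the backup"
demo_message.set_content("This is only a test message.")

demo_box = mailbox.mbox(demo_path)
demo_box.add(demo_message)
demo_box.flush()
demo_box.close()

subjects = list_subjects(demo_path)
print(len(subjects), "message(s):", subjects)
```

For a real backup, pass the path of the mbox file you gave to the script instead of the demo path.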

Forwarding the e-mails

For this next step, I couldn’t find any solution already written, so I had to do it myself. I used MAROONMED’s solution as a foundation and began writing. For now, it’s working, but it lacks refinement, so I’ll present it another time.

Tags: automations e-mail EWS Exchange python

Run apps as a service on Ubuntu

On April 3rd, 2021 in Tech

If you have a script (jar, py, or whatever) or a piece of software that you need to run as a service, and you also need it to start automatically if/when the system restarts, you can follow the instructions below.

I’ll show 2 examples of my own: Tabula and Archivebox. Both run on my Ubuntu Server VM.

Step 1 – Create a service

Create a service file in “/etc/systemd/system”. You can name it whatever you like and use the text editor of your preference.

I’m using “nano” and named them “tabula.service” and “archivebox.service“.

For Tabula

sudo nano /etc/systemd/system/tabula.service

Edit your file. You can use my code below, changing it as necessary.

Note that lines beginning with a # are comments and your computer won’t execute them. You can delete them if you like. I also kept the values from my own setup as comments.

[Unit]
# Add the description of your service
# Description=Tabula
Description=YOUR_DESCRIPTION

[Service]
# Change this to your workspace (where your script will run). I decided to keep my Tabula jar file in my home directory.
# WorkingDirectory=/home/fsugi/tabula
WorkingDirectory=PATH_TO_WORKING_DIRECTORY

# Path to the executable. The executable is a bash script which calls the jar file.
# NOTE: a bash script name usually ends with ".sh" but I didn't do that. That's why my script example is "tabula"
# ExecStart=/home/fsugi/tabula/tabula
ExecStart=PATH_TO_SCRIPT

# Other options that you can change, if necessary. I suggest you keep them as they are.
SuccessExitStatus=143
TimeoutStopSec=10
Restart=on-failure
RestartSec=30

[Install]
WantedBy=multi-user.target

For Archivebox

sudo nano /etc/systemd/system/archivebox.service

Edit your file. You can use my code below, changing it as necessary.

Note that lines beginning with a # are comments and your computer won’t execute them. You can delete them if you like. I also kept the values from my own setup as comments.

[Unit]
# Add the description of your service
# Description=Archivebox service
Description=YOUR_DESCRIPTION

[Service]
# If you need to run the service as a specific user, you must define it. If none is declared, root is the default.
# User=fsugi
# Group=fsugi
User=USER
Group=GROUP

# Change this to your workspace (where your script will run). I decided to keep Archivebox in my home directory.
# WorkingDirectory=/home/fsugi/archivebox
WorkingDirectory=PATH_TO_WORKING_DIRECTORY

# Path to the executable. The executable is a bash script which calls the archivebox binary.
# NOTE: a bash script name usually ends with ".sh" but I didn't do that. That's why my script example is "archivebox"
# ExecStart=/home/fsugi/archivebox/archivebox
ExecStart=PATH_TO_SCRIPT

# Other options that you can change, if necessary. I suggest you keep them as they are.
SuccessExitStatus=143
TimeoutStopSec=10
Restart=on-failure
RestartSec=30

[Install]
WantedBy=multi-user.target
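Since both files are filled in from the same template, it is easy to leave a placeholder such as PATH_TO_SCRIPT behind. Below is a rough Python sketch that looks for missing keys or leftover placeholders before you start the service. It relies on Python’s configparser, which only understands simple INI-style files like the ones above and is not a full systemd parser. The list of required keys and the function name are my own assumptions.

```python
import configparser

# Keys used in the unit files above; this list is an assumption, not a systemd rule.
REQUIRED_KEYS = {
    "Unit": ["Description"],
    "Service": ["WorkingDirectory", "ExecStart"],
    "Install": ["WantedBy"],
}


def missing_unit_keys(unit_text: str) -> list:
    """Return "Section.Key" names that are absent or still placeholders."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep the key names case-sensitive
    parser.read_string(unit_text)

    problems = []
    for section, keys in REQUIRED_KEYS.items():
        for key in keys:
            value = parser.get(section, key, fallback="")
            if not value or value.startswith(("YOUR_", "PATH_TO_")):
                problems.append(f"{section}.{key}")
    return problems


example = """
[Unit]
Description=Tabula

[Service]
WorkingDirectory=/home/fsugi/tabula
ExecStart=PATH_TO_SCRIPT

[Install]
WantedBy=multi-user.target
"""
print(missing_unit_keys(example))  # ['Service.ExecStart']
```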

Step 2 – Create the bash script that your service calls

Now you must create a bash script.

I’m using “nano” again as the editor. Notice that I’m not using “sudo” and that I’m creating the script at the path I defined in the “.service” file above.

For Tabula

nano /home/fsugi/tabula/tabula

Edit your file. You can use my code below, changing it as necessary.

Note that lines beginning with a # are comments and your computer won’t execute them. You can delete them if you like. I also kept the values from my own setup as comments.

#!/bin/sh
# This is the command that runs my jar file. Notice that I wrote the complete path to the "java" binary. The other attributes are parameters defined for running Tabula.
/usr/bin/java -Dfile.encoding=utf-8 -Xms256M -Xmx1024M -Dwarbler.port=1111 -jar /home/fsugi/tabula/tabula.jar

For Archivebox

nano /home/fsugi/archivebox/archivebox

Edit your file. You can use my code below, changing it as necessary.

Note that lines beginning with a # are comments and your computer won’t execute them. You can delete them if you like. I also kept the values from my own setup as comments.

#!/bin/sh
# Notice that I wrote the complete path to the "archivebox" binary. The other attributes are parameters defined for running Archivebox.
/usr/bin/archivebox server 0.0.0.0:7000
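One step not spelled out above: systemd can only start the file named in ExecStart if that file is marked as executable. From a shell this is usually done with chmod. The sketch below does the same thing from Python on a throwaway file, and the helper name is my own.

```python
import os
import stat
import tempfile


def make_executable(path: str) -> None:
    """Add the execute bits for user, group and others, similar to `chmod a+x`."""
    current_mode = os.stat(path).st_mode
    os.chmod(path, current_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


# Demonstration on a temporary file rather than a real service script.
with tempfile.NamedTemporaryFile("w", suffix="-demo", delete=False) as handle:
    handle.write("#!/bin/sh\necho hello\n")
    demo_script = handle.name

make_executable(demo_script)
is_executable = bool(os.stat(demo_script).st_mode & stat.S_IXUSR)
print(is_executable)
```

For the examples in this post, the path to pass would be the script itself, such as /home/fsugi/tabula/tabula.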

Step 3 – Start service

Every time you change a service file, you must first reload systemd.

sudo systemctl daemon-reload

To test (or start) your service (I’m using tabula as an example):

sudo systemctl start tabula.service

To stop your service:

sudo systemctl stop tabula.service

To enable your service to load automatically at start-up:

sudo systemctl enable tabula.service

To stop your service from loading automatically at start-up:

sudo systemctl disable tabula.service

To check the status of your service:

sudo systemctl status tabula.service

It’s always good to check the status of your service while setting it up to identify errors.
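If you end up typing these commands a lot while debugging, they can be wrapped in a few lines of Python. This is only a sketch: the helper names are mine, the allowed actions are just the five used in this post, and actually running a command needs a machine with systemd and sudo rights.

```python
import subprocess

# The five actions used in this post.
ALLOWED_ACTIONS = {"start", "stop", "enable", "disable", "status"}


def systemctl_command(action: str, unit: str) -> list:
    """Build the argument list for one of the systemctl calls shown above."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return ["sudo", "systemctl", action, unit]


def run_systemctl(action: str, unit: str) -> int:
    """Run the command and return its exit code (needs a systemd machine)."""
    return subprocess.run(systemctl_command(action, unit)).returncode


print(systemctl_command("status", "tabula.service"))
# ['sudo', 'systemctl', 'status', 'tabula.service']
```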

Additional – Logging

If you want to check the whole log for your service (I’m using tabula.service):

sudo journalctl --unit=tabula.service

You can tail the live log using the -f option:

sudo journalctl -f -u tabula.service

Use -n <# of lines> to view a specified number of log lines:

sudo journalctl -f -n 1000 -u tabula.service
Tags: automations digital life