Overriding Default Werkzeug Exceptions in Flask

Let’s play a game. What HTTP code does this exception correspond to:

"message": "The browser (or proxy) sent a request that this server could not understand."

No, no, don’t look at the code in the response! That’s cheating! This is actually the default Werkzeug description for the 400 code. No shit. I thought something was wrong with my headers or encryption, but I would never guess a simple Bad Request from this message. You could use a custom exception, of course; the problem is that the very useful abort(400) object (it’s an Aborter in disguise) would stick with the default exception anyway.

Let’s fix it, shall we?

There may be several ways of fixing that, but what I’m gonna do is just update abort.mapping. Create a separate module for your custom HTTP exceptions, custom_http_exceptions.py, and put a couple of overridden exceptions there (don’t forget to import abort, we’ll be needing it in a moment):

from flask import abort
from werkzeug.exceptions import HTTPException

class BadRequest(HTTPException):
    code = 400
    description = 'Bad request.'

class NotFound(HTTPException):
    code = 404
    description = 'Resource not found.'

These are perfectly functional, but we still need to add them to the default abort mapping:

abort.mapping.update({
    400: BadRequest,
    404: NotFound
})

Note that I import the abort object from flask, not flask-restful: only the former is an Aborter object with a mapping and other bells and whistles; the latter is just a function.

Now just import this module with * into your main Flask module (where you declare and run your Flask app) or some other place where it would have a similar effect at runtime.
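To see the override in action without spinning up an app, you can exercise Werkzeug’s Aborter directly; flask’s abort is such an Aborter under the hood (a minimal sketch, not a full app):

```python
from werkzeug.exceptions import Aborter, HTTPException

class BadRequest(HTTPException):
    code = 400
    description = 'Bad request.'

abort = Aborter()  # flask's abort is an instance like this
abort.mapping.update({400: BadRequest})

try:
    abort(400)
except HTTPException as e:
    print(e.code, e.description)  # 400 Bad request.
```

Calling abort(400) now raises our BadRequest with the sane description instead of the default browser-or-proxy riddle.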

Note that you should also have the following line in your config because of this issue:

ERROR_404_HELP = False

I’m not sure why this awkward and undocumented constant isn’t False by default. I opened an issue on GitHub, but no one seems to care.

Remote Debugging with PyCharm

I’m working on a project now that requires a certain (server) environment to run, hence it is developed on my local machine and then gets deployed to a remote server. I thought I was going to say bye-bye to my favorite PyCharm feature, namely the debugger, but to my surprise remote debugging has been supported for years now. It took some time to figure out (tutorials online are a bit ambiguous), so here is a short report on my findings.

For the sake of this tutorial let’s assume the following:

  • Remote host: foo_host.io
  • Remote user: foo_usr (/home/foo_usr/)
  • Local user: bar_usr (/home/bar_usr/)
  • Path to the project on the local machine: /home/bar_usr/proj

Here goes the step-by-step how-to:

  1. First we need to set up remote deployment, if you haven’t done so already. Go to Tools → Deployment → Configuration and set up access to your remote server via SSH. I’d use:
    • Type: SFTP
    • SFTP host: foo_host.io (don’t forget to test the connection before applying)
    • Port: 22 (obviously)
    • Root path: /home/foo_usr
    • User name: foo_usr
    • Auth type: Key pair (OpenSSH or PuTTY)
    • Private key file: /home/bar_usr/.ssh/id_rsa (you’d need to generate the key and ssh-copy-id it to the remote machine, which is outside of the scope of this tutorial).
  2. Go to the Mappings tab and add a deployment path on the server (perhaps the name of your project).
  3. Now under Tools → Deployment you have the option to deploy your code to the remote server. These first three steps could be replaced with a simple Git repository on the server side; however, I sometimes prefer this way.
  4. Now that you have the deployment set up, you can go to Tools → Deployment → Upload to ...; note, however, that it deploys only the file you have opened or the directory you selected in the project view, so if you need to sync the whole project, just select your project root.
  5. I use virtualenv, so at this step I need to ssh into the remote machine and set up a virtualenv in the project directory (/home/foo_usr/proj/.env), which is outside the scope of this tutorial. If you’re planning on using the global Python interpreter, just skip this step.
  6. Now let’s go to File → Settings → Project ... → Project Interpreter. Using the gear button, select Add Remote. The following dialog window lets you set up a remote interpreter over SSH (including a remote virtualenv), Vagrant, or the deployment configuration you have set up previously. For the sake of this tutorial I’m going to put something like this there (using SSH, of course):
    • Host: foo_host.io
    • Port: 22 (which is there by default)
    • User name: foo_usr
    • Auth type: Key pair (OpenSSH or PuTTY)
    • Private key file: /home/bar_usr/.ssh/id_rsa
    • Python interpreter path: /home/foo_usr/proj/.env/bin/python
  7. If you set up everything correctly, it should list all the packages installed in your remote environment (if any) and select this interpreter for your project.
  8. Now let’s do the last, but most important step: configure debugging. Go to the Edit Configurations… menu and set things up accordingly. For our hypothetical project I will use the following:
    • Script: proj/run.py (or something along these lines)
    • Python interpreter: just select the remote interpreter you have set up earlier.
    • Working directory: /home/bar_usr/proj/ (note that this is working directory on local machine)
    • Path mappings: create a mapping along the lines of /home/bar_usr/proj=/home/foo_usr/proj (although this seems pretty easy, it may get tricky when you forget about the mappings and move projects around, so be careful).
That’s it. Now we should have a more or less working configuration that you can use both for debugging and running your project. Don’t forget to update/redeploy your project before running, as the versions may get out of sync and PyCharm will get all whiny about missing files.

My Take on Yandex Pre-interview Python Assignment

I applied for a junior Python position at the Russian internet giant Yandex (very similar to Google). And although my application was rejected due to lack of experience, I think their little pre-interview test and my take on it may be of interest to any inquisitive Pythonista. Note that this has never been properly translated into English before, so this is probably exclusive in that regard.

Assignment I

There are two lists of different lengths. The first one contains keys, the second one values. Write a function that creates a dict out of these lists. If a key doesn’t have a value, it should equal None; if a value doesn’t have a key, it should be omitted.

from itertools import zip_longest

def get_dict(list1, list2):
    # zip_longest pads the shorter list with None
    # (the old Python 2 trick here was dict(map(None, list1, list2)))
    ret = dict(zip_longest(list1, list2))
    # values without keys all end up under the None key -- drop it
    ret.pop(None, None)
    return ret
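The behavior the assignment asks for, spelled out on concrete data (Python 3; itertools.zip_longest replaces the Python-2-only map(None, …) trick):

```python
from itertools import zip_longest

# more keys than values: missing values become None
d = dict(zip_longest(['a', 'b', 'c'], [1]))
d.pop(None, None)
print(d)  # {'a': 1, 'b': None, 'c': None}

# more values than keys: orphan values land under the None key and get dropped
d = dict(zip_longest(['a'], [1, 2, 3]))
d.pop(None, None)
print(d)  # {'a': 1}
```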

Assignment II

A login should start with a Latin letter; it may contain Latin letters, digits, dots and hyphens, but must end only with a Latin letter or a digit. The minimum length is 1 symbol, the maximum is 20. Write a function that checks strings against these rules. Think of several methods of solving this problem and compare them.

import re
import time

def check1(login):
    return bool(re.match(r'^[a-zA-Z][a-zA-Z0-9.-]{0,19}(?<![.-])$', login))

def check2(login):
    # note: str.isalpha() also accepts non-Latin letters, unlike check1
    if not 1 <= len(login) <= 20:
        return False
    if not login[0].isalpha():
        return False
    if not (login[-1].isalpha() or login[-1].isdigit()):
        return False
    for a in login[1:-1]:
        if not (a.isalpha() or a.isdigit() or a in '-.'):
            return False
    return True

def compare(login, n=100000):
    tm = time.time()
    for _ in range(n):
        check1(login)
    print('regex:', time.time() - tm)
    tm = time.time()
    for _ in range(n):
        check2(login)
    print('manual:', time.time() - tm)
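A quick sanity check of the regex variant on a few edge cases (the sample logins are made up; check1 is restated so the snippet runs on its own):

```python
import re

def check1(login):
    return bool(re.match(r'^[a-zA-Z][a-zA-Z0-9.-]{0,19}(?<![.-])$', login))

cases = {
    'a': True,          # minimum length, single letter
    'a.b-c9': True,     # dots and hyphens in the middle are fine
    '9abc': False,      # must start with a letter
    'abc.': False,      # must not end with a dot or hyphen
    'a' * 21: False,    # over the 20-symbol limit
}
for login, expected in cases.items():
    assert check1(login) == expected
print('ok')
```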

Assignment III

There are two tables, users and messages (I changed the names and messages to non-Cyrillic):

users:

uid | name
----|--------------
1   | John Doe
2   | Natalie Knaph
3   | Johnatan Yozo

messages:

uid | msg
----|------------------------------------------------
1   | Hello, John!
3   | Send me the card, quickly.
3   | I’m waiting on the corner of 5th and Lafayette
1   | This is me again. Please message me more often.

Create a SQL query, that would return two fields: “User name” and “Total amount of messages”.

SELECT users.name AS "User name", count(*) AS "Total amount of messages"
FROM users
JOIN messages ON users.uid = messages.uid
GROUP BY users.uid, users.name

Assignment IV

Suppose you have a generic access.log. How do you get the 10 most frequent IP addresses using standard terminal tools? How would you do that with Python?

grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' access.log | sort -n | uniq -c | sort -n -r | head -10

import sys
import re

with open(sys.argv[1]) as f:
    ips = re.findall(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}", f.read())
# sort by frequency, then keep the first occurrence of each address
srt = sorted(ips, key=ips.count, reverse=True)
unq = []
for m in srt:
    if m not in unq:
        unq.append(m)
print(unq[:10])
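A tighter Python take, using collections.Counter instead of the quadratic list.count sort (the sample log lines here are made up for illustration):

```python
from collections import Counter
import re

log = """\
127.0.0.1 - - "GET / HTTP/1.1" 200
10.0.0.2 - - "GET /about HTTP/1.1" 200
127.0.0.1 - - "POST /login HTTP/1.1" 302
"""

ips = re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", log)
top = [ip for ip, _ in Counter(ips).most_common(10)]
print(top)  # ['127.0.0.1', '10.0.0.2']
```

most_common(10) does the counting, the sorting and the deduplication in one call.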

If you can think of a better way to solve any of these, let me know.

Humble Collection of Python Sphinx Gotchas: Part II

Gotcha 1: Release and Version

Sphinx makes a distinction between the release and the version of an application. The idea is that they should look this way:

version = "4.0.4"
release = "4.0.4.rc.1"

Most projects use a much simpler versioning convention, so they would probably do something like this:

version = "4.0.4"
release = "4.0.4"

I’d been doing this myself for some time, until I realized that conf.py is a simple Python file (no shit!) and it’s perfectly fine to do something like this:

version = "4.0.4"
release = version

Yeah, kinda obvious, I know. I missed this, however (even though I’ve been putting much more complex code into conf.py), and so did some other people. So, I’ll just leave it here.

Gotcha 2: Download as PDF

Sometimes a PDF download should be provided along with the hosted HTML version. It looks good, and people get a well-formatted file to use locally while they work with your API or product. In short: it can be done easily using Python or Bash. I’m really pissed off at people who know Python or Bash and still keep asking whether there is an automatic way of doing that. Well, it doesn’t get more automatic than this:

# generate latex and pdf
make latexpdf

# uncomment the PDF link in index.rst and substitute the target name
find 'index.rst' -print -exec sed -i.bak "s/.. &//g" {} \;
find 'index.rst' -print -exec sed -i.bak "s/TARGET/$UPTARGET/g" {} \;

# strip the version number out of conf.py
VERSION="$(grep -F -m 1 'version = ' conf.py | sed 's/.*"\(.*\)".*/\1/')";

# generate html and put the pdf next to it
sphinx-build -b html . ../$TARGET/$VERSION/
cp latex/$UPTARGET.pdf ../$TARGET/$VERSION/

Note that for the example above, the following line should be present in index.rst:

.. &  `Download as PDF <TARGET.pdf>`_

Here .. is the comment syntax, & is there to distinguish the line from usual comments, and TARGET is replaced with $UPTARGET, which is the upper-case version of the project name and the default name of the .tex and .pdf files. This creates a relative link to the .pdf file, which is then copied to the exact same folder where the HTML output is located. I’m not going to explain much about the variables, as their sources may differ. At work I use a Python script built on the exact same principle (I figured a Bash example would be more universal), and it gets the values of $TARGET, $UPTARGET and $VERSION from a JSON file with a list of targets (more on that in the next example). In the example above, I’m stripping the values out of the conf.py file. In fact, you can use whatever input you wish, even pass the values as arguments. What I was trying to illustrate is the concept itself.
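The version-stripping step, the one place where the grep above is fragile, can be done in Python in a couple of lines (a sketch of a helper; the function name is mine):

```python
import re

def read_version(conf_text):
    """Pull the version string out of the contents of conf.py."""
    m = re.search(r'^version\s*=\s*["\'](.+?)["\']', conf_text, re.M)
    return m.group(1) if m else None

print(read_version('version = "4.0.4"\nrelease = version\n'))  # 4.0.4
```

Unlike the grep line, this returns just the number, not the whole assignment line.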

Gotcha 3: Using Scripting to Organize the Sphinx Project as a Multiple Project Knowledge Base

Some of the companies I’ve worked at had a huge array of active projects that they wanted to present as a single site, or a whole variety of sites with the same theme, or a single site with a PDF version for every first-level subsection. Basically, they wanted me to create a Sphinx-based knowledge base. Using a simple Python or Bash script, there are ways to organize your project any way you want (we’ll use Python this time, as it’s closer to what I’ve been using). We’re going to create a site that automatically builds a PDF version for each first-level subsection (project) and puts it alongside the subsection’s index.html. Basically, this is a slightly more complex variant of the previous example.

Let’s imagine we have a single Sphinx project with a couple of first-level sections corresponding to the company’s projects, for example Foo and Bar (give me that medal for originality, yeah). Your folder structure will look like this:

|  index.rst
|_ Foo
|  |_ 1.0.0
|  |  |_ index.rst
|  |_ 1.1.0
|     |_ index.rst
|_ Bar

Yeah, we also have versions. I use the following script for projects with such a layout. Don’t worry, it only looks kinda big; the script is rather simple. Also, I’ve commented the hell out of it, so you should be able to figure it all out. Note that you also need to create a targets.json file in the root of your project, containing the following lines (assuming we’re using the structure we agreed on in the beginning):

{
  "foo" : "Foo Foo",
  "bar" : "Barrington"
}

The file tells the script the full project names and how they correspond to the target names (folder names) in the structure. You will also need a temp.py file containing only the info we need for PDF building, with most of the target names and version numbers represented as variables for injection (yeah, I know this is hacky, but I didn’t want to bother with imports, dependencies, etc.). First of all, it should have &VRSN tags:

# The short X.Y version.
version = '&VRSN'
# The full version, including alpha/beta/rc tags.
release = '&VRSN'

It should also have tags in the LaTeX part of the settings:

latex_documents = [
  ('index', '&TRGT.tex', u'&UPTRGT',
   u'ACME', 'manual'),
]

Other than that, temp.py may resemble your usual conf.py. The reason for the split is that we use conf.py for HTML, and it has preset version and project name values for the project as a whole. So we’d better distinguish between the file for injections and the main configuration file, so that they don’t mess with each other. Note that if you need to add some additional parameters or a preamble to the LaTeX output, you should do that in temp.py, as conf.py is not used for building PDFs at all.
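The core of such a build script, injecting each target and version into a copy of temp.py, comes down to something like this (a sketch under the conventions above; the helper name and the inlined JSON are mine, and writing the result over conf.py plus calling make latexpdf per target is left out):

```python
import json

def inject(template, target, version):
    """Fill the &-tags in the contents of temp.py for one subproject."""
    return (template
            .replace('&VRSN', version)
            .replace('&UPTRGT', target.upper())  # replace before &TRGT, which is a substring
            .replace('&TRGT', target))

# normally you would read targets.json and temp.py from disk
targets = json.loads('{"foo": "Foo Foo", "bar": "Barrington"}')
template = "version = '&VRSN'\nlatex_documents = [('index', '&TRGT.tex', u'&UPTRGT', u'ACME', 'manual')]"
for target in targets:
    print(inject(template, target, '1.0.0'))
```

From there, each filled-in template is written out as conf.py, make latexpdf is run, and the resulting PDF is copied next to the subproject’s HTML, just like in the previous gotcha.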

If we prepare the project this way, the script should build PDFs for every subproject and put them into the subproject’s HTML root. Ideally, the HTML version could also be built separately for every subproject (for the right project name/version to appear in each one). This script is more of a proof of concept than an out-of-the-box solution. However, if you now understand Sphinx’s capacity for extension and automation, you can create projects of any complexity yourself.