@wswld

Behind the schedule since 1989

My Take on Yandex Pre-interview Python Assignment

I’ve applied for a junior Python position at Russian internet giant Yandex (very similar to Google). And although my application has been rejected, due to lack of experience, I think their little pre-interview test and my take on that may be of interest to any inquisitive pythonista. Note, that this has never been properly translated into English before, so this is probably exclusive in that regard.

Assignment I

There are two lists of different length. The first one contains keys, the second - values. Write a function, that would create a dict out of these lists. If the key doesn’t have a value - it should equal None, if the value doesn’t have a key, it should be omitted.

def get_dict(list1, list2):
    ret = dict(map(None, list1, list2))
    if ret.get(None, False):
        ret.__delitem__(None)
    return ret

Assignment II

Login should start with latin symbol, contain latin symbols, digits, dots and hyphens, but end only with a latin symbol or a digit. Minimum length is 1 symbol, maximum - 20 symbols. Write a function that checks strings for correspondence with these rules. Think of several methods of solving this problem and compare them.

import re
import time
def check1(login):
    ret = False
    if re.match('^[a-zA-Z][a-zA-Z0-9\-\.]{0,19}(?<;![\-\.])$', login):
        ret = True
    return ret
def check2(login):
    ret = False
    if (len(login) >= 1 or len(login) <;= 20) and login[0].isalpha() and (login[-1].isalpha() or login[-1].isdigit()):
        for a in login[1:-1]:
            if a.isalpha() or a.isdigit() or a == '-' or a == '.':
                ret = True
    return ret
def compare(login):
    tm = time.time()
    check1(login)
    print(time.time() - tm)
    tm = time.time()
    check2(login)
    print(time.time() - tm)

Assignment III

There are two tables users and messages (I changed names and messages to non-Cyrillic):

users
UID name
1 Greg Laplante
2 Luisa Portillo
3 Austin Hunt
messages
UID msg
1 Hello, Greg!
3 Send me the card, quickly.
3 I'm waiting on the corner of 5th and Lafayette
1 This is me again. Please message me more often.

Create a SQL query, that would return two fields: User name and Total amount of messages.

SELECT users.name AS "User name",count(*) AS "Total amount of messages"
FROM users
JOIN messages ON users.uid = messages.uid
GROUP BY users.uid

Assignment IV

Suppose you have a generic access.log. How to get 10 most frequent IP-addresses using standard terminal tools? How to do that with Python?

# BASH:
grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' access.log | sort -n | uniq -c | sort -n -r | head -10
# PYTHON:
import sys
import re
all = re.findall("[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}", open(sys.argv[1], 'r').read())
srt = sorted(all, key=all.count, reverse=True)
unq = []
for m in srt:
    if not m in unq:
        unq.append(m)
print unq[0:10]

If you can think of a better way to solve any of these, let me know.