Wednesday, August 15

Stop Using Python 2: What You Need to Know About Python 3

Though Python 3 was released in 2008, many projects are still stuck on Python 2.

It’s understandable that porting large existing codebases to a new version is a prospect which sends a shiver down many a developer’s spine. But code inevitably needs to be maintained, and so when all the shiny new features that would fix everything are in a new version, is it really worth staying rooted in the past?

We’ll take you through some of the features that Python 2 programs are missing out on, not only from 3.0 but up to the current release (3.7).

Why Python 3 Happened

Before 2008, Python developers had a bit of a headache. The language that started in the 1989 Christmas holidays as the pet project of Guido van Rossum was now growing at a fast pace. Features had been piled on, and the project was now large enough that earlier design decisions were hindering implementation. Because of this, the process of adding new features was becoming an exercise in hacking around the existing code.

The solution was Python 3: the only release that deliberately broke backwards compatibility. At the time, the decision was controversial. Was it acceptable for a publicly used open source project to purposefully break older code? Despite the backlash, the decision was taken, giving Guido and the developers a one-off chance to clean out redundant code, fix common pitfalls and re-architect the language. The aim was that within Python 3 there would be only one obvious way of doing things. It’s a testament to the design choices made back then that we’re still on 3.x releases a decade later.

The __future__ is Now

The __future__ import is a slice of time-travelling wizardry which allows you to summon select features from future releases of Python. In fact, the current Python release, 3.7, contains __future__ imports from releases which haven’t yet been written!

OK, fine, it’s not quite as grandiose as that: a __future__ import is just an explicit switch that turns on new syntax or behaviour packaged with the current release. We mention it because a few of the Python 3 features listed below can be enabled with a __future__ import in 2.6 and 2.7, which were released alongside 3.0 and 3.1 respectively. Having said this, upgrading is, of course, still advised, as features backported this way are ‘frozen’ in past releases and will not benefit from the evolution and maintenance of current versions.

Onto what you’re missing out on in Python 3…

Print is a Function

Yes, we know that most people are aware of this one, but print is among the first things Pythonistas reach for when starting out. print changed from a statement to a function, which just makes sense.

This Python 2 code

print "Fresh Hacks Every Day"
print "Foo",  # trailing comma: no newline, the next print adds a space
print "some more text on the same line as foo"

will become the following in Python 3.

print("Fresh Hacks Every Day")
print("Foo", end=' ')  # replace the newline with a space
print("some more text on the same line as foo")
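Because print is now an ordinary function, its behaviour is controlled with keyword arguments (sep, end, file) rather than special syntax, and it can be passed around like any other callable. A small sketch; the in-memory "logger" built with functools.partial is just an illustration, not a recommended logging setup.

```python
import functools
import io
import sys

# Keyword arguments replace the special cases of the old statement
print("Foo", "Bar", sep=", ")                   # Foo, Bar
print("Stays on this line", end=" ")            # no newline, just a space
print("and so does this")                       # Stays on this line and so does this
print("Something went wrong", file=sys.stderr)  # goes to stderr, not stdout

# print can be partially applied like any other function. Here, a
# hypothetical logger that writes to an in-memory buffer instead of the screen:
buffer = io.StringIO()
log = functools.partial(print, file=buffer)
log("Fresh", "Hacks")
log("Every", "Day", sep="-")
print(repr(buffer.getvalue()))  # 'Fresh Hacks\nEvery-Day\n'
```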

Souped-up Unpacking

Here we have a tuple containing some data. In Python 2, we can already unpack into different variables like so:

person = ("Steve", "Hammond", 34, "England", "spaces")
name, surname, age, country, indent_pref = person

But let’s say we only care about the first name and indentation preference. In Python 3, we can now unpack like this:

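person = ("Steve", "Hammond", 34, "England", "spaces")
name, *misc, indent_pref = person
# misc == ["Hammond", 34, "England"]

# These would also work
name, surname, age, *misc = person  # misc == ["England", "spaces"]
*misc, indent_pref = person         # misc == ["Steve", "Hammond", 34, "England"]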

This provides much more flexibility when unpacking — especially handy if dealing with tuples longer than the one in this example.
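A few more details worth knowing, sketched below: the starred name always collects into a list (even when unpacking a tuple) and may end up empty, only one starred target is allowed per assignment, and since Python 3.5 the same star syntax also works inside tuple/list literals and function calls. The header/rows data is made up for illustration.

```python
# A starred target always collects into a list, and may collect nothing
first, *middle, last = ("only", "two")
print(first, middle, last)  # only [] two

# Handy for splitting a header row from the data rows (hypothetical data)
header, *rows = [("name", "indent"), ("Steve", "spaces"), ("Sam", "tabs")]
print(header)     # ('name', 'indent')
print(len(rows))  # 2

# Python 3.5+ also allows star unpacking inside literals and calls
person = ("Steve", "Hammond", 34, "England", "spaces")
name, *misc, indent_pref = person
updated = (name, *misc, "tabs")
print(updated)  # ('Steve', 'Hammond', 34, 'England', 'tabs')
print(*misc)    # Hammond 34 England
```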

Unpacking is commonly used in for-loops, especially when using things like zip() or enumerate(). To apply the new syntax, suppose we have a function, get_people_data(), that returns a list of tuples like the person example above.

for name, *misc, indent_pref in get_people_data():
    if indent_pref != "spaces":  # compare strings by value, not identity
        print(f"You will feel the full wrath of PEP 8, {name}.")
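To make the loop runnable, here is a sketch with a hypothetical stand-in for get_people_data() (the people are invented). It also shows the new syntax combining with enumerate(): the index is unpacked normally and each tuple is star-unpacked in the same loop header. The wrath_targets() helper is likewise just for illustration.

```python
def get_people_data():
    """Hypothetical stand-in returning (name, surname, age, country, indent)."""
    return [
        ("Steve", "Hammond", 34, "England", "spaces"),
        ("Alice", "Smith", 29, "Wales", "tabs"),
        ("Bob", "Jones", 41, "Scotland", "either"),
    ]


def wrath_targets(people):
    """Return (row_number, name) for everyone who doesn't indent with spaces."""
    targets = []
    # enumerate() yields (index, item); the item itself is star-unpacked
    for row, (name, *_, indent_pref) in enumerate(people, start=1):
        if indent_pref != "spaces":
            targets.append((row, name))
    return targets


print(wrath_targets(get_people_data()))  # [(2, 'Alice'), (3, 'Bob')]
```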

This works great. But wouldn’t it be nice if we could store the indentation preference in a better way than just a string? In a way similar to an enum in C?

Enums

Enums in Python. What a treat. Due to popular demand, they’ve been in the standard library since 3.4.

Now we can define indentation preference like this:

from enum import Enum

class Indent(Enum):
    SPACES = 'spaces'
    TABS = 'tabs'
    EITHER = 'either'

person = ("Steve", "Hammond", 34, "England", Indent.SPACES)
# or
person = ("Steve", "Hammond", 34, "England", Indent("spaces"))

# Let's try and initialise with an invalid value
person = ("Steve", "Hammond", 34, "England", Indent("invalid value"))
# ValueError: 'invalid value' is not a valid Indent

The syntax may seem a bit strange at first, but enums are useful for named constants, and looking up a member by value, as in Indent("spaces"), makes initialising from strings easy while rejecting anything invalid.
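A short sketch of what the Indent enum gives you in practice, using only the standard enum module: lookup by value or by name, access to .name and .value, iteration in definition order, and safe identity checks (each member is a singleton, so `is` is the right tool here, unlike when comparing two strings). A plain Enum member is also not equal to its raw value, which catches accidental string comparisons.

```python
from enum import Enum


class Indent(Enum):
    SPACES = 'spaces'
    TABS = 'tabs'
    EITHER = 'either'


# Look members up by value (e.g. a string read from a config file)...
pref = Indent("tabs")
# ...or by name
print(Indent["TABS"] is pref)     # True
print(pref.name, pref.value)      # TABS tabs
print([m.value for m in Indent])  # ['spaces', 'tabs', 'either']

# Members are singletons, so an identity check is safe here
if pref is not Indent.SPACES:
    print("You will feel the full wrath of PEP 8.")

# A plain Enum member is not equal to its raw value
print(Indent.SPACES == "spaces")  # False
```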

Division

A simple but major change: in Python 3, dividing two integers with / performs true division and returns a float. In Python 2, dividing two integers always gave an integer, rounded down.

This is the Python 2 behaviour:

>>> 1 / 3
0
>>> 5 / 2
2

Whereas in Python 3 we get the result you’d expect from a calculator:

>>> 1 / 3
0.3333333333333333
>>> 5 / 2
2.5

The // operator is, of course, still available for floor division when that’s what you need.

This change is one that should definitely be noted if you’re porting code from 2 to 3; it’s an easy one to slip under the radar that could cause major issues in program logic.
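When porting, a practical rule of thumb is to use // wherever integer division was intended: for ints it floors exactly like Python 2’s /, including for negative numbers. The sketch below shows how / and // behave in Python 3 (the same behaviour Python 2 gets with from __future__ import division).

```python
print(7 / 2)        # 3.5 -> true division always returns a float
print(8 / 2)        # 4.0 -> still a float, even when it divides evenly
print(7 // 2)       # 3   -> floor division, like Python 2's int /
print(-7 // 2)      # -4  -> floors towards negative infinity, not zero
print(int(-7 / 2))  # -3  -> truncation is a different operation
print(divmod(7, 2)) # (3, 1) -> quotient and remainder together
print(7.0 // 2)     # 3.0 -> // on floats floors but returns a float
```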

Chaining Exceptions

It’s very common to catch one exception, then raise another. This could be because your application has a defined set of custom exceptions, or simply because you want to provide more information about what went wrong.

Here we have a program which crudely calculates how many days you would need to work to earn a given amount, based on your annual pay.

class InvalidSalaryError(Exception):
    pass

def days_to_earn(annual_pay, amount):
    """Return number of days worked to earn `amount`."""
    try:
        annual_frac = amount / annual_pay
    except ZeroDivisionError:
        raise InvalidSalaryError("Could not calculate number of days")
    return 365 * annual_frac

if __name__ == '__main__':
    print(days_to_earn(0, 4500))
    print(days_to_earn(20000, 4500))

We can see that if an annual pay of zero is specified and a ZeroDivisionError occurs, this is caught, and an InvalidSalaryError is raised.

Let’s try running this with Python 2.

$ python days_calc.py
Traceback (most recent call last):
  File "days_calc.py", line 13, in <module>
    print(days_to_earn(0, 4500))
  File "days_calc.py", line 9, in days_to_earn
    raise InvalidSalaryError("Could not calculate number of days")
__main__.InvalidSalaryError: Could not calculate number of days

Because we caught the ZeroDivisionError, it got swallowed, so only the InvalidSalaryError traceback is shown.

Now let’s run this with Python 3.

$ python3 days_calc.py
Traceback (most recent call last):
  File "days_calc.py", line 7, in days_to_earn
    annual_frac = amount / annual_pay
ZeroDivisionError: division by zero

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "days_calc.py", line 13, in <module>
    print(days_to_earn(0, 4500))
  File "days_calc.py", line 9, in days_to_earn
    raise InvalidSalaryError("Could not calculate number of days")
__main__.InvalidSalaryError: Could not calculate number of days

We got a far more detailed error report, explaining what caused the InvalidSalaryError, complete with a traceback for the ZeroDivisionError. This is particularly useful when you’re using code written by someone else, which might not have the most verbose error messages.

We won’t go into it here, but there’s also new raise ... from syntax which allows for explicit exception chaining.
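For the curious, here is a brief sketch of that explicit form applied to the salary example. With raise ... from exc, the original exception is stored on the new one as __cause__ and the traceback says it "was the direct cause of" the new error; raise ... from None hides the chained traceback when it adds nothing useful. The parse_salary() helper is hypothetical, added only to show from None.

```python
class InvalidSalaryError(Exception):
    pass


def days_to_earn(annual_pay, amount):
    """Return number of days worked to earn `amount`."""
    try:
        annual_frac = amount / annual_pay
    except ZeroDivisionError as exc:
        # Explicit chaining: exc is stored as the new error's __cause__
        raise InvalidSalaryError("Could not calculate number of days") from exc
    return 365 * annual_frac


def parse_salary(text):
    """Hypothetical helper: turn user input into an int salary."""
    try:
        return int(text)
    except ValueError:
        # `from None` suppresses the chained traceback
        raise InvalidSalaryError(f"Not a valid salary: {text!r}") from None


try:
    days_to_earn(0, 4500)
except InvalidSalaryError as err:
    explicit = err  # `err` is deleted when the except block ends
print(type(explicit.__cause__).__name__)  # ZeroDivisionError

try:
    parse_salary("lots")
except InvalidSalaryError as err:
    suppressed = err
print(suppressed, suppressed.__cause__, suppressed.__suppress_context__)
# Not a valid salary: 'lots' None True
```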

Side note: dealing with large numbers in Python

In the above program we had to deal with ints in the tens of thousands. Writing long numbers like 20000 or 0b001110100010 is a recipe for screen squinting. How about separating them like this: 20_000, 0b_0011_1010_0010? This is another way Python 3 (from 3.6 onwards) makes your life easier: optional underscores in numeric literals, which can help make things more readable.
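A quick sketch showing that the underscores are purely cosmetic (the values are unchanged), plus the matching format-spec options that put the grouping back in when printing: "_" and "," group decimal digits in threes, and "_" with binary output groups every four digits.

```python
salary = 20_000
flags = 0b_0011_1010_0010

print(salary == 20000)          # True
print(flags == 0b001110100010)  # True
print(flags)                    # 930

# The same grouping is available when formatting output
print(f"{salary:_}")  # 20_000
print(f"{salary:,}")  # 20,000
print(f"{flags:_b}")  # 11_1010_0010 (binary, grouped in fours)
```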

Unicode and Bytes

In Python 2, the concepts of bytes and text were pretty much interchangeable: both came under the str type, with a separate unicode type for text. This led to some very nasty implicit conversions and unpredictable behaviour. The headline to remember is that in Python 3, every str is Unicode text. A distinction was very deliberately created between text (str) and binary data (bytes), offering a much more clearly defined way of working.

In the examples below, we check the type of a bytestring, a ‘normal’ string and a unicode string in the Python 2 and Python 3 interpreters.

Python 2.7:

>>> type(b'foo')
<type 'str'> # In Python 2, a bytestring is a normal string!
>>> type('foo')
<type 'str'> # The same bytes type as above
>>> type(u'foo')
<type 'unicode'>

Python 3.7:

>>> type(b'foo')
<class 'bytes'> # In Python 3, this has its own bytes type
>>> type('foo')
<class 'str'> # In Python 3, 'str' means unicode
>>> type(u'foo')
<class 'str'> # The u prefix is accepted (since 3.3), but it's just a normal string

This means that dealing with encodings is much clearer in Python 3, even if it comes at the cost of slipping in a few more .encode()s. Chances are that when your program communicates with anything in the outside world, such as files or sockets, you’ll have to encode and decode your data.
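Here’s a small sketch of that boundary in action. Text is a sequence of Unicode code points, and .encode() turns it into the bytes that actually go to a file or socket; .decode() goes the other way. Where Python 2 would quietly try to convert between the two, Python 3 raises a TypeError if you mix them. The sample string is written with escape sequences so the exact characters are unambiguous.

```python
text = "na\u00efve caf\u00e9"  # "naïve café": a str of Unicode code points
data = text.encode("utf-8")     # bytes: what actually goes over the wire

print(data)                          # b'na\xc3\xafve caf\xc3\xa9'
print(len(text), len(data))          # 10 12 (ï and é take two bytes each in UTF-8)
print(data.decode("utf-8") == text)  # True

# Python 3 refuses to guess an encoding, so mixing the two types fails loudly
try:
    text + data
except TypeError as exc:
    print("TypeError:", exc)
```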

As an example, if you’re using pyserial to read/write to a serial device, you’ll need to explicitly encode and decode your messages.

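import serial  # pyserial

PORT = '/dev/ttyACM0'
BAUD = 115200
ENCODING = 'utf-8'

if __name__ == '__main__':
    ser = serial.Serial(port=PORT, baudrate=BAUD)

    # str -> bytes: the serial port only deals in bytes
    ser.write('hello'.encode(ENCODING))
    # readline() returns bytes, up to and including b'\n'
    response_bytes = ser.readline()
    # bytes -> str, so we can work with the reply as text
    response_str = response_bytes.decode(ENCODING)
    ser.close()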

String Formatting

In Python 2, strings were usually formatted with the % operator, in a style similar to C’s printf:

author = "Guido van Rossum"
date = 1989
foo = "%s started Python in %d" % (author, date)

# Guido van Rossum started Python in 1989

The docs explicitly state that this way of formatting “exhibits a variety of quirks that lead to a number of common errors (such as failing to display tuples and dictionaries correctly)”. It’s also inflexible, and starts to become ugly to look at when dealing with long strings.
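To see the tuple quirk the docs mention, here is a minimal sketch (the coordinates are made up). With %, a tuple on the right-hand side is treated as the argument list rather than as a single value, so formatting one tuple needs an awkward wrapping 1-tuple. The newer approaches have no such ambiguity.

```python
coords = (51.5, -0.1)

# % treats the tuple as two arguments for one placeholder, so this fails
try:
    "Location: %s" % coords
except TypeError as exc:
    print("TypeError:", exc)

# The workaround: wrap the tuple in another 1-tuple
print("Location: %s" % (coords,))    # Location: (51.5, -0.1)

# The newer ways of formatting just do the obvious thing
print("Location: {}".format(coords))  # Location: (51.5, -0.1)
print(f"Location: {coords}")          # Location: (51.5, -0.1)
```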

We now have two new ways of formatting strings. One is ultra-convenient, and one is ultra-powerful.

Formatted String Literals

Often referred to as ‘f-strings’ and added in Python 3.6, these provide an incredibly easy way to reference any variable available in the current scope.

Check out how intuitive and readable this code is:

author = "Guido van Rossum"
date = 1989
foo = f"{author} started Python in {date}"

# Guido van Rossum started Python in 1989

Anything inside the curly braces is evaluated as an expression, so you can put small bits of logic inside the string if you so desire (expressions only, not statements). It’s probably not very readable to contain program logic within a string, but if it’s just getting logged to provide extra debug information then it’s really handy.

author = "Guido van Rossum"
date = 1989 
foo = f"{author.split()[0]} started Python {2000 - date} years before the turn of the century"

# Guido started Python 11 years before the turn of the century
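f-strings also accept a conversion (!r applies repr()) and a format spec after a colon, using the same mini-language as format(). A brief sketch; the days value reuses the numbers from the salary example earlier.

```python
author = "Guido van Rossum"
date = 1989
days = 365 * (4500 / 20000)  # roughly 82.1, from the salary example

print(f"{author!r}")          # 'Guido van Rossum' (!r applies repr())
print(f"{days:.1f} days")     # 82.1 days (format spec after the colon)
print(f"[{date:>8}]")         # [    1989] (right-aligned, 8 characters wide)
print(f"{4500 / 20000:.1%}")  # 22.5%
```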

The .format Method

In Python 3, every string has a .format method, which provides a dazzling array of options for formatting, covering the overwhelming majority of use cases. We won’t go into detail here as it is backported to 2.6 and 2.7, so won’t be a big upgrade pull for most Python 2 users.

Here’s one of our favourite guides for reference.
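Still, a taste of str.format() for comparison. One thing it can do that f-strings can’t is keep a template around and fill it in later; fields can also be reused by position and can index into their arguments. The person dictionary here is made up for illustration.

```python
template = "{name} started Python in {year}"

# A template can be defined once and filled in later
print(template.format(name="Guido van Rossum", year=1989))
# Guido van Rossum started Python in 1989

# Positional fields can be reused and reordered
print("{0}{1}{0}".format("abra", "cad"))  # abracadabra

# Fields can index into their arguments and take format specs too
person = {"name": "Steve", "pay": 20000}
print("{p[name]} earns {p[pay]:,} a year".format(p=person))
# Steve earns 20,000 a year
```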

Imports

Let’s say that you’re using Python 2, and have the following file hierarchy:

pkg
├──service.py
└──misc.py
run.py
utils.py

run.py simply imports and calls something from the service module in the pkg package.

But service.py relies on functions contained in utils.py. So at the top of service.py we have this statement.

import utils

Seems fairly unassuming, and everything will work just fine. But what if our folder structure now changes, with pkg acquiring a new utils module?

pkg
├──service.py
├──misc.py
└──utils.py
run.py
utils.py

Time for confusion: our code now silently switches to using the utils.py file within pkg, because Python 2 tries an implicit relative import first. Things would get even messier if we happened to have a library installed named utils. This approach is ambiguous, and regularly led to unpredictable behaviour in Python 2.

Python 3 to the rescue: implicit relative imports are gone. A plain import utils is now always absolute, and importing a sibling module from within the same package requires explicit relative syntax or the full package path (Python 2.5 and later can opt in with from __future__ import absolute_import). The top of service.py could become any of the following options depending on what’s required.

# To import from within pkg, either use relative
from . import utils
# or absolute
from pkg import utils

# To import from the same level as run.py, use absolute
import utils

This feature might seem like a bit of a nicety in this example, but when dealing with big codebases containing large package/import hierarchies you’ll be glad of it.
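To see the Python 3 rules in action, here’s a self-contained sketch that builds the second layout in a temporary directory and imports pkg.service the way run.py would. A few assumptions to flag: it adds a pkg/__init__.py (not in the tree above), it simulates running run.py by putting the directory first on sys.path, and the WHERE attribute and which() function are hypothetical markers showing which utils module was loaded.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical file contents mirroring the second layout above. Each utils
# module records where it lives, so we can see which one gets imported.
FILES = {
    "utils.py": "WHERE = 'top level'\n",
    "pkg/__init__.py": "",
    "pkg/utils.py": "WHERE = 'inside pkg'\n",
    "pkg/service.py": (
        "import utils\n"                      # always absolute in Python 3
        "from . import utils as pkg_utils\n"  # explicitly relative
        "\n"
        "def which():\n"
        "    return utils.WHERE, pkg_utils.WHERE\n"
    ),
}


def load_service(root):
    """Write FILES under root and import pkg.service as run.py would."""
    for rel_path, source in FILES.items():
        path = root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(source)
    # Running run.py puts its directory first on sys.path; simulate that
    sys.path.insert(0, str(root))
    # Forget any previously imported modules with the same names
    for name in ("utils", "pkg", "pkg.utils", "pkg.service"):
        sys.modules.pop(name, None)
    importlib.invalidate_caches()
    try:
        return importlib.import_module("pkg.service")
    finally:
        sys.path.remove(str(root))


with tempfile.TemporaryDirectory() as tmp:
    service = load_service(Path(tmp))

print(service.which())  # ('top level', 'inside pkg')
```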

Other

Python 3 has many, many other improvements and entirely new features over Python 2, even if some are only useful in niche areas. Here are just a few:

  • The asyncio library (added in 3.4), together with the async and await syntax (3.5), makes writing asynchronous programs far easier. A must in modern programming.
  • Other major standard library additions, such as concurrent.futures, ipaddress, and pathlib.
  • Calling a parent class’s method in Python 2 is needlessly heavy on syntax: super(MyClass, self).method(). In Python 3, a bare super().method() works inside a method.
  • Many builtins such as zip(), map(), and filter() now return iterators instead of lists. In many cases, this will save significant amounts of memory without you even knowing.
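A quick sketch of two of the points above. First, zero-argument super(); the Base/Child classes are made up for illustration. Second, lazy builtins: map() over a huge range costs almost nothing until values are actually requested. The catch when porting is that these iterators can only be consumed once, so code that looped over a zip() or map() result twice in Python 2 needs an explicit list().

```python
import itertools


class Base:
    def greet(self):
        return "hello from Base"


class Child(Base):
    def greet(self):
        # Python 2 style, still valid in 3: super(Child, self).greet()
        return super().greet() + " via Child"


print(Child().greet())  # hello from Base via Child

# map() is lazy: nothing is computed until values are requested
squares = map(lambda n: n * n, range(10**9))
print(list(itertools.islice(squares, 5)))  # [0, 1, 4, 9, 16]

# An iterator can only be consumed once
pairs = zip("ab", [1, 2])
print(list(pairs))  # [('a', 1), ('b', 2)]
print(list(pairs))  # [] (already exhausted; wrap in list() to reuse)
```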

Tools

If you decide that porting code from Python 2 to 3 is the way to go, there are some existing tools to make your life easier.

  • 2to3 – the official automated code translation tool. Yes, it will port the code for you! Not guaranteed to catch everything, but does a lot of the tedious syntactic fiddling.
  • caniusepython3 – a nifty utility for detecting which of your project dependencies are stopping you from making the leap.
  • tox automates and streamlines the testing of Python code. It allows you to easily test your code on multiple versions of Python, which is fantastic for any project in general, but particularly comes in handy when you’re testing the success of your newly ported codebase on different versions.

Conclusion

There are countless reasons to upgrade to Python 3: not just the convenience of new features, but the fact that Python 3 is where the core developers’ work goes. New features land only there, and Python 2.7 receives only maintenance fixes until its planned end of life in 2020. Only the most robustly tested, never-need-to-change codebases have an excuse to remain on Python 2. Everything else should be enjoying the fantastic hard work put in by Python developers to make the language what it is today.
