Though Python 3 was released in 2008, many projects are still stuck on Python 2.
It’s understandable that porting large existing codebases to a new version is a prospect which sends a shiver down many a developer’s spine. But code inevitably needs to be maintained, and so when all the shiny new features that would fix everything are in a new version, is it really worth staying rooted in the past?
We’ll take you through some of the features that Python 2 programs are missing out on, not only from 3.0 but up to the current release (3.7).
Why Python 3 Happened
Before 2008, Python developers had a bit of a headache. The language that started in the 1989 Christmas holidays as the pet project of Guido van Rossum was now growing at a fast pace. Features had been piled on, and the project was now large enough that earlier design decisions were hindering implementation. Because of this, the process of adding new features was becoming an exercise in hacking around the existing code.
The solution was Python 3: the only release that deliberately broke backwards compatibility. At the time, the decision was controversial. Was it acceptable for a widely used open source project to purposefully break older code? Despite the backlash, the decision was taken, giving Guido and the developers a one-off chance to clean out redundant code, fix common pitfalls and re-architect the language. The aim was that within Python 3 there would be only one obvious way of doing things. It’s a testament to the design choices made back then that we’re still on 3.x releases a decade later.
The __future__ is Now
The __future__ import is a slice of time-travelling wizardry which allows you to summon select features from future releases of Python. In fact, the current Python release, 3.7, contains __future__ imports from releases which haven’t yet been written!
OK fine, it’s not quite as grandiose as that: a __future__ import is just an explicit switch that turns on new syntax or behaviour already packaged with the current release. We mention it because a few of the Python 3 features listed below can be imported from __future__ and used in 2.6 and 2.7, which were released alongside the early 3.x versions and carry backports from 3.0 and 3.1 respectively. Having said this, upgrading is, of course, still advised: features used this way are ‘frozen’ in those old releases and will not benefit from the evolution and maintenance of current versions.
Onto what you’re missing out on in Python 3…
Print is a Function
Yes, we know that most people are aware of this, but print is one of the most-used statements among Pythonistas who are starting out. In Python 3, print moved from a statement to a built-in function, which just makes sense.
This Python 2 code
print "Fresh Hacks Every Day"
print "Foo",  # the trailing comma suppresses the newline
print "some more text on the same line as foo"
will become the following in Python 3.
print("Fresh Hacks Every Day")
print("Foo", end=' ')  # end replaces the default "\n"
print("some more text on the same line as foo")
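Because print is now an ordinary function, its behaviour is controlled with keyword arguments rather than special syntax. A small sketch of the three most common ones (sep, end and file), writing into an in-memory buffer so the result can be inspected:

```python
import io
import sys

buffer = io.StringIO()

# sep= controls what goes between arguments (default " ")
print("2008", "12", "03", sep="-", file=buffer)
# end= replaces the trailing newline; a space keeps the next print on the same line
print("Foo", end=" ", file=buffer)
print("some more text on the same line as foo", file=buffer)

# file= sends output to anything with a .write() method, e.g. stderr
print("warning: written to stderr", file=sys.stderr)

print(buffer.getvalue(), end="")
```

The buffer ends up holding "2008-12-03" on the first line and "Foo some more text on the same line as foo" on the second.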
Souped-up Unpacking
Here we have a tuple containing some data. In Python 2, we can already unpack into different variables like so:
person = ("Steve", "Hammond", 34, "England", "spaces")
name, surname, age, country, indent_pref = person
But let’s say we only care about the first name and indentation preference. In Python 3, we can now unpack like this:
person = ("Steve", "Hammond", 34, "England", "spaces")
name, *misc, indent_pref = person

# These would also work
name, surname, age, *misc = person
*misc, indent_pref = person
This provides much more flexibility when unpacking — especially handy if dealing with tuples longer than the one in this example.
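One detail worth knowing: the starred name always receives a list, even when there is nothing left over for it, and the right-hand side can be any iterable rather than just a tuple. A short sketch:

```python
person = ("Steve", "Hammond", 34, "England", "spaces")

name, *misc, indent_pref = person
print(misc)  # ['Hammond', 34, 'England'] : a list, even though person is a tuple

first, *rest = (1,)
print(rest)  # [] : an empty list rather than an error

# Any iterable can be unpacked, not just tuples
head, *middle, tail = range(6)
print(head, middle, tail)  # 0 [1, 2, 3, 4] 5
```

Only one starred name is allowed per assignment target; something like a, *b, *c = person is a SyntaxError, since Python could not tell how to split the values.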
Unpacking is commonly used in for-loops, especially when using things like zip() or enumerate(). As an example of applying the new syntax, suppose we now have a function, get_people_data(), that returns a list of tuples like the person example above.
for name, *misc, indent_pref in get_people_data():
    # Compare strings with != ; 'is not' checks identity, not equality
    if indent_pref != "spaces":
        print(f"You will feel the full wrath of PEP 8, {name}.")
This works great. But wouldn’t it be nice if we could store the indentation preference in a better way than just a string? In a way similar to an enum in C?
Enums
Enums in Python. What a treat. Due to popular demand, they’ve been in the standard library since 3.4.
Now we can define indentation preference like this:
from enum import Enum

class Indent(Enum):
    SPACES = 'spaces'
    TABS = 'tabs'
    EITHER = 'either'

person = ("Steve", "Hammond", 34, "England", Indent.SPACES)
# or
person = ("Steve", "Hammond", 34, "England", Indent("spaces"))

# Let's try and initialise with an invalid value
person = ("Steve", "Hammond", 34, "England", Indent("invalid value"))
# ValueError: 'invalid value' is not a valid Indent
The syntax seems a bit strange, but it can be useful when dealing with numeric constants or initialisation from strings.
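With the Indent enum in place, the loop from the unpacking section can compare enum members instead of strings. In this sketch get_people_data() is a hypothetical stand-in returning made-up records. Because each enum member is a singleton, is not is a reliable comparison here, unlike with plain strings.

```python
from enum import Enum


class Indent(Enum):
    SPACES = 'spaces'
    TABS = 'tabs'
    EITHER = 'either'


def get_people_data():
    """Hypothetical data source; the records are invented for illustration."""
    return [
        ("Steve", "Hammond", 34, "England", Indent.SPACES),
        ("Ada", "Smith", 29, "Wales", Indent.TABS),
        ("Lin", "Chen", 41, "Scotland", Indent("either")),  # lookup by value
    ]


warned = []
for name, *misc, indent_pref in get_people_data():
    # Enum members are singletons, so identity comparison is safe
    if indent_pref is not Indent.SPACES:
        warned.append(name)
        print(f"You will feel the full wrath of PEP 8, {name}.")

# Members can be iterated in definition order and looked up by name or value
print([member.value for member in Indent])  # ['spaces', 'tabs', 'either']
print(Indent['TABS'] is Indent('tabs'))     # True
```

A misspelt member such as Indent.SPACE fails immediately with an AttributeError, whereas a misspelt string like "spcaes" would silently compare unequal.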
Division
A simple but major change: dividing two integers in Python 3 gives true (float) division by default, whereas in Python 2 it always produced an integer, rounded down.
This is the Python 2 behaviour:
>>> 1 / 3
0
>>> 5 / 2
2
Whereas in Python 3 we get more accuracy:
>>> 1 / 3
0.3333333333333333
>>> 5 / 2
2.5
// can of course still be used for floor division if this is required.
This change is one that should definitely be noted if you’re porting code from 2 to 3; it’s an easy one to slip under the radar that could cause major issues in program logic.
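When porting, a Python 2 expression a / b on two ints can be rewritten as a // b to keep the old result exactly, because both round towards negative infinity rather than towards zero. A few cases worth checking in a port:

```python
# True division always returns a float in Python 3, even for exact results
print(6 / 3)         # 2.0
print(type(6 / 3))   # <class 'float'>

# Floor division rounds towards negative infinity, matching Python 2's int division
print(7 // 2)        # 3
print(-7 // 2)       # -4, not -3
print(7.5 // 2)      # 3.0 : // also works on floats and returns a float

# divmod() returns the floor quotient and the matching remainder together
q, r = divmod(-7, 2)
print(q, r)          # -4 1
print(q * 2 + r == -7)  # True: the pair always reconstructs the dividend
```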
Chaining Exceptions
It’s very common to catch one exception, then raise another. This could be because your application has a defined set of custom exceptions, or simply because you want to provide more information about what went wrong.
Here we have a program which crudely calculates the number of days needed to be worked to earn a certain proportion of yearly pay.
class InvalidSalaryError(Exception):
    pass

def days_to_earn(annual_pay, amount):
    """Return number of days worked to earn `amount`."""
    try:
        annual_frac = amount / annual_pay
    except ZeroDivisionError:
        raise InvalidSalaryError("Could not calculate number of days")
    return 365 * annual_frac

if __name__ == '__main__':
    print(days_to_earn(0, 4500))
    print(days_to_earn(20000, 4500))
We can see that if an annual pay of zero is specified and a ZeroDivisionError occurs, it is caught and an InvalidSalaryError is raised instead.
Let’s try running this with Python 2.
$ python days_calc.py
Traceback (most recent call last):
  File "days_calc.py", line 13, in <module>
    print(days_to_earn(0, 4500))
  File "days_calc.py", line 9, in days_to_earn
    raise InvalidSalaryError("Could not calculate number of days")
__main__.InvalidSalaryError: Could not calculate number of days
Because we caught the ZeroDivisionError, it was swallowed, so only the InvalidSalaryError traceback is shown.
Now let’s run this with Python 3.
$ python3 days_calc.py
Traceback (most recent call last):
  File "days_calc.py", line 7, in days_to_earn
    annual_frac = amount / annual_pay
ZeroDivisionError: division by zero

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "days_calc.py", line 13, in <module>
    print(days_to_earn(0, 4500))
  File "days_calc.py", line 9, in days_to_earn
    raise InvalidSalaryError("Could not calculate number of days")
__main__.InvalidSalaryError: Could not calculate number of days
We got a far more detailed error report, explaining what caused the InvalidSalaryError, complete with a traceback for the ZeroDivisionError. This is particularly useful when you’re using code written by someone else, which might not have the most verbose error messages.
We won’t go into it here, but there’s also the new raise ... from syntax, which allows for explicit exception chaining.
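For reference, a brief sketch of what explicit chaining looks like, reusing the names from the program above. raise ... from exc stores the original exception on __cause__ (the traceback then says "The above exception was the direct cause of the following exception"), while raise ... from None hides the original traceback altogether.

```python
class InvalidSalaryError(Exception):
    pass


def days_to_earn(annual_pay, amount):
    """Return number of days worked to earn `amount`."""
    try:
        annual_frac = amount / annual_pay
    except ZeroDivisionError as exc:
        # Explicit chaining: the original error is stored on __cause__
        raise InvalidSalaryError("Could not calculate number of days") from exc
    return 365 * annual_frac


def days_to_earn_quiet(annual_pay, amount):
    """Same calculation, but suppress the original exception in tracebacks."""
    try:
        return 365 * (amount / annual_pay)
    except ZeroDivisionError:
        raise InvalidSalaryError("Could not calculate number of days") from None


caught = None
try:
    days_to_earn(0, 4500)
except InvalidSalaryError as err:
    caught = err  # keep a reference: 'err' is cleared when the except block ends
print(type(caught.__cause__).__name__)  # ZeroDivisionError

quiet = None
try:
    days_to_earn_quiet(0, 4500)
except InvalidSalaryError as err:
    quiet = err
print(quiet.__cause__, quiet.__suppress_context__)  # None True
```

Note that from None only hides the original exception from the printed traceback; it is still reachable on __context__ if you need it while debugging.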
Side note: dealing with large numbers in Python
In the above program we had to deal with ints in the tens of thousands. Writing long numbers like 20000 or 0b001110100010 is a recipe for screen squinting. How about separating them like this: 20_000, 0b_0011_1010_0010? This is another way Python 3 (from 3.6 onwards) will make your life easier: optional underscores in numeric literals, which can help to make things more readable.
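The underscores are purely visual: the parser ignores them, so the grouped and ungrouped literals are the same number. A quick sketch, which also shows that int() and float() accept the same grouping in strings and that the format mini-language can add separators back on output:

```python
# Underscores are ignored by the parser; they are purely visual
salary = 20_000
mask = 0b_0011_1010_0010
print(salary == 20000)         # True
print(mask == 0b001110100010)  # True
print(mask)                    # 930

# int() and float() accept the same underscore grouping in strings
print(int("4_500"), float("1_000.5"))  # 4500 1000.5

# The format mini-language can put separators back when printing
print(f"{1234567:,}")  # 1,234,567
print(f"{1234567:_}")  # 1_234_567
print(f"{mask:_b}")    # 11_1010_0010 (binary, grouped in fours)
```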
Unicode and Bytes
In Python 2 the concepts of bytes and text were pretty much interchangeable, and both usually came under the str type. This led to some very nasty conversion consequences and unpredictable behaviour. The headline to remember is that in Python 3 all strings (str) are Unicode text. A distinction was very deliberately created between text and bytes, offering a much more clearly defined way of working.
In the examples below, we check the type of a bytestring, a ‘normal’ string and a unicode string in the Python 2 and Python 3 interpreters.
Python 2.7:
>>> type(b'foo')
<type 'str'>      # In Python 2, a bytestring is a normal string!
>>> type('foo')
<type 'str'>      # The same bytes type as above
>>> type(u'foo')
<type 'unicode'>
Python 3.7:
>>> type(b'foo')
<class 'bytes'>   # In Python 3, this has its own bytes type
>>> type('foo')
<class 'str'>     # In Python 3, 'str' means unicode
>>> type(u'foo')
<class 'str'>     # It's a normal string
This means that dealing with encodings is much clearer in Python 3, even if it comes at the cost of slipping in a few more .encode() and .decode() calls. Chances are that when your program communicates with anything in the outside world, such as sockets, serial ports or files opened in binary mode, you’ll have to encode and decode your data.
As an example, if you’re using pyserial to read from or write to a serial device, you’ll need to explicitly encode and decode your messages.
import serial

PORT = '/dev/ttyACM0'
BAUD = 115200
ENCODING = 'utf-8'

if __name__ == '__main__':
    ser = serial.Serial(port=PORT, baudrate=BAUD)
    ser.write('hello'.encode(ENCODING))             # str -> bytes before sending
    response_bytes = ser.readline()                 # the device replies with bytes
    response_str = response_bytes.decode(ENCODING)  # bytes -> str for use as text
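The same encode/decode round trip can be tried without any hardware. This sketch uses a made-up string containing a non-ASCII character to show that a str counts code points while its encoded bytes may be longer, that Python 3 refuses to mix the two types, and what happens when the wrong codec is used to decode:

```python
ENCODING = 'utf-8'

text = "héllo"                # str: a sequence of Unicode code points
data = text.encode(ENCODING)  # bytes: what actually goes over the wire
print(data)                   # b'h\xc3\xa9llo'
print(len(text), len(data))   # 5 6 : 'é' takes two bytes in UTF-8

# Decoding with the same encoding gets the original text back
print(data.decode(ENCODING) == text)  # True

# Python 3 refuses to combine str and bytes implicitly
try:
    "hello " + b"world"
except TypeError as exc:
    print("TypeError:", exc)

# Decoding with the wrong codec can silently produce garbage ("mojibake")
print(data.decode('latin-1'))  # hÃ©llo
```

In Python 2 the equivalent mix-up would often "work" until a non-ASCII character arrived, which is exactly the unpredictability the stricter types remove.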
String Formatting
Formatting strings in Python 2 was performed in a similar style to C:
author = "Guido van Rossum"
date = 1989
foo = "%s started Python in %d" % (author, date)
# Guido van Rossum started Python in 1989
The docs explicitly state that this way of formatting “exhibits a variety of quirks that lead to a number of common errors (such as failing to display tuples and dictionaries correctly)”. It’s also inflexible, and starts to become ugly to look at when dealing with long strings.
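The tuple quirk mentioned in the docs is easy to trip over: when the value on the right of % is itself a tuple, it is treated as the whole argument list rather than as one value. A short sketch with a made-up point variable:

```python
point = (3, 4)

# The tuple is unpacked as two arguments for a single %s, so this fails
try:
    print("Point: %s" % point)
except TypeError as exc:
    print("TypeError:", exc)

# The workaround is to wrap it in a one-element tuple
print("Point: %s" % (point,))      # Point: (3, 4)

# The newer approaches have no such ambiguity
print("Point: {}".format(point))   # Point: (3, 4)
print(f"Point: {point}")           # Point: (3, 4)
```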
We now have two new ways of formatting strings. One is ultra-convenient, and one is ultra-powerful.
Formatted String Literals
Often referred to as ‘f-strings’ and added in Python 3.6, these provide an incredibly easy way to reference any variable available in the current scope.
Check out how intuitive and readable this code is:
author = "Guido van Rossum"
date = 1989
foo = f"{author} started Python in {date}"
# Guido van Rossum started Python in 1989
Anything inside the curly braces is evaluated as an expression, so you can put small bits of logic inside the string if you so desire (statements such as assignments or loops are not allowed). It’s probably not very readable to put program logic within a string, but if it’s just getting logged to provide extra debug information then it’s really handy.
author = "Guido van Rossum"
date = 1989
foo = f"{author.split()[0]} started Python {2000 - date} years before the turn of the century"
# Guido started Python 11 years before the turn of the century
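f-strings also accept a format specification after a colon and a conversion flag after an exclamation mark, using the same mini-language as str.format. A few illustrative cases:

```python
import math

author = "Guido van Rossum"
date = 1989

print(f"{math.pi:.3f}")   # 3.142 : precision given after the colon
print(f"{date:>8}|")      # '    1989|' : right-aligned in 8 columns
print(f"{author!r}")      # 'Guido van Rossum' : !r applies repr()

width = 6
print(f"{date:{width}}")  # '  1989' : the width itself comes from a variable
```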
The .format Method
In Python 3, every string has a .format method, which provides a dazzling array of formatting options covering the overwhelming majority of use cases. We won’t go into detail here as it has been backported to 2.6 and 2.7, so it won’t be a big reason to upgrade for most Python 2 users.
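For a quick taste, here are a few .format features in one place. Explicit field indexes such as {0} also work on 2.6, which lacks the automatic {} numbering added in 2.7 and 3.1. The values used are just illustrative.

```python
author = "Guido van Rossum"
date = 1989

# Explicit indexes can be reused within one template
print("{0} started Python in {1}. Well done, {0}!".format(author, date))

# Named fields pair well with ** unpacking of a dict
info = {"lang": "Python", "year": date}
print("{lang} dates back to {year}".format(**info))  # Python dates back to 1989

# Attribute and index lookups inside a field
print("{0.real} / {1[0]}".format(3 + 4j, ["first", "second"]))  # 3.0 / first

# The same format-spec mini-language that f-strings use
print("{:08.3f}".format(3.14159))  # 0003.142
print("{:*^11}".format("mid"))     # ****mid****
```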
Here’s one of our favourite guides for reference.
Imports
Let’s say that you’re using Python 2, and have the following file hierarchy:
pkg
├── service.py
└── misc.py
run.py
utils.py
run.py simply imports and calls something from the service module in the pkg package.
But service.py relies on functions contained in utils.py. So at the top of service.py we have this statement.
import utils
Seems fairly unassuming, and everything will work just fine. But what if our folder structure now changes, with pkg acquiring a new utils module?
pkg
├── service.py
├── misc.py
└── utils.py
run.py
utils.py
Time for confusion: our code now silently switches to the utils.py file within pkg, because Python 2 treats import utils as an implicit relative import and checks the package’s own directory first. Things would get even messier if we happened to have a library installed named utils. This approach isn’t well defined, and often led to unpredictable behaviour in Python 2.
Python 3 to the rescue: implicit relative imports are gone, so a plain import utils is always absolute, and relative imports must be spelled out explicitly. The top of service.py could become any of the following options depending on what’s required.
# To import from within pkg, either use relative
from . import utils
# or absolute
from pkg import utils

# To import from the same level as run.py, use absolute
import utils
This feature might seem like a bit of a nicety in this example, but when dealing with big codebases containing large package/import hierarchies you’ll be glad of it.
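To see the Python 3 rules in action without setting up a real project, this sketch recreates the second layout in a temporary directory (the file contents, including the WHERE marker, are made up for the demo) and checks which file each import form resolves to. On Python 2.5 and later, the same absolute-import behaviour can be switched on with from __future__ import absolute_import.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_layout(root: Path) -> None:
    """Recreate the second directory layout under `root` (demo contents)."""
    pkg = root / "pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "misc.py").write_text("")
    (pkg / "utils.py").write_text('WHERE = "pkg/utils.py"\n')
    (root / "utils.py").write_text('WHERE = "top-level utils.py"\n')
    (pkg / "service.py").write_text(
        "from . import utils as local_utils  # explicit relative import\n"
        "from pkg import utils as pkg_utils  # absolute, same module\n"
        "import utils as top_utils  # absolute: always the top-level utils.py\n"
    )


with tempfile.TemporaryDirectory() as tmp:
    build_layout(Path(tmp))
    # Forget any previously imported modules with these names
    for name in ("pkg", "pkg.service", "pkg.utils", "pkg.misc", "utils"):
        sys.modules.pop(name, None)
    importlib.invalidate_caches()  # the files were created at runtime
    sys.path.insert(0, tmp)        # plays the role of run.py's directory
    try:
        service = importlib.import_module("pkg.service")
        resolved = [
            service.local_utils.WHERE,
            service.pkg_utils.WHERE,
            service.top_utils.WHERE,
        ]
    finally:
        sys.path.remove(tmp)

for where in resolved:
    print(where)
```

The first two forms both land on pkg/utils.py, while the bare import utils now always means the top-level file, regardless of which package it is written in.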
Other
There are many, many other improvements and entirely new features in Python 3, even if some are only useful in niche areas. Here are just a few:
- The addition of the asyncio library makes writing asynchronous programs much easier. A must in modern programming.
- Other major standard library additions (like concurrent.futures, ipaddress, and pathlib).
- Accessing a parent class in Python 2 is needlessly heavy on syntax. In Python 3, super() becomes even more magical: inside a method it can be called with no arguments.
- Many builtins such as zip(), map(), and filter() now return iterators instead of lists. In many cases, this will save significant amounts of memory without you even noticing.
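Two of the items above are worth a quick sketch, since they show up constantly when porting: the zero-argument super(), and the fact that the new lazy map() and zip() results can only be consumed once. The classes here are hypothetical examples.

```python
class Base:
    def greet(self):
        return "hello from Base"


class Child(Base):
    def greet(self):
        # Python 2 spelling (new-style classes only): super(Child, self).greet()
        # Python 3 works out the class and instance by itself
        return super().greet() + " via Child"


print(Child().greet())  # hello from Base via Child

# map() is lazy: values are produced only when requested
squares = map(lambda n: n * n, range(5))
first_pass = list(squares)
second_pass = list(squares)
print(first_pass)   # [0, 1, 4, 9, 16]
print(second_pass)  # [] : an iterator can only be consumed once

# Wrap the result in list() where Python 2 code relied on indexing, len() or reuse
pairs = list(zip("abc", [1, 2, 3]))
print(pairs[1], len(pairs))  # ('b', 2) 3
```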
Tools
If you decide that porting code from Python 2 to 3 is the way to go, there are some existing tools to make your life easier.
- 2to3 – the official automated code translation tool. Yes, it will port the code for you! Not guaranteed to catch everything, but does a lot of the tedious syntactic fiddling.
- caniusepython3 – a nifty utility for detecting which of your project dependencies are stopping you from making the leap.
- tox automates and streamlines the testing of Python code. It allows you to easily test your code on multiple versions of Python, which is fantastic for any project in general, but particularly comes in handy when you’re testing the success of your newly ported codebase on different versions.
Conclusion
There are countless reasons to upgrade to Python 3: not just the convenience of new features, but also the fact that obscure bugs in the language are fixed regularly by Python developers, and many of those fixes land only in Python 3. Only the most robustly tested, never-need-to-change codebases have an excuse to remain on Python 2. Everything else should be enjoying the fantastic hard work put in by Python developers to make the language what it is today.