Python reference sheet (work in progress)

Which standard library functions to use for some basic tasks in Python, and how to invoke them. The reference version is Python 3.5 for our purposes; 2.7 differs in certain respects, but if you’re still using that then you should be migrating already.

Filesystem access

Main modules: io, os (including os.path), shutil, glob. The io.open function is the same function as the builtin open. The os.open function is low-level and not what you’re after (it is equivalent to C open, while io.open is vaguely equivalent to C fopen).

New folder, copy, move, rename, delete

Create a folder:

os.mkdir("hello")  # Must not exist, containing folder must exist.
os.makedirs("hello")  # Must not exist, containing folder may or may not exist.
os.makedirs("hello", exist_ok=True)  # May exist, containing folder may or may not exist.

Copy a file or folder:

shutil.copy("bob.txt", "cat.txt") # File
shutil.copytree("bob", "cat") # Folder

Move a file or folder, either within or between filesystems:

shutil.move("hello.txt", os.path.join("cat", "hello"))

Rename a file or folder, or move it within a single filesystem:

os.rename("bob", "cat")
os.rename("hello.txt", os.path.join("cat", "hello.txt"))

Delete a file, an empty folder, or a folder along with everything in it:

os.unlink("hello.txt")  # Or os.remove (does exactly the same thing)
os.rmdir("bob")  # Must be empty
shutil.rmtree("bob")  # May or may not be empty

Delete an empty folder along with any otherwise-empty containing folders. This is less likely to be the one you’re after and is listed mostly for completeness:

os.removedirs("bob")
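The difference between os.rmdir and os.removedirs is easiest to see on a small tree. The following is only a sketch: the folder names a, b, c and the file keep.txt are made up for illustration, and everything happens inside a temporary directory.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    deep = os.path.join(tmp, "a", "b", "c")

    os.makedirs(deep)
    os.rmdir(deep)  # Removes only "c"; "a/b" is left alone.
    b_exists_after_rmdir = os.path.isdir(os.path.join(tmp, "a", "b"))

    os.makedirs(deep)
    # A file directly inside "a" keeps "a" non-empty, so pruning stops there.
    with open(os.path.join(tmp, "a", "keep.txt"), "w", encoding="utf-8") as f:
        f.write("x")
    os.removedirs(deep)  # Removes "c", then the now-empty "b"; stops at "a".
    b_exists_after_removedirs = os.path.isdir(os.path.join(tmp, "a", "b"))
    a_exists_after_removedirs = os.path.isdir(os.path.join(tmp, "a"))
```

Note that os.removedirs keeps walking upwards until it hits a folder it cannot remove, which is why it is rarely the one you want.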

Creating, reading and writing files

Create a file containing text (will create an initially empty file to write to and replace a file if it exists):

f = open("hello.txt", "w", encoding="utf-8")
f.write("¡Hello, world!")
f.close()

Append text to an existing file:

f = open("hello.txt", "a", encoding="utf-8")
f.write("\n¡Bonjour, tout le monde!") # Converts \n based on platform
f.close()

Read the text contents of a file to a string:

f = open("hello.txt", "r", encoding="utf-8")
b = f.read() # Converts \r\n and \r to \n
f.close()

Read the text contents of a non-UTF8 file to a string:

# Western European (Windows)
f = open("hello.txt", "r", encoding="windows-1252")
b = f.read()
f.close()

# Japanese (Shift-JIS) in Windows variant
f = open("nihongo.txt", "r", encoding="ms-kanji")
b = f.read()
f.close()

Read the contents of a non-text file:

f = open("hello.db", "rb")
b = f.read() # Gives a bytes object, does not convert newlines.
f.close()

Write to a non-text file:

f = open("hello.db", "wb")
f.write(b"\x00\x04\n\r\r\n\n\x1AHello\xFAWorld!!!")
f.close()
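The open/close pairs above leave the file open if the read or write in between raises an exception. A with block closes the file either way, so it is usually the safer way to write the same thing. A minimal sketch, using a temporary directory so that no real hello.txt is touched:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hello.txt")

    # Create (or replace) the file; it is closed when the block ends.
    with open(path, "w", encoding="utf-8") as f:
        f.write("¡Hello, world!")

    # Append to it.
    with open(path, "a", encoding="utf-8") as f:
        f.write("\n¡Bonjour, tout le monde!")

    # Read it back as text (newlines come back as \n on every platform).
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

    # Read it back as raw bytes.
    with open(path, "rb") as f:
        raw = f.read()

closed_afterwards = f.closed
```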

Processing file paths

Split directory and name:

# Gives "../flan" for dirn and "boats.txt" for fn.
# The boats.txt may represent a file or directory, it doesn't care.
# Using tuple syntax on the left of the assignment unpacks the returned tuple.
dirn, fn = os.path.split("../flan/boats.txt")
# If you only need one of them
dirn = os.path.dirname("../flan/boats.txt")
fn = os.path.basename("../flan/boats.txt")

Split name and extension:

# Gives "../flan/boats" for basic and ".txt" for ext.
basic, ext = os.path.splitext("../flan/boats.txt")
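A few edge cases of the splitting functions are worth knowing about. The file names here are made up for illustration:

```python
import os.path

# Only the last extension is split off.
double = os.path.splitext("archive.tar.gz")

# A leading dot marks a hidden file, not an extension.
hidden = os.path.splitext(".bashrc")

# No extension at all gives an empty string.
bare = os.path.splitext("../flan/boats")

# A trailing slash gives an empty name part.
trailing = os.path.split("../flan/")
```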

Handling folders and probing files

Get current (and parent) folder path:

cwd = os.getcwd()  # Current
parent = os.path.dirname(os.getcwd())

# Alternatives using abspath:
cwd = os.path.abspath(os.curdir)  # or os.path.curdir (usually ".")
parent = os.path.abspath(os.pardir)  # usually ".."

Change directory:

os.chdir("bob")

Check if a file or folder exists, and whether something is a file or a folder:

if os.path.exists("bob.txt"):
    pass  # do stuff
if os.path.isdir("bob.txt"):
    pass  # do stuff (exists and is a folder)
if os.path.isfile("bob.txt"):
    pass  # do stuff (exists and is a file)

List files and folders within a folder:

foo = os.listdir("bananas")
bar = os.listdir("bob")

Step through all files and folders under a directory:

for root, dirs, files in os.walk("bob"):
    for filename in files:
        filepath = os.path.join(root, filename)
        # ...
    for foldername in dirs:
        dirpath = os.path.join(root, foldername)
        # ...

Step through all files and folders matching a filename wildcard pattern:

for path in glob.glob("b?bcats/*.txt"):
    pass  # ...

Step through all files and folders matching a recursive filename wildcard pattern (i.e. matching subdirectories recursively):

for path in glob.glob("b?bcats/**/*.txt", recursive=True):
    pass  # ...
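To see what the recursive flag actually changes, here is a small sketch run against a made-up tree in a temporary directory. The file names and the relative helper are invented for the example:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a small tree of empty files.
    for rel in ("top.txt", "sub/mid.txt", "sub/deeper/low.txt", "sub/skip.dat"):
        full = os.path.join(tmp, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()

    def relative(paths):
        # Hypothetical helper: sorted paths relative to tmp, with "/" separators.
        return sorted(os.path.relpath(p, tmp).replace(os.sep, "/") for p in paths)

    base = glob.escape(tmp)  # In case the temporary path contains wildcard characters.
    flat = relative(glob.glob(os.path.join(base, "*.txt")))
    # Without recursive=True, "**" behaves like a plain "*" (exactly one level).
    one_level = relative(glob.glob(os.path.join(base, "**", "*.txt")))
    # With recursive=True, "**" matches zero or more folders.
    deep = relative(glob.glob(os.path.join(base, "**", "*.txt"), recursive=True))
```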

Read file size in bytes:

size = os.stat("hello.txt").st_size

Read file timestamps (in UNIX time):

atime = os.stat("hello.txt").st_atime  # Access timestamp
mtime = os.stat("hello.txt").st_mtime  # Modification timestamp
# Inode change timestamp on Linux, creation timestamp on Windows:
ctime = os.stat("hello.txt").st_ctime

Write file timestamps (in UNIX time):

os.utime("hello.txt", (atime, mtime))
os.utime("hello.txt")  # "Touches" file (set timestamp to current time).
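The timestamps are plain numbers (seconds since 1970), so they usually need converting before being shown to anyone. A sketch using the datetime module, which is not otherwise covered here, on a throwaway file:

```python
import datetime
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hello.txt")
    open(path, "w").close()

    # UNIX time 1000000000 is 2001-09-09 01:46:40 UTC.
    os.utime(path, (1000000000, 1000000000))
    mtime = os.stat(path).st_mtime

    # Convert to an aware datetime in UTC, then to ISO 8601 text.
    stamp = datetime.datetime.fromtimestamp(mtime, tz=datetime.timezone.utc)
    iso = stamp.isoformat()
```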

Hashing

CRC32 and Adler32. The zlib module is available if Python is built with zlib support (unless you have some stripped down embedded build, it almost certainly is). The binascii module ought to be present in either case:

# Always unsigned in Python 3; could come out signed in Python 2:
crc = zlib.crc32(dat)  # dat should be a bytes object
crc = binascii.crc32(dat)
# Mask to get the same (unsigned) value in Python 2 as well:
crc = zlib.crc32(dat) & 0xFFFFFFFF
crc = binascii.crc32(dat) & 0xFFFFFFFF
# Adler32 (only present in zlib):
crc = zlib.adler32(dat) & 0xFFFFFFFF
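A quick sanity check against well-known values: the string "123456789" is the customary CRC32 check input, and "Wikipedia" is a commonly quoted Adler32 example. The sketch also shows the optional second argument, which lets the checksum be computed piece by piece:

```python
import binascii
import zlib

check = b"123456789"
crc_zlib = zlib.crc32(check) & 0xFFFFFFFF
crc_binascii = binascii.crc32(check) & 0xFFFFFFFF
adler = zlib.adler32(b"Wikipedia") & 0xFFFFFFFF

# Feed the data in two pieces, passing the running value along.
running = zlib.crc32(b"12345")
running = zlib.crc32(b"6789", running) & 0xFFFFFFFF
```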

MD5, SHA1 and SHA2 (the hashlib module). This may link against an SSL/TLS library to provide broader support, but built-in support for MD5, SHA1 and SHA2 should always be present (with the further addition of SHA3 and BLAKE2 in Python 3.6):

# dat should be a bytes object; digest() returns bytes, hexdigest() returns a str
md5raw = hashlib.md5(dat).digest()  # byte form
md5 = hashlib.md5(dat).hexdigest()  # hexadecimal form
sha = hashlib.sha1(dat).hexdigest()
sha2_256 = hashlib.sha256(dat).hexdigest()
# Specify any hash supported by the underlying SSL/TLS library:
sha2_256 = hashlib.new("sha256", dat).hexdigest()
# Find out which hash formats are available on your system:
print(hashlib.algorithms_available)
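The calls above need the whole input in memory. For large files it is usually better to feed the hash object in chunks with update. A minimal sketch; hash_file is a made-up helper name and the 64 KiB chunk size is an arbitrary choice:

```python
import hashlib
import os
import tempfile

def hash_file(path, algorithm="sha256", chunk_size=65536):
    """Hash a file without reading it into memory all at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hello.db")
    dat = b"Hello" * 100000  # Larger than one chunk.
    with open(path, "wb") as f:
        f.write(dat)
    chunked = hash_file(path)
    one_shot = hashlib.sha256(dat).hexdigest()
```

Both ways give the same digest; the chunked one simply never holds more than chunk_size bytes of the file at a time.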

Binary-to-text encodings

Base64 (base64 module):

# Don't call the variable base64, or it will shadow the module:
b64 = base64.b64encode(dat)
dat = base64.b64decode(b64)
b64_urlsafe = base64.urlsafe_b64encode(dat)
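A short worked example of the difference between the standard and URL-safe alphabets. The two input bytes are picked purely because they hit the characters that differ:

```python
import base64

dat = b"\xfb\xff"

b64 = base64.b64encode(dat)                   # Uses "+" and "/".
b64_urlsafe = base64.urlsafe_b64encode(dat)   # Uses "-" and "_" instead.
round_trip = base64.b64decode(b64)
round_trip_urlsafe = base64.urlsafe_b64decode(b64_urlsafe)

# The encoders return bytes; decode as ASCII if a str is needed.
text = b64.decode("ascii")
```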

Hexadecimal (binascii module, accessible through codecs):

# Gives lowercase, accepts either case:
hexadecimal = codecs.encode(dat, "hex")
dat = codecs.decode(hexadecimal, "hex")
# Using binascii directly, same behaviour:
hexadecimal = binascii.hexlify(dat)
dat = binascii.unhexlify(hexadecimal)

Hexadecimal can also be handled through the base64 module, though I’m not sure why you’d want to do this:

hexadecimal = base64.b16encode(dat)  # Gives uppercase
dat = base64.b16decode(hexadecimal)  # Accepts uppercase only
dat = base64.b16decode(hexadecimal, True)  # Accepts either
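The case handling described above can be checked directly. The sketch also uses bytes.hex and bytes.fromhex, which are not mentioned above but exist from Python 3.5 and work with str rather than bytes:

```python
import base64
import binascii

dat = b"\x00\xfaHi"

lower = binascii.hexlify(dat)   # Lowercase digits, as bytes.
upper = base64.b16encode(dat)   # Uppercase digits, as bytes.

# Without casefold=True, b16decode refuses lowercase input.
try:
    base64.b16decode(lower)
    strict_rejects_lowercase = False
except binascii.Error:
    strict_rejects_lowercase = True

as_text = dat.hex()                      # str, lowercase.
from_text = bytes.fromhex("00FA4869")    # Accepts either case.
```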

Quoted-Printable (quopri module, accessible through the codecs module):

# Handles bytes objects, names notwithstanding:
quop = quopri.encodestring(dat)
dat = quopri.decodestring(quop)
# Using the codecs module:
quop = codecs.encode(dat, "quopri")
dat = codecs.decode(quop, "quopri")

Unix UUencode and the related Classic MacOS HQX (uu, binhex, codecs modules):

# Encoding and decoding files:
uu.encode(infile, uu_outfile)
uu.decode(uu_infile, outfile)
binhex.binhex(infilename, hqx_outfile)
binhex.hexbin(hqx_infile, outfilename)
# Encoding and decoding bytes objects in memory:
uudat = codecs.encode(dat, "uu")
dat = codecs.decode(uudat, "uu")

Other binary-to-text encodings in the base64 module:

base32_rfc4648 = base64.b32encode(dat)
ascii85 = base64.a85encode(dat)
base85 = base64.b85encode(dat)

Data serialisation

Python string representation (repr), the same as is used in Python source (i.e. analogous to Lisp “printable” objects). Types’ __repr__ methods should return either an eval-able representation, or else something in <…> (like the default). The pprint module exists to provide pretty-printed repr representations.

Obviously, eval itself will execute arbitrary code as long as it’s an expression (and an expression, if readability doesn’t matter, can indeed do anything a statement can), so it is not even theoretically secure. The ast module provides a safe but limited alternative:

s = repr(obj)
s = pprint.pformat(obj)  # Pretty-printed version.
obj = ast.literal_eval(s)  # Should be secure; only certain built-in types.
obj = eval(s)  # Not secure, since it executes an arbitrary expression.
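A small sketch of what ast.literal_eval accepts and what it refuses. The sample object is made up; the rejected string is the sort of thing eval would happily run:

```python
import ast

obj = {"name": "bob", "sizes": (1, 2.5), "tags": {"a", "b"}, "raw": b"\x00", "none": None}

# Literals of the basic built-in types survive a repr round trip.
restored = ast.literal_eval(repr(obj))

# Anything that is not a plain literal (here, a function call) is refused.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```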

The binary representation used in .pyc files is not actually supposed to be portable between even slightly different Python versions, and hence is not really supposed to be used by apps, but its API is listed here for completeness. It works only for built-in types (though instances of custom subclasses of built-in classes are treated as being of the parent type by dumps) and uses the marshal module:

byterepr = marshal.dumps(obj)
obj = marshal.loads(byterepr)

Pickling with the pickle module; this is the main serialisation method for data which is trusted (note: trusted) and only needs handling in Python. There are a few versions of the format; version 0 is the original and ASCII-based, whereas subsequent versions are binary. The default was 3 from Python 3.0 and has been 4 since Python 3.8.

Interfaces can be registered for pickling instances of custom classes, but it is important to note that this system means pickle isn’t actually any more secure than eval, since a custom class can use an arbitrary constructor (including eval, os.system, etc) with arbitrary arguments:

byterepr = pickle.dumps(obj)
# 0 meaning the old (backward compatible, ASCII based) format
byterepr = pickle.dumps(obj, 0)
obj = pickle.loads(byterepr)  # Not secure.
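To make the security point concrete, here is a deliberately harmless sketch. Sneaky is a made-up class whose __reduce__ method tells pickle to call eval on a string when loading; a hostile pickle could name os.system or anything else in the same way:

```python
import pickle

class Sneaky:
    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call eval('6 * 7')".
        return (eval, ("6 * 7",))

payload = pickle.dumps(Sneaky())

# Loading runs the callable named in the payload. No Sneaky instance
# comes back; whatever the call returned does.
result = pickle.loads(payload)
```

The receiving side does not need the Sneaky class at all, since only the callable and its arguments are stored in the payload.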

JSON with the json module. Loading should be secure; the format is less powerful than repr but is readable from other languages:

s = json.dumps(obj)
s = json.dumps(obj, indent=4, sort_keys=True)  # Pretty-printed version.
obj = json.loads(s)  # Should be secure.
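"Less powerful" shows up mostly as types which do not survive a round trip. A sketch with a made-up object:

```python
import json

obj = {"point": (1, 2), 3: "three", "ok": True, "missing": None}

s = json.dumps(obj)
back = json.loads(s)  # Tuples come back as lists, non-string keys as strings.

# Some types are refused outright, for example sets.
try:
    json.dumps({1, 2})
    set_rejected = False
except TypeError:
    set_rejected = True
```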

It’s worth noting that JSON is close to a subset of Python 3.x, the difference you’re most likely to come across being the use of true/false/null rather than True/False/None. This can be easily fixed (obviously this is not at all secure, but is listed mainly so it doesn’t catch you off-guard):

true, false, null = True, False, None
obj = eval(jsondata)  # Not secure; also, seriously, don't.