Python reference sheet (work in progress)

Which standard library functions to use for some basic tasks in Python, and how to invoke them. The reference version is Python 3.5; 2.7 differs in certain respects, but if you’re still using that then you should already be migrating.

Filesystem access

Main modules: io, os (including os.path), shutil, glob. The io.open function is available from the builtins namespace as plain open. The os.open function is low-level and not what you’re after (it is equivalent to C open, while io.open is vaguely equivalent to C fopen).

New folder, copy, move, rename, delete

Create a folder:

os.mkdir("hello") # Must not exist, containing folder must exist.
os.makedirs("hello") # Must not exist, containing folder may or may not exist.
os.makedirs("hello", exist_ok=True) # May exist, containing folder may or may not exist.

Copy a file or folder:

shutil.copy("bob.txt", "cat.txt") # File
shutil.copytree("bob", "cat") # Folder

Move a file or folder, either within or between filesystems:

shutil.move("hello.txt", os.path.join("cat", "hello"))

Rename a file or folder, or move it within a single filesystem:

os.rename("bob", "cat")
os.rename("hello.txt", os.path.join("cat", "hello.txt"))

Delete a file or folder:

os.unlink("hello.txt") # Or os.remove (does exactly the same thing)
os.rmdir("bob") # Must be empty
shutil.rmtree("bob") # May or may not be empty

Delete an empty folder along with any otherwise-empty containing folders. This is less likely to be the one you’re after and is listed mostly for completeness:

os.removedirs("bob")
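
As a rough sketch of how the calls above fit together, the following runs them inside a scratch directory. The tempfile module and the helper name demo_folder_ops are not part of the list above; they are used here only so that nothing is left behind on disk, and the file and folder names are made up for illustration.

```python
import os
import shutil
import tempfile

def demo_folder_ops():
    """Exercise mkdir/copy/rename/delete inside a throwaway directory."""
    results = {}
    with tempfile.TemporaryDirectory() as scratch:
        src_dir = os.path.join(scratch, "bob")
        # exist_ok=True makes this safe to repeat; parents are created too.
        os.makedirs(os.path.join(src_dir, "nested"), exist_ok=True)

        # Create a small file to work with.
        original = os.path.join(src_dir, "bob.txt")
        with open(original, "w", encoding="utf-8") as f:
            f.write("hello")

        # Copy a single file, then the whole folder.
        copy_path = shutil.copy(original, os.path.join(src_dir, "cat.txt"))
        tree_copy = shutil.copytree(src_dir, os.path.join(scratch, "cat"))
        results["copied_file"] = os.path.isfile(copy_path)
        results["copied_tree"] = sorted(os.listdir(tree_copy))

        # Rename within the same filesystem.
        renamed = os.path.join(scratch, "dog")
        os.rename(tree_copy, renamed)
        results["renamed"] = os.path.isdir(renamed) and not os.path.exists(tree_copy)

        # Delete one file, then the (non-empty) folder.
        os.remove(os.path.join(renamed, "cat.txt"))
        shutil.rmtree(renamed)
        results["deleted"] = not os.path.exists(renamed)
    return results

print(demo_folder_ops())
```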

Creating, reading and writing files

Create a file containing text (creates the file if it does not exist, and replaces its contents if it does):

f = open("hello.txt", "w", encoding="utf-8")
f.write("¡Hello, world!")
f.close()

Append text to an existing file:

f = open("hello.txt", "a", encoding="utf-8")
f.write("\n¡Bonjour, tout le monde!") # Converts \n based on platform
f.close()

Read the text contents of a file to a string:

f = open("hello.txt", "r", encoding="utf-8")
b = f.read() # Converts \r\n and \r to \n
f.close()

Read the text contents of a non-UTF8 file to a string:

# Western European (Windows)
f = open("hello.txt", "r", encoding="windows-1252")
b = f.read()
f.close()

# Japanese (Shift-JIS) in Windows variant
f = open("nihongo.txt", "r", encoding="ms-kanji")
b = f.read()
f.close()

Read the contents of a non-text file:

f = open("hello.db", "rb")
b = f.read() # Gives a bytes object, does not convert newlines.
f.close()

Write to a non-text file:

f = open("hello.db", "wb")
f.write(b"\x00\x04\n\r\r\n\n\x1AHello\xFAWorld!!!")
f.close()
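
The snippets above close each file by hand, which leaves the file open if an exception is raised before f.close() is reached. A common alternative is the with statement, which closes the file on the way out in either case. The sketch below assumes a scratch directory from tempfile, and the helper names are invented for illustration.

```python
import os
import tempfile

def roundtrip_text(path):
    """Write, append and read back text, closing the file each time."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("¡Hello, world!")
    with open(path, "a", encoding="utf-8") as f:
        f.write("\n¡Bonjour, tout le monde!")
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

def roundtrip_bytes(path, payload):
    """Write and read back raw bytes; no newline conversion takes place."""
    with open(path, "wb") as f:
        f.write(payload)
    with open(path, "rb") as f:
        return f.read()

with tempfile.TemporaryDirectory() as scratch:
    text = roundtrip_text(os.path.join(scratch, "hello.txt"))
    raw = roundtrip_bytes(os.path.join(scratch, "hello.db"), b"\x00\x04\r\n\x1aHello")

print(text.splitlines())
print(raw)
```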

Handling folders and probing files

Get current (and parent) folder path:

cwd = os.getcwd() # Current
parent = os.path.split(os.getcwd())[0]

# Alternatives using abspath:
cwd = os.path.abspath(os.curdir) # or os.path.curdir (usually ".")
parent = os.path.abspath(os.pardir) # usually ".."

Change directory:

os.chdir("bob")

Check if a file or folder exists, and whether something is a file or a folder:

if os.path.exists("bob.txt"):
    pass # do stuff (exists, as either a file or a folder)
if os.path.isdir("bob.txt"):
    pass # do stuff (exists and is a folder)
if os.path.isfile("bob.txt"):
    pass # do stuff (exists and is a file)

List files and folders within a folder:

foo = os.listdir("bananas")
bar = os.listdir("bob")

Step through all files and folders under a directory:

for root, dirs, files in os.walk("bob"):
    for filename in files:
        filepath = os.path.join(root, filename)
        # ...
    for foldername in dirs:
        dirpath = os.path.join(root, foldername)
        # ...
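
One possible use of os.walk is to total up the size of everything under a folder, using os.stat for the individual sizes. This is a minimal sketch: the function name tree_size and the sample files are made up, and it makes no attempt to handle symbolic links or files that vanish mid-walk.

```python
import os
import tempfile

def tree_size(top):
    """Return (file_count, total_bytes) for all files under top."""
    count = 0
    total = 0
    for root, dirs, files in os.walk(top):
        for filename in files:
            filepath = os.path.join(root, filename)
            total += os.stat(filepath).st_size
            count += 1
    return count, total

with tempfile.TemporaryDirectory() as scratch:
    os.makedirs(os.path.join(scratch, "a", "b"))
    samples = (
        ("top.bin", b"12345"),
        (os.path.join("a", "b", "deep.bin"), b"123"),
    )
    for rel, data in samples:
        with open(os.path.join(scratch, rel), "wb") as f:
            f.write(data)
    summary = tree_size(scratch)

print(summary)  # (2, 8): two files, 5 + 3 bytes
```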

Step through all files and folders matching a filename wildcard pattern:

for path in glob.glob("b?bcats/*.txt"):
    pass # ...

Step through all files and folders matching a recursive filename wildcard pattern (i.e. matching subdirectories recursively):

for path in glob.glob("b?bcats/**/*.txt", recursive=True):
    pass # ...
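
To show the difference between the two patterns, here is a small sketch run against a scratch directory. With recursive=True the ** component also matches zero directories, so files directly in the top folder are included. The helper names are hypothetical, and glob.escape (not listed above) is used only to guard against wildcard characters in the folder name itself.

```python
import glob
import os
import tempfile

def relative(paths, top):
    """Sort paths and express them relative to top, with forward slashes."""
    return sorted(os.path.relpath(p, top).replace(os.sep, "/") for p in paths)

def find_txt(top):
    """Return (.txt files directly in top, .txt files at any depth under top)."""
    safe_top = glob.escape(top)  # neutralise any *, ? or [ in the folder name
    direct = glob.glob(os.path.join(safe_top, "*.txt"))
    nested = glob.glob(os.path.join(safe_top, "**", "*.txt"), recursive=True)
    return relative(direct, top), relative(nested, top)

with tempfile.TemporaryDirectory() as scratch:
    os.makedirs(os.path.join(scratch, "sub"))
    for rel in ("a.txt", "b.log", os.path.join("sub", "c.txt")):
        with open(os.path.join(scratch, rel), "w", encoding="utf-8"):
            pass  # create an empty file
    direct, nested = find_txt(scratch)

print(direct)  # ['a.txt']
print(nested)  # ['a.txt', 'sub/c.txt']
```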

Read file size in bytes:

size = os.stat("hello.txt").st_size

Read file timestamps (in UNIX time):

atime = os.stat("hello.txt").st_atime # Access timestamp
mtime = os.stat("hello.txt").st_mtime # Modification timestamp
# Inode change timestamp on Linux, creation timestamp on Windows:
ctime = os.stat("hello.txt").st_ctime

Write file timestamps (in UNIX time):

os.utime("hello.txt", (atime, mtime))
os.utime("hello.txt") # "Touches" file (set timestamp to current time).
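
A short sketch of setting timestamps and reading them back. The time module is not covered above and is used here only to render a UNIX timestamp as a date; the two timestamp values are arbitrary.

```python
import os
import tempfile
import time

with tempfile.TemporaryDirectory() as scratch:
    path = os.path.join(scratch, "hello.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("hi")

    # Set access and modification times to fixed UNIX timestamps.
    os.utime(path, (1000000000, 1500000000))
    st = os.stat(path)
    stamped = (int(st.st_atime), int(st.st_mtime))

    # "Touch" the file: both timestamps are set to the current time.
    os.utime(path)
    touched_mtime = os.stat(path).st_mtime

print(stamped)
# Render the modification timestamp as a UTC date.
print(time.strftime("%Y-%m-%d", time.gmtime(stamped[1])))
```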

Hashing

CRC32 and Adler32. The zlib module is available if Python is built with zlib support (unless you have some stripped-down embedded build, it almost certainly is). The binascii module ought to be present in either case:

# Always unsigned in Python 3; signed/unsigned depended on version/platform in 2.x:
crc = zlib.crc32(dat) # dat should be a bytes object
crc = binascii.crc32(dat)
# get same (unsigned) value everywhere, including 2.x:
crc = zlib.crc32(dat) & 0xFFFFFFFF
crc = binascii.crc32(dat) & 0xFFFFFFFF
# Adler32 (only present in zlib):
adler = zlib.adler32(dat) & 0xFFFFFFFF
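
For a quick sanity check, the sketch below runs the checksums on well-known test inputs and also shows the optional second argument, which carries a running value so that data can be checksummed in pieces. The chunk boundaries are arbitrary.

```python
import binascii
import zlib

dat = b"123456789"

# zlib and binascii compute the same CRC-32.
crc = zlib.crc32(dat) & 0xFFFFFFFF
same = crc == (binascii.crc32(dat) & 0xFFFFFFFF)
print("%08X" % crc)  # CBF43926, the usual CRC-32 check value

# Incremental use: feed the previous result back in as the starting value.
running = 0
for chunk in (b"1234", b"56789"):
    running = zlib.crc32(chunk, running)
running &= 0xFFFFFFFF

adler = zlib.adler32(b"Wikipedia") & 0xFFFFFFFF
print("%08X" % adler)  # 11E60398
```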

MD5, SHA1 and SHA2 (the hashlib module). This may link against an SSL/TLS library to provide broader support, but built-in support for MD5, SHA1 and SHA2 should always be present (with the further addition of SHA3 and BLAKE2 in Python 3.6):

# dat should be a bytes object; digest() returns bytes, hexdigest() returns a str
md5raw = hashlib.md5(dat).digest() # byte form
md5 = hashlib.md5(dat).hexdigest() # hexadecimal form
sha = hashlib.sha1(dat).hexdigest()
sha2_256 = hashlib.sha256(dat).hexdigest()
# Specify any hash supported by the underlying SSL/TLS library:
sha2_256 = hashlib.new("sha256", dat).hexdigest()
# Find out which hash formats are available on your system:
print(hashlib.algorithms_available)
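
The calls above hash data that is already in memory. For a large file, one common approach is to feed the hash object in chunks through its update method. This is a sketch only: the function name hash_file and the chunk size are assumptions, not anything prescribed by hashlib.

```python
import hashlib
import os
import tempfile

def hash_file(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

payload = b"x" * 200000  # larger than one chunk, so update runs several times
with tempfile.TemporaryDirectory() as scratch:
    path = os.path.join(scratch, "big.bin")
    with open(path, "wb") as f:
        f.write(payload)
    digest = hash_file(path)

# Chunked hashing gives the same result as hashing everything at once.
print(digest == hashlib.sha256(payload).hexdigest())
```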

Binary-to-text encodings

Base64 (base64 module):

# Don't name the result "base64": that would shadow the module.
b64 = base64.b64encode(dat)
dat = base64.b64decode(b64)
b64_urlsafe = base64.urlsafe_b64encode(dat)
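
A small worked example. The encoders take and return bytes, so the result needs decoding as ASCII if a str is wanted. The second input is chosen so that the standard and URL-safe alphabets give visibly different output.

```python
import base64

dat = b"hello"
b64 = base64.b64encode(dat)
print(b64)  # b'aGVsbG8='
as_text = b64.decode("ascii")  # str form, e.g. for embedding in JSON
decoded = base64.b64decode(as_text)  # str input is accepted when decoding

# These two bytes map onto the characters that differ between the alphabets.
tricky = b"\xfb\xff"
standard = base64.b64encode(tricky)
urlsafe = base64.urlsafe_b64encode(tricky)
print(standard, urlsafe)  # b'+/8=' b'-_8='
urlsafe_decoded = base64.urlsafe_b64decode(urlsafe)
```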

Hexadecimal (binascii module, accessible through codecs):

# Gives lowercase, accepts either case:
hexadecimal = codecs.encode(dat, "hex")
dat = codecs.decode(hexadecimal, "hex")
# Using binascii directly, same behaviour:
hexadecimal = binascii.hexlify(dat)
dat = binascii.unhexlify(hexadecimal)

Hexadecimal can also be handled through the base64 module, though I’m not sure why you’d want to do this:

hexadecimal = base64.b16encode(dat) # Gives uppercase
dat = base64.b16decode(hexadecimal) # Accepts uppercase only
dat = base64.b16decode(hexadecimal, casefold=True) # Accepts either
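
The case behaviour described above can be checked with a short sketch. It also uses bytes.hex and bytes.fromhex, which are not mentioned above; bytes.hex was added in Python 3.5 and returns a str rather than bytes.

```python
import base64
import binascii
import codecs

dat = b"\xde\xad\xbe\xef"

lower = binascii.hexlify(dat)
print(lower)  # b'deadbeef'
codecs_matches = codecs.encode(dat, "hex") == lower
mixed_ok = binascii.unhexlify(b"DeAdBeEf") == dat  # either case accepted

upper = base64.b16encode(dat)
print(upper)  # b'DEADBEEF'

# Without casefold, b16decode rejects lowercase input
# (it raises binascii.Error, a ValueError subclass).
try:
    base64.b16decode(lower)
    strict_rejected = False
except ValueError:
    strict_rejected = True
folded = base64.b16decode(lower, casefold=True)

# Method-based alternative on bytes objects (Python 3.5 and later).
as_str = dat.hex()
from_str = bytes.fromhex(as_str)
```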

Quoted-Printable (quopri module, accessible through the codecs module):

# Handles bytes objects, names notwithstanding:
quop = quopri.encodestring(dat)
dat = quopri.decodestring(quop)
# Using the codecs module:
quop = codecs.encode(dat, "quopri")
dat = codecs.decode(quop, "quopri")
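
As an illustration of what Quoted-Printable does to the data: ASCII text passes through largely unchanged, while non-ASCII bytes and the equals sign itself are written as an equals sign followed by two hex digits. The sample string is made up.

```python
import codecs
import quopri

dat = "café = coffee".encode("utf-8")

quop = quopri.encodestring(dat)
print(quop)  # the two bytes of é and the "=" appear as =XX escapes
roundtrip = quopri.decodestring(quop)

# The codecs route decodes back to the same bytes.
via_codecs = codecs.decode(codecs.encode(dat, "quopri"), "quopri")
```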

Unix UUencode and the related Classic MacOS HQX (uu, binhex, codecs modules):

# Encoding and decoding files:
uu.encode(infile, uu_outfile)
uu.decode(uu_infile, outfile)
binhex.binhex(infilename, hqx_outfile)
binhex.hexbin(hqx_infile, outfilename)
# Encoding and decoding strings:
uudat = codecs.encode(dat, "uu")
dat = codecs.decode(uudat, "uu")

Other binary-to-text encodings in the base64 module:

base32_rfc4648 = base64.b32encode(dat)
ascii85 = base64.a85encode(dat)
base85 = base64.b85encode(dat)
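
Each of these encoders has a matching decoder in the same module. A minimal round-trip sketch, with an arbitrary sample input:

```python
import base64

dat = b"hello"

base32_rfc4648 = base64.b32encode(dat)
print(base32_rfc4648)  # b'NBSWY3DP'
ascii85 = base64.a85encode(dat)
base85 = base64.b85encode(dat)

# Decode each form back to the original bytes.
from_b32 = base64.b32decode(base32_rfc4648)
from_a85 = base64.a85decode(ascii85)
from_b85 = base64.b85decode(base85)
print(from_b32 == from_a85 == from_b85 == dat)
```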