Python reference sheet (work in progress)
Which standard library functions to use for some basic tasks in Python, and how to invoke them. The reference version for our purposes is Python 3.5; 2.7 differs in certain respects, but if you’re still using that then you should be migrating already.
Filesystem access
Main modules: io, os (including os.path), shutil, glob. The io.open function is available from the builtins namespace as plain open. The os.open function is low-level and not what you’re after (equivalent to C open, while io.open is vaguely equivalent to C fopen).
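As a quick check of the above, the following sketch confirms that the builtin open and io.open are the same object in Python 3, and contrasts them with os.open, which hands back a bare integer file descriptor. The file name demo.txt and the temporary folder are made up for illustration.

```python
import io
import os
import tempfile

# In Python 3 the builtin open is the very same function as io.open.
same_function = open is io.open

# Work inside a throwaway folder so nothing real gets touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name

    # High-level: returns a file object with read/write methods.
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello")

    # Low-level: returns an integer file descriptor, as C open does.
    fd = os.open(path, os.O_RDONLY)
    raw = os.read(fd, 5)  # bytes, no decoding
    os.close(fd)
```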
New folder, copy, move, rename, delete
Create a folder:
import os

os.mkdir("hello")  # Must not exist, containing folder must exist.
os.makedirs("hello")  # Must not exist, containing folder may or may not exist.
os.makedirs("hello", exist_ok=True)  # May exist, containing folder may or may not exist.
Copy a file or folder:
import shutil

shutil.copy("bob.txt", "cat.txt")  # File
shutil.copytree("bob", "cat")  # Folder
Move a file or folder, either within or between filesystems:
shutil.move("hello.txt", os.path.join("cat", "hello"))
Rename a file or folder, or move it within a single filesystem:
import os

os.rename("bob", "cat")
os.rename("hello.txt", os.path.join("cat", "hello.txt"))
Delete a file or folder:
import os
import shutil

os.unlink("hello.txt")  # Or os.remove (does exactly the same thing)
os.rmdir("bob")  # Must be empty
shutil.rmtree("bob")  # May or may not be empty
Delete an empty folder along with any otherwise-empty containing folders. This is less likely to be the one you’re after and is listed mostly for completeness:
os.removedirs("bob")
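To see how these calls fit together, here is a minimal sketch that runs them inside a throwaway folder, so it is safe to try. The folder and file names are illustrative only, and the sketch is one possible ordering of the operations rather than a recommended workflow.

```python
import os
import shutil
import tempfile

# Everything happens under a throwaway folder; all names are illustrative.
base = tempfile.mkdtemp()
try:
    # Create nested folders in one go; exist_ok avoids an error on reruns.
    os.makedirs(os.path.join(base, "bob", "inner"), exist_ok=True)

    # Put a small file inside so the tree is not empty.
    with open(os.path.join(base, "bob", "hello.txt"), "w", encoding="utf-8") as f:
        f.write("hi")

    # Copy the whole tree, then rename the copy.
    shutil.copytree(os.path.join(base, "bob"), os.path.join(base, "cat"))
    os.rename(os.path.join(base, "cat"), os.path.join(base, "dog"))

    # os.rmdir refuses to delete a non-empty folder...
    try:
        os.rmdir(os.path.join(base, "dog"))
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    # ...whereas shutil.rmtree deletes it together with its contents.
    shutil.rmtree(os.path.join(base, "dog"))
    dog_exists = os.path.exists(os.path.join(base, "dog"))
    bob_exists = os.path.exists(os.path.join(base, "bob", "hello.txt"))
finally:
    shutil.rmtree(base)  # Clean up the throwaway folder.
```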
Creating, reading and writing files
Create a file containing text (this creates an initially empty file to write to, replacing any existing file of that name):
f = open("hello.txt", "w", encoding="utf-8")
f.write("¡Hello, world!")
f.close()
Append text to an existing file:
f = open("hello.txt", "a", encoding="utf-8")
f.write("\n¡Bonjour, tout le monde!")  # Converts \n based on platform
f.close()
Read the text contents of a file to a string:
f = open("hello.txt", "r", encoding="utf-8")
b = f.read()  # Converts \r\n and \r to \n
f.close()
Read the text contents of a non-UTF8 file to a string:
# Western European (Windows)
f = open("hello.txt", "r", encoding="windows-1252")
b = f.read()
f.close()

# Japanese (Shift-JIS) in Windows variant
f = open("nihongo.txt", "r", encoding="ms-kanji")
b = f.read()
f.close()
Read the contents of a non-text file:
f = open("hello.db", "rb")
b = f.read()  # Gives a bytes object, does not convert newlines.
f.close()
Write to a non-text file:
f = open("hello.db", "wb")
f.write(b"\x00\x04\n\r\r\n\n\x1AHello\xFAWorld!!!")
f.close()
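The examples above close each file by hand. A common alternative, sketched here, is the with statement, which closes the file when the block ends even if an exception is raised part-way through. The temporary folder and file name are made up for the demonstration.

```python
import os
import tempfile

# Hypothetical file inside a throwaway folder, purely for illustration.
tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "hello.txt")

# The with statement closes the file on exit, even after an exception.
with open(path, "w", encoding="utf-8") as f:
    f.write("¡Hello, world!")

with open(path, "a", encoding="utf-8") as f:
    f.write("\n¡Bonjour, tout le monde!")

with open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Reading the same file in binary mode gives the raw UTF-8 bytes instead.
with open(path, "rb") as f:
    raw = f.read()

closed_after_block = f.closed  # True: the file was closed automatically

os.remove(path)
os.rmdir(tmp)
```

Whether to use with or explicit close calls is largely a matter of taste in short scripts; with mainly pays off once error handling enters the picture.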
Processing file paths
Split directory and name:
import os

# Gives "../flan" for dirn and "boats.txt" for fn.
# The boats.txt may represent a file or directory, it doesn't care.
# Using tuple syntax on the left of the assignment unpacks the returned tuple.
dirn, fn = os.path.split("../flan/boats.txt")

# If you only need one of them
dirn = os.path.dirname("../flan/boats.txt")
fn = os.path.basename("../flan/boats.txt")
Split name and extension:
# Gives "../flan/boats" for basic and ".txt" for ext.
basic, ext = os.path.splitext("../flan/boats.txt")
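A few edge cases of these functions are easy to trip over, so here is a small sketch of them. The paths are invented examples; none of them needs to exist, since these functions only manipulate strings.

```python
import os.path

# Only the last extension is split off.
stem, ext = os.path.splitext("backup.tar.gz")  # ("backup.tar", ".gz")

# A leading dot marks a hidden file, not an extension.
hidden_stem, hidden_ext = os.path.splitext(".bashrc")  # (".bashrc", "")

# A trailing slash leaves the name part empty.
dirn, fn = os.path.split("../flan/")  # ("../flan", "")

# join followed by split round-trips the two parts.
joined = os.path.join("flan", "boats.txt")
parts = os.path.split(joined)  # ("flan", "boats.txt")
```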
Handling folders and probing files
Get current (and parent) folder path:
import os

cwd = os.getcwd()  # Current
parent = os.path.dirname(os.getcwd())

# Alternatives using abspath:
cwd = os.path.abspath(os.curdir)  # or os.path.curdir (usually ".")
parent = os.path.abspath(os.pardir)  # usually ".."
Change directory:
os.chdir("bob")
Check whether a file or folder exists, and whether something is a file or a folder:
import os

if os.path.exists("bob.txt"):
    pass  # do stuff
if os.path.isdir("bob.txt"):
    pass  # do stuff (exists and is a folder)
if os.path.isfile("bob.txt"):
    pass  # do stuff (exists and is a file)
List files and folders within a folder:
# Gives a list of names only, not full paths.
foo = os.listdir("bananas")
bar = os.listdir("bob")
Step through all files and folders under a directory:
import os

for root, dirs, files in os.walk("bob"):
    for filename in files:
        filepath = os.path.join(root, filename)
        # ...
    for foldername in dirs:
        dirpath = os.path.join(root, foldername)
        # ...
Step through all files and folders matching a filename wildcard pattern:
import glob

for path in glob.glob("b?bcats/*.txt"):
    pass  # ...
Step through all files and folders matching a recursive filename wildcard pattern (i.e. matching subdirectories recursively):
import glob

# The recursive argument needs Python 3.5 or later.
for path in glob.glob("b?bcats/**/*.txt", recursive=True):
    pass  # ...
Read file size in bytes:
size = os.stat("hello.txt").st_size
Read file timestamps (in UNIX time):
import os

atime = os.stat("hello.txt").st_atime  # Access timestamp
mtime = os.stat("hello.txt").st_mtime  # Modification timestamp
# Inode change timestamp on Linux, creation timestamp on Windows:
ctime = os.stat("hello.txt").st_ctime
Write file timestamps (in UNIX time):
os.utime("hello.txt", (atime, mtime))
os.utime("hello.txt")  # "Touches" file (sets timestamps to the current time).
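Combining a couple of the calls above, the following sketch adds up the sizes of all files under a folder and then sets and reads back a timestamp. The helper name total_size and the file names are hypothetical, and the sketch does nothing special about symbolic links or unreadable files.

```python
import os
import tempfile

def total_size(top):
    """Sum the sizes in bytes of all files found under top (a sketch)."""
    total = 0
    for root, dirs, files in os.walk(top):
        for filename in files:
            filepath = os.path.join(root, filename)
            total += os.stat(filepath).st_size
    return total

# Illustrative usage on a throwaway tree (all names are made up).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "a.bin"), "wb") as f:
        f.write(b"12345")
    with open(os.path.join(tmp, "sub", "b.bin"), "wb") as f:
        f.write(b"123")
    size = total_size(tmp)

    # Set both timestamps to a fixed UNIX time and read one back.
    os.utime(os.path.join(tmp, "a.bin"), (1000000000, 1000000000))
    mtime = os.stat(os.path.join(tmp, "a.bin")).st_mtime
```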
Hashing
CRC32 and Adler32. The zlib module is available if Python is built with zlib support (unless you have some stripped-down embedded build, it almost certainly is). The binascii module ought to be present in either case:
import binascii
import zlib

# Python 3 always returns an unsigned value;
# in Python 2, signed/unsigned depends on version/platform:
crc = zlib.crc32(dat)  # dat should be a bytes object
crc = binascii.crc32(dat)

# Get the same (unsigned) value everywhere:
crc = zlib.crc32(dat) & 0xFFFFFFFF
crc = binascii.crc32(dat) & 0xFFFFFFFF

# Adler32 (only present in zlib):
crc = zlib.adler32(dat) & 0xFFFFFFFF
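When the data is too large to hold in memory at once, the checksum can be built up piece by piece, since zlib.crc32 accepts a running value as its second argument. This is a minimal sketch; the helper name crc32_chunks is made up for illustration.

```python
import zlib

def crc32_chunks(chunks):
    """CRC32 of the concatenation of an iterable of bytes objects."""
    crc = 0  # starting value for an empty input
    for chunk in chunks:
        # The second argument carries the running checksum forward.
        crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF

whole = zlib.crc32(b"Hello, world!") & 0xFFFFFFFF
pieces = crc32_chunks([b"Hello, ", b"world", b"!"])
```

Both whole and pieces end up holding the same value, whatever way the input is cut up.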
MD5, SHA1 and SHA2 (the hashlib module). This may link against an SSL/TLS library to provide broader support, but built-in support for MD5, SHA1 and SHA2 should always be present (with the further addition of SHA3 and BLAKE2 in Python 3.6):
import hashlib

# dat should be a bytes object
md5raw = hashlib.md5(dat).digest()  # byte form (a bytes object)
md5 = hashlib.md5(dat).hexdigest()  # hexadecimal form (a str)
sha = hashlib.sha1(dat).hexdigest()
sha2_256 = hashlib.sha256(dat).hexdigest()

# Specify any hash supported by the underlying SSL/TLS library:
sha2_256 = hashlib.new("sha256", dat).hexdigest()

# Find out which hash formats are available on your system:
print(hashlib.algorithms_available)
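The calls above hash a bytes object that is already in memory. For a large file it is usually preferable to feed the hash object in pieces through its update method, roughly as sketched below. The helper name sha256_of_file, the chunk size and the file name are all arbitrary choices for the example.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=65536):
    """Hash a file piece by piece; chunk_size is an arbitrary choice."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter with a sentinel keeps reading until read returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Illustrative check against hashing the same data in one go.
data = b"x" * 200000
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "big.bin")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(data)
    chunked = sha256_of_file(path)
one_shot = hashlib.sha256(data).hexdigest()
```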
Binary-to-text encodings
Base64 (base64 module):
import base64

# Don't name the result "base64": that would shadow the module.
b64 = base64.b64encode(dat)
dat = base64.b64decode(b64)
b64_urlsafe = base64.urlsafe_b64encode(dat)
Hexadecimal (binascii module, accessible through codecs):
import binascii
import codecs

# Gives lowercase, accepts either case:
hexadecimal = codecs.encode(dat, "hex")
dat = codecs.decode(hexadecimal, "hex")

# Using binascii directly, same behaviour:
hexadecimal = binascii.hexlify(dat)
dat = binascii.unhexlify(hexadecimal)
Hexadecimal can also be handled through the base64 module, though I’m not sure why you’d want to do this:
import base64

hexadecimal = base64.b16encode(dat)  # Gives uppercase
dat = base64.b16decode(hexadecimal)  # Accepts uppercase only
dat = base64.b16decode(hexadecimal, True)  # Accepts either
Quoted-Printable (quopri module, accessible through the codecs module):
import codecs
import quopri

# Handles bytes objects, names notwithstanding:
quop = quopri.encodestring(dat)
dat = quopri.decodestring(quop)

# Using the codecs module:
quop = codecs.encode(dat, "quopri")
dat = codecs.decode(quop, "quopri")
Unix UUencode and the related Classic MacOS HQX (uu, binhex, codecs modules):
import binhex
import codecs
import uu

# Encoding and decoding files:
uu.encode(infile, uu_outfile)
uu.decode(uu_infile, outfile)
binhex.binhex(infilename, hqx_outfile)
binhex.hexbin(hqx_infile, outfilename)

# Encoding and decoding strings:
uudat = codecs.encode(dat, "uu")
dat = codecs.decode(uudat, "uu")
Other binary-to-text encodings in the base64 module:
import base64

base32_rfc4648 = base64.b32encode(dat)
ascii85 = base64.a85encode(dat)
base85 = base64.b85encode(dat)
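One thing the snippets above don't make obvious is that these encoders return bytes rather than str, so a decode step is needed if you want text. The sketch below shows that, along with a round trip through each encoding, using a made-up five-byte input.

```python
import base64
import binascii

dat = b"hello"

# The encoders return bytes, not str; decode as ASCII if text is needed.
b64 = base64.b64encode(dat)        # b"aGVsbG8="
b64_text = b64.decode("ascii")     # "aGVsbG8="

# Hex via binascii is lowercase, via base64.b16encode uppercase.
hex_lower = binascii.hexlify(dat)  # b"68656c6c6f"
hex_upper = base64.b16encode(dat)  # b"68656C6C6F"

# Each encoding round-trips back to the original bytes.
round_trips = [
    base64.b64decode(b64),
    binascii.unhexlify(hex_lower),
    base64.b16decode(hex_upper),
    base64.b32decode(base64.b32encode(dat)),
    base64.a85decode(base64.a85encode(dat)),
    base64.b85decode(base64.b85encode(dat)),
]
```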
Data serialisation
Python string representation (repr), the same as is used in Python source (i.e. analogous to Lisp “printable” objects). Types’ __repr__ methods should return either an eval-able representation, or else something in <…> (like the default). The pprint module exists to provide pretty-printed repr representations.
Obviously, actual eval executes arbitrary code as long as it’s an expression (and an expression, if readability doesn’t matter, can do anything a statement can), so this is not even theoretically secure. The ast module provides a safe but limited alternative:
import ast
import pprint

s = repr(obj)
s = pprint.pformat(obj)  # Pretty-printed version.
obj = ast.literal_eval(s)  # Should be secure; only certain built-in types.
obj = eval(s)  # Not secure, since it executes an arbitrary expression.
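To make the difference concrete, here is a small sketch showing ast.literal_eval restoring a structure made only of literals, and refusing a string that contains a function call rather than running it. The sample dictionary and the rejected expression are invented for the example.

```python
import ast

original = {"name": "bob", "sizes": (1, 2, 3), "ok": True, "none": None}

# repr output made only of literals survives the round trip.
restored = ast.literal_eval(repr(original))

# Anything beyond literals, such as a function call, is rejected
# without being executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```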
The binary representation used in .pyc files is not actually supposed to be portable between even slightly different Python versions, and hence is not really supposed to be used by apps, but its API is listed here for completeness. It works only for built-in types (though instances of custom subclasses of built-in classes are treated as being of the parent type by dumps) and uses the marshal module:
import marshal

byterepr = marshal.dumps(obj)
obj = marshal.loads(byterepr)
Pickling with the pickle module; this is the main serialisation method for trusted data (note, trusted data) that only needs to be handled in Python. There are a few versions of the format; version 0 is the original and ASCII-based, whereas subsequent versions are not ASCII-based, and the default was 3 from Python 3.0 through 3.7 (it is 4 from Python 3.8 onwards).
Interfaces can be registered for pickling instances of custom classes, but it is important to note that this system means pickle isn’t actually any more secure than eval, since a custom class can use an arbitrary constructor (including eval, os.system, etc.) with arbitrary arguments:
import pickle

byterepr = pickle.dumps(obj)
# 0 meaning the old (backward compatible, ASCII based) format
byterepr = pickle.dumps(obj, 0)
obj = pickle.loads(byterepr)  # Not secure.
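The point about arbitrary constructors can be demonstrated harmlessly. In the sketch below, a hypothetical class named Sneaky uses the __reduce__ hook to tell pickle to call eval on a fixed, innocuous expression when the data is loaded; a hostile pickle could just as easily name something destructive.

```python
import pickle

class Sneaky:
    """Hypothetical class whose pickle asks for a call on load."""
    def __reduce__(self):
        # Tells pickle: "to rebuild me, call eval('6 * 7')".
        # A hostile pickle could name os.system here instead.
        return (eval, ("6 * 7",))

payload = pickle.dumps(Sneaky())

# Loading runs the named callable; the result is not a Sneaky at all.
result = pickle.loads(payload)  # 42
```

Note that the receiving side doesn't need to have the Sneaky class defined for this to work, which is exactly why unpickling untrusted data is unsafe.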
JSON with the json module; this should be secure, and is less powerful than repr but readable from other languages:
import json

s = json.dumps(obj)
s = json.dumps(obj, indent=4, sort_keys=True)  # Pretty-printed version.
obj = json.loads(s)  # Should be secure.
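As an illustration of “less powerful than repr”, the following sketch shows a round trip through JSON quietly changing some Python types: tuples come back as lists and non-string dictionary keys come back as strings. The sample data is made up.

```python
import json

original = {1: (1, 2), "flag": True, "nothing": None}

s = json.dumps(original)
restored = json.loads(s)

# The tuple became a list and the integer key became a string,
# so the round trip is not lossless for this input.
lossless = (restored == original)
```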
It’s worth noting that JSON is close to a subset of Python 3.x, the difference you’re most likely to come across being the use of true/false/null rather than True/False/None. This can be easily fixed (obviously this is not at all secure, but is listed mainly so it doesn’t catch you off-guard):
true, false, null = True, False, None
obj = eval(jsondata)  # Not secure; also, seriously, don't.