.. -=- Python reference sheet -=-

= Python reference sheet (work in progress) =

What standard library functions to use to do some basic tasks in Python, and how to invoke them.&ensp;The reference version is Python 3.5 for our purposes; 2.7 differs in certain respects, but if you're still using it you should already be migrating.

== Filesystem access ==

Main modules: `io`, `os` (including `os.path`), `shutil`, `glob`.&ensp;The `io.open` function is available from the builtins namespace as plain `open`.&ensp;The `os.open` function is low-level and not what you're after (equivalent to C `open`, while `io.open` is vaguely equivalent to C `fopen`).

=== New folder, copy, move, rename, delete ===

Create a folder::

    os.mkdir("hello")  # Must not exist, containing folder must exist.
    os.makedirs("hello")  # Must not exist, containing folder may or may not exist.
    os.makedirs("hello", exist_ok=True)  # May exist, containing folder may or may not exist.

Copy a file or folder::

    shutil.copy("bob.txt", "cat.txt") # File
    shutil.copytree("bob", "cat") # Folder

Move a file or folder, either within or between filesystems::

    shutil.move("hello.txt", os.path.join("cat", "hello"))

Rename a file or folder, or move it within a single filesystem::

    os.rename("bob", "cat")
    os.rename("hello.txt", os.path.join("cat", "hello.txt"))

Delete a file or folder::

    os.unlink("hello.txt")  # Or os.remove (does exactly the same thing)
    os.rmdir("bob")  # Must be empty
    shutil.rmtree("bob")  # May or may not be empty

Delete an empty folder along with any otherwise-empty containing folders.&ensp;This is less likely
to be the one you're after and is listed mostly for completeness::

    os.removedirs("bob")
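
As a rough end-to-end sketch, the calls above can be strung together inside a throwaway directory so that nothing real is touched.&ensp;The use of `tempfile` and all the folder and file names are assumptions made for illustration only:

```python
import os
import shutil
import tempfile

# Work inside a throwaway directory (an assumption of this sketch).
base = tempfile.mkdtemp()

# Create nested folders in one go.
os.makedirs(os.path.join(base, "bob", "inner"), exist_ok=True)

# Create a small file so there is something to copy around.
with open(os.path.join(base, "bob", "hello.txt"), "w", encoding="utf-8") as f:
    f.write("hi")

# Copy the whole folder, then move one file out of the copy.
shutil.copytree(os.path.join(base, "bob"), os.path.join(base, "cat"))
shutil.move(os.path.join(base, "cat", "hello.txt"),
            os.path.join(base, "moved.txt"))

# Rename within the same filesystem.
os.rename(os.path.join(base, "moved.txt"), os.path.join(base, "renamed.txt"))

remaining = sorted(os.listdir(base))

# Clean up everything, empty or not.
shutil.rmtree(base)
```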

=== Creating, reading and writing files ===

Create a file containing text (creates an empty file to write to, replacing any existing file of the same name)::

    f = open("hello.txt", "w", encoding="utf-8")
    f.write("¡Hello, world!")
    f.close()

Append text to an existing file::

    f = open("hello.txt", "a", encoding="utf-8")
    f.write("\n¡Bonjour, tout le monde!") # Converts \n based on platform
    f.close()

Read the text contents of a file to a string::

    f = open("hello.txt", "r", encoding="utf-8")
    b = f.read() # Converts \r\n and \r to \n
    f.close()

Read the text contents of a non-UTF8 file to a string::

    # Western European (Windows)
    f = open("hello.txt", "r", encoding="windows-1252")
    b = f.read()
    f.close()

    # Japanese (Shift-JIS) in Windows variant
    f = open("nihongo.txt", "r", encoding="ms-kanji")
    b = f.read()
    f.close()

Read the contents of a non-text file::

    f = open("hello.db", "rb")
    b = f.read() # Gives a bytes object, does not convert newlines.
    f.close()

Write to a non-text file::

    f = open("hello.db", "wb")
    f.write(b"\x00\x04\n\r\r\n\n\x1AHello\xFAWorld!!!")
    f.close()
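
The examples above close each file by hand.&ensp;A `with` block does the same job and also closes the file if an exception is raised part-way through, so it is usually the safer habit.&ensp;A minimal sketch, assuming a hypothetical file in a temporary folder:

```python
import os
import tempfile

# Hypothetical location, chosen so the sketch does not clobber real files.
path = os.path.join(tempfile.mkdtemp(), "hello.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("¡Hello, world!")

with open(path, "a", encoding="utf-8") as f:
    f.write("\n¡Bonjour, tout le monde!")

with open(path, "r", encoding="utf-8") as f:
    text = f.read()  # newlines are normalised to \n

with open(path, "rb") as f:
    raw = f.read()  # bytes, exactly as stored on disk
```

Once each `with` block ends, `f.closed` is true and no explicit `f.close()` is needed.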

=== Processing file paths ===

Split directory and name::

    # Gives "../flan" for dirn and "boats.txt" for fn.
    # The boats.txt may represent a file or directory, it doesn't care.
    # Using tuple syntax on the left of the assignment unpacks the returned tuple.
    dirn, fn = os.path.split("../flan/boats.txt")
    # If you only need one of them
    dirn = os.path.dirname("../flan/boats.txt")
    fn = os.path.basename("../flan/boats.txt")

Split name and extension::

    # Gives "../flan/boats" for basic and ".txt" for ext.
    basic, ext = os.path.splitext("../flan/boats.txt")
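
These pieces combine naturally with `os.path.join`.&ensp;The two helpers below are hypothetical names invented for this sketch (they are not part of `os.path`); they swap a file's extension and name a sibling in the same folder:

```python
import os.path

def with_extension(path, new_ext):
    """Return path with its extension replaced by new_ext (e.g. ".bak")."""
    stem, _old_ext = os.path.splitext(path)
    return stem + new_ext

def sibling(path, name):
    """Return the path of another entry in the same folder as path."""
    return os.path.join(os.path.dirname(path), name)

backup = with_extension("../flan/boats.txt", ".bak")
neighbour = sibling("../flan/boats.txt", "cars.txt")

# splitext only splits off the last extension, and treats a leading
# dot as part of the name rather than as an extension.
double = os.path.splitext("archive.tar.gz")
hidden = os.path.splitext(".bashrc")
```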

=== Handling folders and probing files ===

Get current (and parent) folder path::

    cwd = os.getcwd()  # Current
    parent = os.path.dirname(os.getcwd())

    # Alternatives using abspath:
    cwd = os.path.abspath(os.curdir)  # or os.path.curdir (usually ".")
    parent = os.path.abspath(os.pardir)  # usually ".."

Change directory::

    os.chdir("bob")

Check if a file or folder exists, and whether something is a file or folder::

    if os.path.exists("bob.txt"):
        pass  # do stuff
    if os.path.isdir("bob.txt"):
        pass  # do stuff (exists and is a folder)
    if os.path.isfile("bob.txt"):
        pass  # do stuff (exists and is a file)

List files and folders within a folder::

    foo = os.listdir("bananas")
    bar = os.listdir("bob")

Step through all files and folders under a directory::

    for root, dirs, files in os.walk("bob"):
        for filename in files:
            filepath = os.path.join(root, filename)
            pass  # ...
        for foldername in dirs:
            dirpath = os.path.join(root, foldername)
            pass  # ...
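
As one possible use of `os.walk`, the sketch below totals the size of everything under a folder.&ensp;The `tree_size` helper and the throwaway tree it is tried on are assumptions for illustration:

```python
import os
import tempfile

def tree_size(top):
    """Sum the sizes in bytes of all non-directory entries under top."""
    total = 0
    for root, dirs, files in os.walk(top):
        for filename in files:
            total += os.stat(os.path.join(root, filename)).st_size
    return total

# Build a small throwaway tree to try it on.
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, "sub"))
for rel, payload in [("a.bin", b"12345"), (os.path.join("sub", "b.bin"), b"123")]:
    with open(os.path.join(top, rel), "wb") as f:
        f.write(payload)

total = tree_size(top)  # 5 bytes + 3 bytes
```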

Step through all files and folders matching a filename wildcard pattern::

    for path in glob.glob("b?bcats/*.txt"):
        pass  # ...

Step through all files and folders matching a recursive filename wildcard pattern (i.e. matching subdirectories recursively)::

    for path in glob.glob("b?bcats/**/*.txt", recursive=True):
        pass  # ...

Read file size in bytes::

    size = os.stat("hello.txt").st_size

Read file timestamps (in UNIX time)::

    atime = os.stat("hello.txt").st_atime  # Access timestamp
    mtime = os.stat("hello.txt").st_mtime  # Modification timestamp
    # Inode change timestamp on Unix, creation timestamp on Windows:
    ctime = os.stat("hello.txt").st_ctime

Write file timestamps (in UNIX time)::

    os.utime("hello.txt", (atime, mtime))
    os.utime("hello.txt")  # "Touches" file (set timestamp to current time).
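
UNIX timestamps are plain numbers of seconds, so they are easy to set and to convert.&ensp;A small sketch, assuming a temporary file and an arbitrarily chosen moment (midnight UTC on 1 January 2000):

```python
import datetime
import os
import tempfile

# Throwaway file (an assumption of this sketch).
fd, path = tempfile.mkstemp()
os.close(fd)

# Set both access and modification time to 2000-01-01 00:00:00 UTC.
stamp = 946684800
os.utime(path, (stamp, stamp))

# Read the modification time back and convert it to an aware datetime.
mtime = os.stat(path).st_mtime
modified = datetime.datetime.fromtimestamp(mtime, datetime.timezone.utc)

os.remove(path)
```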

== Hashing ==

CRC32 and Adler32.&ensp;The `zlib` module is available if Python is built with `zlib` support (unless
you have some stripped-down embedded build, it almost certainly is).&ensp;The `binascii` module
ought to be present in either case::

    # Always unsigned in Python 3; in 2.x, signed/unsigned depends on version/platform:
    crc = zlib.crc32(dat)  # dat should be a bytes object
    crc = binascii.crc32(dat)
    # get same (unsigned) value everywhere:
    crc = zlib.crc32(dat) & 0xFFFFFFFF
    crc = binascii.crc32(dat) & 0xFFFFFFFF
    # Adler32 (only present in zlib):
    crc = zlib.adler32(dat) & 0xFFFFFFFF
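
Both `zlib.crc32` and `zlib.adler32` take an optional second argument, the running checksum so far, which lets large inputs be processed piece by piece.&ensp;A minimal sketch (the `crc32_chunks` helper is a hypothetical name):

```python
import zlib

def crc32_chunks(chunks):
    """CRC32 of the concatenation of chunks, computed incrementally."""
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)  # feed the running value back in
    return crc & 0xFFFFFFFF

whole = zlib.crc32(b"hello world") & 0xFFFFFFFF
pieces = crc32_chunks([b"hello", b" ", b"world"])
```

The two results agree, so the chunk boundaries do not affect the checksum.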

MD5, SHA1 and SHA2 (the `hashlib` module).&ensp;This may link against an SSL/TLS library to provide
broader support, but built-in support for MD5, SHA1, SHA2 should always be present (with the
further addition of SHA3 and BLAKE2 in Python 3.6)::

    # dat should be a bytes object; digest() returns bytes, hexdigest() returns a str
    md5raw = hashlib.md5(dat).digest()  # byte form
    md5 = hashlib.md5(dat).hexdigest()  # hexadecimal form
    sha = hashlib.sha1(dat).hexdigest()
    sha2_256 = hashlib.sha256(dat).hexdigest()
    # Specify any hash supported by the underlying SSL/TLS library:
    sha2_256 = hashlib.new("sha256", dat).hexdigest()
    # Find out which hash formats are available on your system:
    print(hashlib.algorithms_available)
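
Hash objects can also be fed incrementally with `update`, which avoids reading a large file into memory at once.&ensp;A sketch under the assumption of a 64 KiB chunk size; the `file_sha256` helper is a hypothetical name:

```python
import hashlib
import os
import tempfile

def file_sha256(path, chunk_size=65536):
    """Return the hexadecimal SHA-256 of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Try it on a throwaway file larger than one chunk.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"abc" * 100000)
digest = file_sha256(path)
os.remove(path)
```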

== Binary-to-text encodings ==

Base64 (`base64` module)::

    # Don't name the result "base64": that would shadow the module.
    b64 = base64.b64encode(dat)
    dat = base64.b64decode(b64)
    b64_urlsafe = base64.urlsafe_b64encode(dat)
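
Note that the encoders return `bytes`, not `str`; since the output is pure ASCII, converting is straightforward.&ensp;A short round-trip sketch with arbitrary sample data:

```python
import base64

dat = b"\x00\xffhello?>"

encoded = base64.b64encode(dat)           # bytes
encoded_text = encoded.decode("ascii")    # str, e.g. for embedding in text
decoded = base64.b64decode(encoded_text)  # accepts ASCII str or bytes

# The URL-safe variant uses "-" and "_" in place of "+" and "/".
urlsafe = base64.urlsafe_b64encode(dat)
decoded_urlsafe = base64.urlsafe_b64decode(urlsafe)
```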

Hexadecimal (`binascii` module, accessible through `codecs`)::

    # Gives lowercase, accepts either case:
    hexadecimal = codecs.encode(dat, "hex")
    dat = codecs.decode(hexadecimal, "hex")
    # Using binascii directly, same behaviour:
    hexadecimal = binascii.hexlify(dat)
    dat = binascii.unhexlify(hexadecimal)

Hexadecimal can also be handled through the `base64` module,
though I'm not sure why you'd want to do this::

    hexadecimal = base64.b16encode(dat)  # Gives uppercase
    dat = base64.b16decode(hexadecimal)  # Accepts uppercase only
    dat = base64.b16decode(hexadecimal, True)  # Accepts either
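
With 3.5 as the reference version, it may also be worth knowing that `bytes` objects gained a `hex` method in that release, complementing the older `bytes.fromhex`.&ensp;Unlike the functions above, these work with `str` rather than `bytes` on the hexadecimal side:

```python
dat = b"\xde\xad\xbe\xef"

hex_text = dat.hex()            # lowercase str
back = bytes.fromhex(hex_text)

# fromhex accepts either case, and spaces between byte pairs.
back_spaced = bytes.fromhex("DE AD BE EF")
```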

Quoted-Printable (`quopri` module, accessible through the `codecs` module)::

    # Handles bytes objects, names notwithstanding:
    quop = quopri.encodestring(dat)
    dat = quopri.decodestring(quop)
    # Using the codecs module:
    quop = codecs.encode(dat, "quopri")
    dat = codecs.decode(quop, "quopri")

Unix UUencode and the related Classic MacOS HQX (`uu`, `binhex`, `codecs` modules)::

    # Encoding and decoding files:
    uu.encode(infile, uu_outfile)
    uu.decode(uu_infile, outfile)
    binhex.binhex(infilename, hqx_outfile)
    binhex.hexbin(hqx_infile, outfilename)
    # Encoding and decoding strings:
    uudat = codecs.encode(dat, "uu")
    dat = codecs.decode(uudat, "uu")

Other binary-to-text encodings in the `base64` module::

    base32_rfc4648 = base64.b32encode(dat)
    ascii85 = base64.a85encode(dat)
    base85 = base64.b85encode(dat)

== Data serialisation ==

Python string representation (`repr`), the same as is used in Python source (i.e. analogous to Lisp
"printable" objects).&ensp;Types' `__repr__` methods should return either an `eval`-able 
representation, or else something in `<…>` (like the default).&ensp;The `pprint` module exists to
provide pretty-printed `repr` representations.

Obviously, actual `eval` executes arbitrary code, provided it is written as an expression (and an
expression can, if readability doesn't matter, do anything a statement can), so it is not even
theoretically secure.&ensp;The `ast` module provides a safe but limited alternative::

    s = repr(obj)
    s = pprint.pformat(obj)  # Pretty-printed version.
    obj = ast.literal_eval(s)  # Should be secure; only certain built-in types.
    obj = eval(s)  # Not secure, since it executes an arbitrary expression.
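
To illustrate the difference, the sketch below round-trips a structure made of built-in literals and then checks that `ast.literal_eval` refuses an expression containing a function call.&ensp;The sample data is arbitrary:

```python
import ast

original = {"name": "bob", "sizes": (1, 2.5), "tags": [None, True]}
restored = ast.literal_eval(repr(original))

# Anything beyond plain literals is rejected rather than executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True
else:
    rejected = False
```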

The binary representation used in `.pyc` files is not actually supposed to be portable between even
slightly different Python versions, and hence is not really supposed to be used by apps, but its 
API is listed here for completeness.&ensp;It works only for built-in types (though instances of 
custom subclasses of built-in classes are treated as being of the parent type by `dumps`) and uses
the `marshal` module::

    byterepr = marshal.dumps(obj)
    obj = marshal.loads(byterepr)

Pickling with the `pickle` module; this is the main serialisation method for trusted data (note,
*trusted* data) that only needs to be handled by Python.&ensp;There are several versions of the
format: version `0` is the original and ASCII-based, whereas subsequent versions are binary, and
the default has been `3` since Python 3.0 (`4` from Python 3.8 onwards).

Interfaces can be registered for pickling instances of custom classes, but it is important to note 
that this system means `pickle` isn't actually any more secure than `eval`, since a custom class 
can use an arbitrary constructor (including `eval`, `os.system`, etc) with arbitrary arguments::

    byterepr = pickle.dumps(obj)
    # 0 meaning the old (backward compatible, ASCII based) format
    byterepr = pickle.dumps(obj, 0)
    obj = pickle.loads(byterepr)  # Not secure.
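
The security point can be demonstrated harmlessly.&ensp;In this sketch a hypothetical class `Sneaky` uses `__reduce__` to make unpickling call `len`, standing in for something dangerous such as `os.system`; what comes back from `loads` is not a `Sneaky` instance at all, but the return value of that call:

```python
import pickle

class Sneaky:
    # __reduce__ tells pickle how to rebuild the object: "call this
    # callable with these arguments".  Nothing forces the callable to
    # be the class's own constructor.
    def __reduce__(self):
        return (len, ("abc",))  # harmless stand-in for os.system etc.

payload = pickle.dumps(Sneaky())
result = pickle.loads(payload)  # calls len("abc") while loading
```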

JSON with the `json` module.&ensp;This should be secure; it is less powerful than `repr` but
readable from other languages::

    s = json.dumps(obj)
    s = json.dumps(obj, indent=4, sort_keys=True)  # Pretty-printed version.
    obj = json.loads(s)  # Should be secure.
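
"Less powerful" shows up mostly as types that do not survive a round trip.&ensp;A small sketch with arbitrary sample data: tuples come back as lists, and non-string dictionary keys come back as strings:

```python
import json

obj = {1: (2, 3), "flag": True, "nothing": None}

s = json.dumps(obj)
restored = json.loads(s)
```

Here `restored` is no longer equal to `obj`, which is worth bearing in mind when comparing data before and after serialisation.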

It's worth noting that JSON is close to a subset of Python 3.x, the difference you're most likely 
to come across being the use of `true`/`false`/`null` rather than `True`/`False`/`None`.&ensp;This 
can be easily fixed (obviously this is not at all secure, but is listed mainly so it doesn't 
catch you off-guard)::

    true, false, null = True, False, None
    obj = eval(jsondata)  # Not secure; also, seriously, don't.

..