.. -=- Python reference sheet -=-

= Python reference sheet (work in progress) =

Which standard library functions to use for some basic tasks in Python, and how to invoke them.

The reference version for our purposes is Python 3.5; 2.7 differs in certain respects, but if you're still using it then you should be migrating already.

== Filesystem access ==

Main modules: `io`, `os` (including `os.path`), `shutil`, `glob`.

The `io.open` function is available from the builtins namespace as plain `open`. The `os.open` function is low-level and not what you're after (it is equivalent to C `open`, while `io.open` is vaguely equivalent to C `fopen`).

=== New folder, copy, move, rename, delete ===

Create a folder::

    os.mkdir("hello")                    # Must not exist; containing folder must exist.
    os.makedirs("hello")                 # Must not exist; containing folder may or may not exist.
    os.makedirs("hello", exist_ok=True)  # May exist; containing folder may or may not exist.

Copy a file or folder::

    shutil.copy("bob.txt", "cat.txt")  # File
    shutil.copytree("bob", "cat")      # Folder

Move a file or folder, either within or between filesystems::

    shutil.move("hello.txt", os.path.join("cat", "hello"))

Rename a file or folder, or move it within a single filesystem::

    os.rename("bob", "cat")
    os.rename("hello.txt", os.path.join("cat", "hello.txt"))

Delete a file or empty folder::

    os.unlink("hello.txt")  # Or os.remove (does exactly the same thing)
    os.rmdir("bob")         # Must be empty
    shutil.rmtree("bob")    # May or may not be empty

Delete an empty folder along with any otherwise-empty containing folders.
This is less likely to be the one you're after and is listed mostly for completeness::

    os.removedirs("bob")

=== Creating, reading and writing files ===

Create a file containing text (this creates a new, initially empty file to write to, replacing any existing file of that name)::

    with open("hello.txt", "w", encoding="utf-8") as f:
        f.write("¡Hello, world!")

Append text to an existing file (the file is created if it does not exist)::

    with open("hello.txt", "a", encoding="utf-8") as f:
        f.write("\n¡Bonjour, tout le monde!")  # Converts \n based on platform

Read the text contents of a file to a string::

    with open("hello.txt", "r", encoding="utf-8") as f:
        s = f.read()  # Converts \r\n and \r to \n

Read the text contents of a non-UTF-8 file to a string::

    # Western European (Windows)
    with open("hello.txt", "r", encoding="windows-1252") as f:
        s = f.read()

    # Japanese (Shift-JIS), Windows variant
    with open("nihongo.txt", "r", encoding="ms-kanji") as f:
        s = f.read()

Read the contents of a non-text file::

    with open("hello.db", "rb") as f:
        b = f.read()  # Gives a bytes object; does not convert newlines.

Write to a non-text file::

    with open("hello.db", "wb") as f:
        f.write(b"\x00\x04\n\r\r\n\n\x1AHello\xFAWorld!!!")

=== Processing file paths ===

Split directory and name::

    # Gives "../flan" for dirn and "boats.txt" for fn.
    # The boats.txt may represent a file or directory; it doesn't care.
    # Using tuple syntax on the left of the assignment unpacks the returned tuple.
    dirn, fn = os.path.split("../flan/boats.txt")

    # If you only need one of them:
    dirn = os.path.dirname("../flan/boats.txt")
    fn = os.path.basename("../flan/boats.txt")

Split name and extension::

    # Gives "../flan/boats" for basic and ".txt" for ext.
    basic, ext = os.path.splitext("../flan/boats.txt")

=== Handling folders and probing files ===

Get the current (and parent) folder path::

    cwd = os.getcwd()                      # Current
    parent = os.path.dirname(os.getcwd())  # Parent

    # Alternatives using abspath:
    cwd = os.path.abspath(os.curdir)     # or os.path.curdir (usually ".")
    parent = os.path.abspath(os.pardir)  # usually ".."

Change directory::

    os.chdir("bob")

Check whether a file or folder exists, and whether something is a file or a folder::

    if os.path.exists("bob.txt"):
        ...  # Do stuff (exists)
    if os.path.isdir("bob.txt"):
        ...  # Do stuff (exists and is a folder)
    if os.path.isfile("bob.txt"):
        ...  # Do stuff (exists and is a file)

List files and folders within a folder (names only, in arbitrary order)::

    foo = os.listdir("bananas")
    bar = os.listdir("bob")

Step through all files and folders under a directory::

    for root, dirs, files in os.walk("bob"):
        for filename in files:
            filepath = os.path.join(root, filename)
            # ...
        for foldername in dirs:
            dirpath = os.path.join(root, foldername)
            # ...

Step through all files and folders matching a filename wildcard pattern::

    for path in glob.glob("b?bcats/*.txt"):
        ...

Step through all files and folders matching a recursive filename wildcard pattern (i.e. where `**` matches subdirectories recursively; `recursive=True` needs Python 3.5 or later)::

    for path in glob.glob("b?bcats/**/*.txt", recursive=True):
        ...

Read file size in bytes::

    size = os.stat("hello.txt").st_size

Read file timestamps (in UNIX time)::

    atime = os.stat("hello.txt").st_atime  # Access timestamp
    mtime = os.stat("hello.txt").st_mtime  # Modification timestamp
    # Inode change timestamp on Unix, creation timestamp on Windows:
    ctime = os.stat("hello.txt").st_ctime

Write file timestamps (in UNIX time)::

    os.utime("hello.txt", (atime, mtime))
    os.utime("hello.txt")  # "Touches" the file (sets its timestamps to the current time).

== Hashing ==

CRC32 and Adler32. The `zlib` module is available if Python is built with `zlib` support (unless you have some stripped-down embedded build, it almost certainly is).
The `binascii` module ought to be present in either case::

    # dat should be a bytes object. In Python 3 the result is always unsigned;
    # in Python 2 it may be signed, depending on version/platform.
    crc = zlib.crc32(dat)
    crc = binascii.crc32(dat)

    # Get the same (unsigned) value everywhere, including Python 2:
    crc = zlib.crc32(dat) & 0xFFFFFFFF
    crc = binascii.crc32(dat) & 0xFFFFFFFF

    # Adler32 (only present in zlib):
    adler = zlib.adler32(dat) & 0xFFFFFFFF

MD5, SHA1 and SHA2 (the `hashlib` module). This may link against an SSL/TLS library to provide broader support, but built-in support for MD5, SHA1 and SHA2 should always be present (with SHA3 and BLAKE2 added in Python 3.6)::

    # dat should be a bytes object; digest() returns bytes, hexdigest() returns a str.
    md5raw = hashlib.md5(dat).digest()  # Byte form
    md5 = hashlib.md5(dat).hexdigest()  # Hexadecimal form
    sha = hashlib.sha1(dat).hexdigest()
    sha2_256 = hashlib.sha256(dat).hexdigest()

    # Specify any hash supported by the underlying SSL/TLS library:
    sha2_256 = hashlib.new("sha256", dat).hexdigest()

    # Find out which hash algorithms are available on your system:
    print(hashlib.algorithms_available)

== Binary-to-text encodings ==

Base64 (`base64` module)::

    # Don't name the result "base64", or it will shadow the module.
    b64 = base64.b64encode(dat)  # Gives bytes, not str
    dat = base64.b64decode(b64)
    b64_urlsafe = base64.urlsafe_b64encode(dat)

Hexadecimal (`binascii` module, also accessible through `codecs`)::

    # Gives lowercase, accepts either case:
    hexadecimal = codecs.encode(dat, "hex")
    dat = codecs.decode(hexadecimal, "hex")

    # Using binascii directly, same behaviour:
    hexadecimal = binascii.hexlify(dat)
    dat = binascii.unhexlify(hexadecimal)

Hexadecimal can also be handled through the `base64` module, though I'm not sure why you'd want to do this::

    hexadecimal = base64.b16encode(dat)                 # Gives uppercase
    dat = base64.b16decode(hexadecimal)                 # Accepts uppercase only
    dat = base64.b16decode(hexadecimal, casefold=True)  # Accepts either case

Quoted-Printable (`quopri` module, also accessible through the `codecs` module)::

    # Handles bytes objects, names notwithstanding:
    quop = quopri.encodestring(dat)
    dat = quopri.decodestring(quop)

    # Using the codecs module:
    quop = codecs.encode(dat, "quopri")
    dat = codecs.decode(quop, "quopri")

Unix uuencode and the related Classic Mac OS BinHex (HQX) format (`uu`, `binhex`, `codecs` modules; note that `binhex` was removed in Python 3.11 and `uu` in Python 3.13)::

    # Encoding and decoding files:
    uu.encode(infile, uu_outfile)
    uu.decode(uu_infile, outfile)
    binhex.binhex(infilename, hqx_outfile)
    binhex.hexbin(hqx_infile, outfilename)

    # Encoding and decoding bytes objects:
    uudat = codecs.encode(dat, "uu")
    dat = codecs.decode(uudat, "uu")

Other binary-to-text encodings in the `base64` module::

    base32_rfc4648 = base64.b32encode(dat)
    ascii85 = base64.a85encode(dat)
    base85 = base64.b85encode(dat)

== Data serialisation ==

Python string representation (`repr`), the same as is used in Python source (i.e. analogous to Lisp "printable" objects). Types' `__repr__` methods should return either an `eval`-able representation, or else something in `<…>` (like the default). The `pprint` module provides pretty-printed `repr` representations.

Obviously, `eval` executes arbitrary code as long as it's an expression (and an expression can, if readability doesn't matter, do anything a statement can), so this is not even theoretically secure. The `ast` module provides a safe but limited alternative::

    s = repr(obj)
    s = pprint.pformat(obj)    # Pretty-printed version.
    obj = ast.literal_eval(s)  # Should be secure; only certain built-in types.
    obj = eval(s)              # Not secure, since it executes an arbitrary expression.

The binary representation used in `.pyc` files is not supposed to be portable between even slightly different Python versions, and hence is not really meant to be used by apps, but its API is listed here for completeness.
It works only for built-in types (though instances of custom subclasses of built-in classes are treated as being of the parent type by `dumps`) and uses the `marshal` module::

    byterepr = marshal.dumps(obj)
    obj = marshal.loads(byterepr)

Pickling with the `pickle` module is the main serialisation method for trusted data (note, *trusted* data) that only needs handling in Python. There are a few versions of the format: version `0` is the original and ASCII-based, whereas subsequent versions are binary. The default has been `3` since Python 3.0 (it changed to `4` in Python 3.8).

Interfaces can be registered for pickling instances of custom classes, but it is important to note that this system means `pickle` isn't actually any more secure than `eval`, since a custom class can name an arbitrary callable as its constructor (including `eval`, `os.system`, etc.) with arbitrary arguments::

    byterepr = pickle.dumps(obj)
    # Protocol 0 is the old (backward-compatible, ASCII-based) format:
    byterepr = pickle.dumps(obj, 0)
    obj = pickle.loads(byterepr)  # Not secure.

JSON with the `json` module; it should be secure, and is less powerful than `repr` but readable from other languages::

    s = json.dumps(obj)
    s = json.dumps(obj, indent=4, sort_keys=True)  # Pretty-printed version.
    obj = json.loads(s)  # Should be secure.

It's worth noting that JSON is close to a subset of Python 3.x syntax, the difference you're most likely to come across being the use of `true`/`false`/`null` rather than `True`/`False`/`None`. This can be easily worked around (obviously this is not at all secure, but it is listed mainly so it doesn't catch you off guard)::

    true, false, null = True, False, None
    obj = eval(jsondata)  # Not secure; also, seriously, don't.

..
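To make "less powerful than `repr`" a bit more concrete, here is a small sketch that pushes the same object through `repr`/`ast.literal_eval` and through `json`. The sample dictionary is made up purely for illustration. The tuple comes back from JSON as a list and the integer key comes back as a string, while `ast.literal_eval` refuses the JSON text outright because `null` and `true` aren't Python literals:

```python
import ast
import json

# Hypothetical sample data: a tuple, a non-string dict key, None and True.
obj = {"point": (1, 2), 3: None, "ok": True}

# repr() plus ast.literal_eval() round-trips these built-in literal types exactly.
as_repr = repr(obj)
from_repr = ast.literal_eval(as_repr)
print("repr round trip equal:", from_repr == obj)

# JSON drops Python-specific detail: the tuple becomes an array (a list when
# loaded back), the int key becomes a string, and None/True become null/true.
as_json = json.dumps(obj)
from_json = json.loads(as_json)
print("JSON text:", as_json)
print("JSON round trip equal:", from_json == obj)

# ast.literal_eval() rejects the JSON text, since null/true are not Python literals.
try:
    ast.literal_eval(as_json)
    literal_eval_accepts_json = True
except ValueError:
    literal_eval_accepts_json = False
print("literal_eval accepts JSON text:", literal_eval_accepts_json)
```

Note that `sort_keys=True` is deliberately left out here: with a mix of `str` and `int` keys, Python 3 cannot order them and `json.dumps` raises `TypeError`.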