.. -=- Python Gotchas -=NO=-=MENU=- -=-

''Since this is slightly embarrassing, I've removed it from the site menu. It remains accessible by the URL if you happen to have it bookmarked or linked for some reason.''

= Python for those raised on other languages =

The aim of this document is to address the things which most often confuse or frustrate people trying to pick up Python from a background of other programming languages. It will not be helpful to anyone without such a background, and does not itself give enough information to pick up the language.

This by necessity has to go beyond a mere list of features that are or are not present. A lot of the frustrations people express about Python seem to stem from coming from a certain way of thinking or working (such as that of C or Java) and expecting what they have learned or been taught or know to be good practice (or indeed to work) to be applicable in Python. Often, this isn't the case. Python has its own philosophy, its own *feel*, its own way of thinking about its execution environment and its syntax, which I aim to expound, with its consequences, here.

== Object model ==

Each file (technically, even the main script, which becomes a module called `__main__`) represents a module, which is an object, an instance of the type named `module` (but available as `types.ModuleType`). Classes may be defined within the module in order to define new types, which become attributes of the module. The attribute namespace of the module object is the same as the global namespace of code executed within the module.

This "module" concept, possible since Python instances can permit assignment of members not defined on their classes, renders the Java-style "bolt the whole world and his dog onto a class" regimentation unnecessary. If something will never logically be instantiated more than once, there usually isn't any point in creating a class for it.

Everything you interact with is an object.
This includes modules, functions, numbers, strings, lists ''et cetera''. There are no exceptions to this: even primitives such as integers function as immutable objects. This is highly polymorphic: objects (including functions) are not segregated by type in any context, and you can define your own types using class definitions (technically, since Python 2.2).

Since Python 3.0, all classes have been types and, since Python 2.2, types have been classes. Between Python 2.2 and Python 2.7 inclusive, only classes that explicitly inherited from the `object` type were types, for historic reasons. But Python 2.7 reached end of life in 2020 anyway (and anything earlier is long obsolete), so you shouldn't be using it for new stuff.

=== Assignments and references ===

Every name is a reference. This is in contrast to C, where every name is something in either a register or memory. Assignments, therefore, do not copy objects, but add references to them elsewhere. This detail makes no difference for immutables (which can only be changed by being replaced with new objects, affecting only one reference), but means that if you change a mutable somewhere, it affects all references to it. See also the remarks about `+=` below.

Objects generally exist, therefore, independently of their references. However, once an object has a zero reference count, it is deleted. Insular cyclic references are periodically broken by more advanced garbage collector routines.

A name can be assigned to any accessible object, including `None`, or deleted altogether (with the `del` keyword). Nothing about the namespace is set in stone: everything can be changed or removed if so needed (though a few names are read-only). See below for what these namespaces actually are under the surface.

Some types (such as `dict`) implement a `copy` method to create copies. Sequences can be copied using slice (subsequence) syntax without specified limits, i.e. `i[:]` (called an "empty slice", not to be confused with a zero-length slice).
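As a rough illustration of the points above about references and copies, here is a minimal sketch. The variable names are arbitrary and purely for demonstration, and the copies shown are shallow ones.

```python
# Two names bound to the same list: a change made through one
# name is visible through the other, since there is only one object.
a = [1, 2, 3]
b = a
b.append(4)

# An empty slice creates a new (shallow) copy of the list,
# so changing the copy leaves the original alone.
c = a[:]
c.append(5)

# dict.copy() likewise creates a new (shallow) copy.
d = {"x": 1}
e = d.copy()
e["y"] = 2

# Integers are immutable: "changing" m just rebinds the name m
# to a different object, and n is unaffected.
n = 10
m = n
m += 1

print(a, b is a)   # [1, 2, 3, 4] True
print(c, c is a)   # [1, 2, 3, 4, 5] False
print(d, e)        # {'x': 1} {'x': 1, 'y': 2}
print(n, m)        # 10 11
```

The `is` operator compares identity (whether two names refer to the same object), which is usually the quickest way to check whether you have an alias or a copy.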
=== Properties and methods ===

Explicit getters and setters are generally discouraged. In fact, there exists a decorator (`@property`) specifically for wrapping a getter and setter so that they behave like a directly accessed attribute from the outside.

Double underscores before and after a name are used for names treated specially by the interpreter, e.g. for implementing operators. This is so the language developers can freely add more without worrying about breaking existing code, as your own choice of names shouldn't and wouldn't use this scheme. This has nothing to do with private members.

A namespace is simply a dictionary (hashtable) with the names as keys and references to the objects as values. The namespace of an object is itself accessible via attribute syntax as `__dict__`. The namespace of the module you are in can be accessed by calling the `globals()` built-in function. You can now probably guess what the `locals()` function does (although avoid modifying its return value in a context with closures as that's understandably undefined). If called in the module root, it returns the same as `globals()`.

One big gotcha: if you use mutable objects as attributes, create them at instantiation time (in `__init__`) rather than defining them on the class (other than as `None`). Otherwise they get created only once (upon executing the class definition) and are shared as one object across all instances.

Within a class, a name with double underscores only before it (`__hello`) is treated as a shorthand for `_ClassName__hello`, where ClassName is the class name. I don't know why: this is rather inconsistent with the usual philosophy and behaviour of Python so it's a tad odd. Other than that, nothing is private.

Names also don't come out of nowhere.
A name resolves first to the local namespace (factoring in any closures), then the global namespace (the module's attribute namespace), then the builtins namespace (the attribute namespace of the `builtins` module, called `__builtin__` before 3.0). Hence there are no magical `this` or `arguments` names: the objects in question have to be accepted and named in the parameter list (e.g. `(self, *args, **kwargs)` or `(s, *a, **kw)`).

How this works is that functions defined in a class body without another decorator act as instance methods. They are descriptors: whenever one is looked up through an instance, a bound method object is generated from the plain function on the class. These bound methods keep a reference to the instance and will pass it as the first argument to the wrapped function upon being invoked. Using `self` is the expected style but is not enforced by the language; I will sometimes use `s` (`s.` is not much longer than, say, `$`).

=== Indexing ===

Accessing via attribute syntax (`i.hello`) is not the same as item access (`i["hello"]` or `i[s]`), which calls the `__getitem__` method. To access an attribute named by a string, use e.g. `getattr(i, s)`. Lists and tuples, like dictionaries, use item syntax for indexing, and they index from zero.

=== Type enforcement ===

Because typical namespaces are just completely mutable dictionaries under the hood, and the names merely handles to access objects which exist independently of them (until garbage collected), there is no type-checking inherent in assigning a name to an object, and there are no type-segregated namespaces (not even for functions). In general, a type can substitute for any other type, merely by implementing the methods required by the interface, without any further ritual.
This is a high level of polymorphism inherent in the language, without any need for templates, generics or deriving from interfaces (although abstract base classes have more recently crept in). However, type coercion is not / no longer used (the last remnants were removed in Python 3.0 to much rejoicing), so any type errors will trace back fairly easily (we *do not talk* about Unicode in Python 2.7 and earlier, *understand?*). For example, `1 + "1"` is a TypeError.

== Error messages and exceptions ==

You do not have to catch exceptions. Uncaught exceptions will print nice tracebacks, usually helping you identify where you went wrong, and Python will exit with an error condition. If you catch the exception and just print it, you don't get that. The `traceback` module is your friend here. Or, if abruptly exiting with a message upon error is fine (as for many simple command-line scripts), don't catch it, and leave it to Python's default handling.

== Syntax ==

=== Block syntax ===

This is where Python is syntactically least similar to C. If you want C, you know where to find it: Python isn't designed to be resilient to most of your whitespace somehow blowing away in a whirlwind or something.

Semicolons are supported to a limited extent for allowing multiple simple statements on a single line, but they are neither required nor even advocated at the end of the line. Yes, that's right, the statement separator is literally the line break. Sticking a semicolon at the end is valid syntax but decidedly weird, syntactically equivalent to leaving a blank line after every statement. Sticking a colon at the end of every line is syntactically incorrect: final colons are used only to open block statements.

If you want to wrap a single statement across lines, put a backslash in front of the line break. This is not necessary if the break falls partway through a parenthetical (parentheses, brackets or braces).

A block statement begins with an opening line, ending in a colon, followed by a block of increased indentation.
It can be indented by any combination of spaces and tabs as long as it is consistent. Four spaces is the convention, though this is only standardised in terms of [being the PSF's house style](https://www.python.org/dev/peps/pep-0008/). Python previously treated a tab as equivalent to eight spaces; this was stopped in version 3.0 and tabs are no longer interchangeable with any number of spaces.

If only one statement is within the block, it can be put after the colon on the same line as the opener. This is not usually an encouraged style, however.

=== Statements versus expressions ===

All expressions are statements, but only a very limited subset (unusually limited, in fact) of statements are expressions. An *expression* is something which it is valid syntax to put as the condition of a `while` statement, at the right hand side of an assignment (more on the particulars of those later), as the body of a `lambda` expression, or to pass as a string to the `eval` function. The `exec` function (or `exec` statement, in Python 1.x and 2.x: it is a function in both Python 0.9.x and Python 3.x) will execute any statement but does not return a value, unlike `eval`.

Inline operators (`+`, `-` et al) are permitted in expressions, as are function calls and `lambda` expressions, but no assignment statements of any kind, no block statements, and no `import`, `assert`, `del` ''et cetera'' (or, in older versions, `print` or `exec`) statements. This means that you cannot use an assignment statement in the conditional of a `while` loop.

'''Note:''' an assignment operator permitted in expressions (`:=`) was [added](https://www.python.org/dev/peps/pep-0572/) in Python 3.8.

=== Loops ===

There are two loops, `for` and `while`. There is no `do` (there is a `try` ;P). Whereas `while` works more or less how it always does, `for` takes a sequence or iterator, and will iterate over it.
In particular, `for` cannot easily be used to write a forever loop (you'd have to design a forever iterator first, which would be completely silly). The most succinct syntax for a forever loop such as an event loop is `while 1:`.

In a `for` loop's header, to iterate over a range of numbers, use `range`. To iterate over elements of a sequence *and their indices*, use `enumerate`.

== Operators ==

Although the `+` operator is used for both addition and concatenation, ambiguous usage (such as adding a number to a string) is an error condition. The `*` operator is used for both multiplication and repetition. The `%` operator is used for both remainder and old-style string formatting (the new style uses the `.format()` method).

`/` has been float division since Python 3.0. Prior to that, it was usually type-dependent (integer quotient or float division, depending on the operands) like in C, though this could be changed using a command-line switch. The `//` operator is the integer quotient operator for when that's what you want (the `#` is the comment sign).

Whereas `&`, `|` and `!=` have the same meanings as they have in C, the typical C forms of the non-bitwise boolean operations (`&&`, `||` or `!`) are not used. These operators are provided in `<iso646.h>` syntax (which is also native in standard non-Microsoft C++), i.e. `and`, `or` and `not`.

The `^` operator is bitwise xor, as it is in C, and it does NOT raise anything to a power (unlike Maple/Excel/etc). The power operator is `**` (which is *also* supported by Maple).

=== In-place and incrementation operators ===

Firstly, there is no `i++`. You don't need it as often in Python anyway; also, compiler optimisations are irrelevant, and `i+=1` is only one character longer. Any more complex use of either `i++` or `++i` would be against [the codified Python syntactical design philosophy](https://www.python.org/dev/peps/pep-0020/). If you want C (or indeed Java), you know where to find it. Meanwhile, may I introduce you to `range()` and `enumerate()`?

Back in Python 1.5, there was no `+=` either.
There now is, but bear in mind that the following do NOT do the same thing (because lists are mutable and override the `+=` operator):

```
i = []
j = i
i += ["hello"]  # extends the existing list in place; j sees the change
```

```
i = []
j = i
i = i + ["hello"]  # builds a new list and rebinds i; j is unchanged
```

In the first, the object (pointed to by `i` and `j`) is modified; in the second, a new object is created and assigned to `i`, while `j` will still point to the original. If these are done on an immutable object such as an integer, or indeed any object which doesn't define an `__iadd__` method (in which case `+=` falls back to `__add__` followed by reassignment), they will do the same thing though. The same consideration applies to mutable sets and the `|=` operator.
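To round off the last paragraph, here is a small sketch of the same effect with sets and `|=`, alongside an integer and a `frozenset` for contrast. The variable names are arbitrary, and the `hasattr` checks at the end are only there to show which types define the in-place methods.

```python
# Mutable set: |= updates the existing object, so the alias sees it.
s = {1, 2}
t = s
s |= {3}

# Spelled out as a plain assignment, a new set is built instead,
# and the alias keeps pointing at the original.
u = {1, 2}
v = u
u = u | {3}

# frozenset is immutable and defines no __ior__, so |= falls back
# to __or__ and simply rebinds the name to a new object.
f = frozenset({1, 2})
g = f
f |= {3}

# int defines no __iadd__ either, so += rebinds as well.
n = 1
k = n
n += 1

print(t)   # {1, 2, 3}
print(v)   # {1, 2}
print(g == frozenset({1, 2}), f == frozenset({1, 2, 3}))   # True True
print(k, n)   # 1 2
print(hasattr(list, "__iadd__"), hasattr(int, "__iadd__"))      # True False
print(hasattr(set, "__ior__"), hasattr(frozenset, "__ior__"))   # True False
```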