Confirmed users
513
edits
Klahnakoski (talk | contribs) |
Klahnakoski (talk | contribs) |
||
Line 47: | Line 47: | ||
=== Python Issues === | === Python Issues === | ||
* Python is slow. Python speed comes from the C libraries it uses, spending time in the Python interpreter is a bad idea. For example, going through the characters in all strings to check for invalid Unicode turned a slow program into an unusable one. The solution was to find a builtin library that did the work for me (or would raise an exception if the conditions were false). This ETL program has significant data structure transformations that can only be done in Python. The solution was | * Python is slow. Python speed comes from the C libraries it uses, spending time in the Python interpreter is a bad idea. For example, going through the characters in all strings to check for invalid Unicode turned a slow program into an unusable one. The solution was to find a builtin library that did the work for me (or would raise an exception if the conditions were false). This ETL program has significant data structure transformations that can only be done in Python. The solution was to use the PyPy interpreter. | ||
* PyPy does not work well with C libraries. The C libraries had to be removed in favor of pure Python versions of the same. This was not too hard, except when it came to JSON libraries | * PyPy does not work well with C libraries. The C libraries had to be removed in favor of pure Python versions of the same. This was not too hard, except when it came to JSON libraries | ||
* JSON generation is slow: The built-in JSON emitter used generators to convert data structures to a JSON string, but the PyPy optimizer is terrible at analyzing generator code. Furthermore, the JSON libraries available to CPython are incredibly fast (Ujson is by almost 2 orders of magnitude faster!) This made the PyPy version appear inferior despite the speed up in the ETL portion of the code. Part of the solution was to use PyPy's own JSON emitter, but also realize PyPy's default JSON emitter (no pretty printing, no sub-classing, etc) has Ujson speeds. The fastest solution I found so far, is to copy the data structure (with sets, Decimal, and other special types) to one with simple dicts, lists and floats and pass it to the default PyPy JSON emitter[https://github.com/klahnakoski/pyLibrary/blob/61928e3c9b01b823d666bafcc68b90ab2e4199e3/tests/util/test_json_speed.py]. | * JSON generation is slow: The built-in JSON emitter used generators to convert data structures to a JSON string, but the PyPy optimizer is terrible at analyzing generator code. Furthermore, the JSON libraries available to CPython are incredibly fast (Ujson is by almost 2 orders of magnitude faster!) This made the PyPy version appear inferior despite the speed up in the ETL portion of the code. Part of the solution was to use PyPy's own JSON emitter, but also realize PyPy's default JSON emitter (no pretty printing, no sub-classing, etc) has Ujson speeds. The fastest solution I found so far, is to copy the data structure (with sets, Decimal, and other special types) to one with simple dicts, lists and floats and pass it to the default PyPy JSON emitter[https://github.com/klahnakoski/pyLibrary/blob/61928e3c9b01b823d666bafcc68b90ab2e4199e3/tests/util/test_json_speed.py]. |