Auto-tools/Projects/PublicES

* Python and JavaScript property access differ enough that naive conversion causes a multitude of bugs. For example, converting JavaScript <tt>if (!a.b){ ... }</tt> to Python <tt>if not a["b"]: ...</tt> can raise <tt>KeyError</tt> when the property is missing, and simply takes the wrong path when dealing with empty collections: an empty array or object is truthy in JavaScript, but an empty list or dict is falsy in Python.
* Python is slow. Python's speed comes from the C libraries it uses; time spent in the Python interpreter is expensive. For example, looping over the characters of every string to check for invalid Unicode turned a slow program into an unusable one. The solution was to find a builtin library that did the work for me (or raised an exception when the check failed). However, this ETL program performs significant data structure transformations that can only be done in Python, so the solution there was to move to the PyPy interpreter.
* Alias analysis is error-prone: users can change their email addresses, and there is no record of those changes. The bug activity table records changes for email addresses that apparently do not exist. They do exist, but under an alias. The old ETL used reviews to do some matching; the new version uses the CC lists, which carry more information. The underlying problem is corruption in the history, possibly caused by direct edits to the database, and this corruption must be mitigated with fuzzy logic.
* It took a while to build up a library of tests that could verify future changes. More tests => more test code => more bugs in test code => more bugs found in production code => more tests. Sometimes it seemed endless.
* PyPy does not work well with C libraries. The C libraries had to be replaced with pure Python equivalents. This was not too hard, except when it came to JSON libraries.
* Multithreading was necessary so we could handle multiple network requests at once while keeping the code easy to read. Python's threading library is still immature: it has no high-level constructs for common use cases in code that raises exceptions.
* Python 2.7 has no exception chaining, so we added it.
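The property-access pitfall described above can be illustrated with two small helpers that mimic JavaScript semantics: a missing property yields <tt>None</tt> (standing in for <tt>undefined</tt>) instead of raising <tt>KeyError</tt>, and falsiness follows JavaScript's rules, where empty arrays and objects are truthy. This is a minimal sketch; the names <tt>js_get</tt> and <tt>js_falsy</tt> are hypothetical and are not the ETL's actual code.

```python
import math
from typing import Any, Mapping


def js_get(obj: Mapping[str, Any], key: str) -> Any:
    """Mimic JavaScript `obj.key`: a missing property is undefined (None here), not an error."""
    return obj.get(key)


def js_falsy(value: Any) -> bool:
    """Mimic JavaScript `!value`.

    JavaScript treats undefined/null, false, 0, NaN and "" as falsy, but an
    empty array or object is truthy, unlike Python's empty list or dict.
    """
    if value is None or value is False:
        return True
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        # 0, -0.0 and NaN are falsy in JavaScript.
        return value == 0 or (isinstance(value, float) and math.isnan(value))
    if isinstance(value, str):
        return value == ""
    # True, lists, dicts, sets and other objects are truthy, even when empty.
    return False


a = {"b": []}
python_path = not a["b"]                # True: Python treats the empty list as falsy
javascript_path = js_falsy(js_get(a, "b"))  # False: JavaScript treats [] as truthy
```

With these helpers, <tt>if (!a.b)</tt> becomes <tt>if js_falsy(js_get(a, "b")):</tt>, which neither raises on a missing key nor flips the branch for an empty collection.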
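The difference between scanning characters in the interpreter and delegating the scan to a builtin can be sketched as follows. The source does not say which check or library was used, so this assumes Python 3 and assumes "invalid Unicode" means unpaired surrogate code points, which the C-implemented UTF-8 codec rejects with <tt>UnicodeEncodeError</tt>. Both function names are hypothetical.

```python
def has_invalid_unicode_slow(s: str) -> bool:
    # Per-character loop: every iteration executes in the Python interpreter.
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


def has_invalid_unicode_fast(s: str) -> bool:
    # Delegate the scan to the UTF-8 codec (implemented in C), which raises
    # UnicodeEncodeError when the string contains surrogate code points.
    try:
        s.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False
```

Both functions give the same answer; the second does its work in C rather than in interpreted bytecode, which is the point of the advice above.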
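Once alias pairs have been inferred (the text above says from the CC lists, with fuzzy logic), they still have to be collapsed so that chains such as A to B to C resolve to a single identity. A union-find structure is one common way to do that. This is an illustrative sketch only; the <tt>AliasResolver</tt> class and the sample addresses are hypothetical and not taken from the ETL.

```python
from typing import Dict


class AliasResolver:
    """Union-find over email addresses: each alias set shares one representative."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def _find(self, email: str) -> str:
        self._parent.setdefault(email, email)
        root = email
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every address on the walk directly at the root.
        while email != root:
            next_email = self._parent[email]
            self._parent[email] = root
            email = next_email
        return root

    def add_alias(self, a: str, b: str) -> None:
        """Record that addresses a and b belong to the same user."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_person(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)


resolver = AliasResolver()
resolver.add_alias("old@example.com", "new@example.com")
resolver.add_alias("new@example.com", "newest@example.com")
```

After these two calls, <tt>old@example.com</tt> and <tt>newest@example.com</tt> resolve to the same representative even though no single observation linked them.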
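Choosing a JSON encoder per interpreter can be sketched with the standard <tt>platform.python_implementation()</tt> function. The sketch assumes that a C-accelerated library is preferred on CPython (<tt>ujson</tt> is used purely as an example) and that the standard <tt>json</tt> module is used under PyPy; the source does not say which JSON libraries the ETL actually used.

```python
import json
import platform
from typing import Any, Callable


def pick_json_encoder() -> Callable[[Any], str]:
    """Return a JSON encoding function suited to the running interpreter."""
    if platform.python_implementation() == "PyPy":
        # Assumption: avoid C extensions under PyPy and rely on the stdlib module.
        return json.dumps
    try:
        import ujson  # optional C extension, only tried on CPython

        return ujson.dumps
    except ImportError:
        return json.dumps


encode = pick_json_encoder()
```

The rest of the program calls <tt>encode(...)</tt> and never needs to know which implementation was picked.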
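A high-level construct of the kind the threading bullet asks for, a worker thread that captures its exception and re-raises it in the caller on <tt>join()</tt>, might look like the following. The <tt>ResultThread</tt> class is a hypothetical illustration written for Python 3. There, <tt>concurrent.futures</tt> already provides similar behavior through <tt>Future.result()</tt>.

```python
import threading
from typing import Any, Callable, Optional


class ResultThread(threading.Thread):
    """Run func(*args) in a thread; join() returns its result or re-raises its exception."""

    def __init__(self, func: Callable[..., Any], *args: Any) -> None:
        super().__init__(daemon=True)
        self._func = func
        self._call_args = args
        self._result: Any = None
        self._error: Optional[BaseException] = None

    def run(self) -> None:
        try:
            self._result = self._func(*self._call_args)
        except BaseException as e:  # capture everything so the caller decides what to do
            self._error = e

    def join(self, timeout: Optional[float] = None) -> Any:
        super().join(timeout)
        if self.is_alive():
            raise TimeoutError("worker thread did not finish in time")
        if self._error is not None:
            raise self._error  # surface the worker's failure in the calling thread
        return self._result
```

For example, <tt>ResultThread(fetch, url)</tt> started for each request lets the main thread collect results with <tt>join()</tt> and see any network error as an ordinary exception (<tt>fetch</tt> and <tt>url</tt> being placeholders).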
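Python 2.7 lacks the <tt>raise ... from ...</tt> syntax and the <tt>__cause__</tt> attribute that Python 3 uses for chaining, so a wrapping exception has to carry the original error itself. The sketch below shows one way to do that; <tt>ChainedError</tt>, its <tt>cause</tt> attribute, <tt>iter_causes</tt> and the example <tt>parse_port</tt> function are hypothetical names, and the code avoids Python-3-only syntax so it also runs under Python 2.7.

```python
class ChainedError(Exception):
    """An exception that remembers the exception that caused it."""

    def __init__(self, message, cause=None):
        super(ChainedError, self).__init__(message)
        self.message = message
        self.cause = cause

    def __str__(self):
        text = self.message
        if self.cause is not None:
            text += "\ncaused by: " + repr(self.cause)
        return text


def iter_causes(err):
    """Walk the chain from the outermost exception to the original one."""
    while err is not None:
        yield err
        err = getattr(err, "cause", None)


def parse_port(text):
    # Hypothetical example: wrap a low-level error with context.
    try:
        return int(text)
    except ValueError as e:
        raise ChainedError("bad port setting " + repr(text), cause=e)
```

Catching the <tt>ChainedError</tt> raised by <tt>parse_port("eighty")</tt> and walking <tt>iter_causes</tt> yields the wrapper first and then the original <tt>ValueError</tt>.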
In the end, we have a high-speed ETL solution that is easy to install and run. Plenty of improvements remain, particularly in using more threads and multiple processes. But those can wait while we