<p>Alex Tomkins - Django (https://www.alextomkins.com/)</p>
<h1>Avoiding cached datetimes with Django querysets</h1>
<p>Alex Tomkins, 2018-07-21</p>
<p>In Django we often need to filter out objects from a queryset which shouldn't be visible to public
users. A typical example is a news post in a blog: a staff user could edit a news
post to have a publish date in the future, allowing it to be published automatically by the site
without anyone having to log back in and publish it.</p>
<p>A simple model for such a news post could look like:</p>
<pre class="code python literal-block">
<span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>


<span class="k">class</span> <span class="nc">Post</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">title</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">100</span><span class="p">)</span>
    <span class="n">content</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">TextField</span><span class="p">()</span>
    <span class="n">published_at</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">(</span><span class="n">db_index</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>

    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">ordering</span> <span class="o">=</span> <span class="p">(</span><span class="s1">'-published_at'</span><span class="p">,)</span>

    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">title</span>
</pre>
<p>In this example, we're using a typical <tt class="docutils literal">ListView</tt>, filtering out any posts which haven't yet been
published:</p>
<pre class="code python literal-block">
<span class="kn">from</span> <span class="nn">django.utils</span> <span class="kn">import</span> <span class="n">timezone</span>
<span class="kn">from</span> <span class="nn">django.views.generic</span> <span class="kn">import</span> <span class="n">ListView</span>

<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Post</span>


<span class="k">class</span> <span class="nc">PostListView</span><span class="p">(</span><span class="n">ListView</span><span class="p">):</span>
    <span class="n">queryset</span> <span class="o">=</span> <span class="n">Post</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">published_at__lte</span><span class="o">=</span><span class="n">timezone</span><span class="o">.</span><span class="n">now</span><span class="p">())</span>
</pre>
<p>Note - we could use an <tt class="docutils literal">ArchiveIndexView</tt> instead, which by default excludes objects from the
future. However for this example, we're sticking with <tt class="docutils literal">ListView</tt> to show a simplified version of
the problem for other use cases.</p>
<p>When we first load the page, looking through the SQL queries generated for the request, we can see
the posts being filtered by their publish date:</p>
<pre class="code sql literal-block">
<span class="k">SELECT</span> <span class="ss">"news_post"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span>
<span class="ss">"news_post"</span><span class="p">.</span><span class="ss">"title"</span><span class="p">,</span>
<span class="ss">"news_post"</span><span class="p">.</span><span class="ss">"content"</span><span class="p">,</span>
<span class="ss">"news_post"</span><span class="p">.</span><span class="ss">"published_at"</span>
<span class="k">FROM</span> <span class="ss">"news_post"</span>
<span class="k">WHERE</span> <span class="ss">"news_post"</span><span class="p">.</span><span class="ss">"published_at"</span> <span class="o"><=</span> <span class="s1">'2018-07-21T11:02:12.998079+00:00'</span><span class="p">::</span><span class="n">timestamptz</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="ss">"news_post"</span><span class="p">.</span><span class="ss">"published_at"</span> <span class="k">DESC</span>
</pre>
<p>At first this all seems okay, however at some point later on you'll realise that new posts aren't
being shown. Looking at the SQL queries generated for the following requests, we can see that the
timestamp doesn't change between requests:</p>
<pre class="code sql literal-block">
<span class="k">SELECT</span> <span class="ss">"news_post"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span>
<span class="ss">"news_post"</span><span class="p">.</span><span class="ss">"title"</span><span class="p">,</span>
<span class="ss">"news_post"</span><span class="p">.</span><span class="ss">"content"</span><span class="p">,</span>
<span class="ss">"news_post"</span><span class="p">.</span><span class="ss">"published_at"</span>
<span class="k">FROM</span> <span class="ss">"news_post"</span>
<span class="k">WHERE</span> <span class="ss">"news_post"</span><span class="p">.</span><span class="ss">"published_at"</span> <span class="o"><=</span> <span class="s1">'2018-07-21T11:02:12.998079+00:00'</span><span class="p">::</span><span class="n">timestamptz</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="ss">"news_post"</span><span class="p">.</span><span class="ss">"published_at"</span> <span class="k">DESC</span>
</pre>
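<p>Why? The queryset is a class attribute of the generic view, so <tt class="docutils literal">timezone.now()</tt> is called only once, when the
module is imported as the Django server starts. Every request re-runs the query, but with that same
timestamp baked in.</p>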
<p>One solution for this is to move the queryset into the <tt class="docutils literal">get_queryset</tt> method for the generic
view:</p>
<pre class="code python literal-block">
<span class="k">class</span> <span class="nc">PostListView</span><span class="p">(</span><span class="n">ListView</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">get_queryset</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="c1"># Called on every request, so timezone.now() is always current</span>
        <span class="k">return</span> <span class="n">Post</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">published_at__lte</span><span class="o">=</span><span class="n">timezone</span><span class="o">.</span><span class="n">now</span><span class="p">())</span>
</pre>
<p>By using a method we're creating a new queryset for every request - using the current timestamp
when the request is generated. Problem solved!</p>
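<p>The difference between a class attribute and a method can be sketched in plain Python, without
Django. In this minimal illustration a counter stands in for the clock, and the names <tt class="docutils literal">now</tt>,
<tt class="docutils literal">AttributeView</tt> and <tt class="docutils literal">MethodView</tt> are made up for the example:</p>

```python
import itertools

# A counter stands in for the clock so the behaviour is easy to see;
# each call to now() returns the next "timestamp".
_ticks = itertools.count(1)


def now():
    return next(_ticks)


class AttributeView:
    # Evaluated once, when the class body runs (i.e. at import time).
    cutoff = now()


class MethodView:
    def get_cutoff(self):
        # Evaluated on every call (i.e. on every request).
        return now()


print(AttributeView.cutoff, AttributeView.cutoff)            # 1 1
print(MethodView().get_cutoff(), MethodView().get_cutoff())  # 2 3
```

<p>The attribute keeps the value it was given when the class was defined, while the method produces a
fresh value each time it is called.</p>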
<p>However, since Django 1.9 there's a better way - let the database figure out the current
timestamp.</p>
<p>Instead of using <tt class="docutils literal">timezone.now()</tt>, we can switch the view code to the <tt class="docutils literal">Now()</tt> database
function:</p>
<pre class="code python literal-block">
<span class="kn">from</span> <span class="nn">django.db.models.functions</span> <span class="kn">import</span> <span class="n">Now</span>
<span class="kn">from</span> <span class="nn">django.views.generic</span> <span class="kn">import</span> <span class="n">ListView</span>

<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Post</span>


<span class="k">class</span> <span class="nc">PostListView</span><span class="p">(</span><span class="n">ListView</span><span class="p">):</span>
    <span class="n">queryset</span> <span class="o">=</span> <span class="n">Post</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">published_at__lte</span><span class="o">=</span><span class="n">Now</span><span class="p">())</span>
</pre>
<p>Looking at the SQL queries generated for the request, we can see that on Postgres <tt class="docutils literal">STATEMENT_TIMESTAMP()</tt> is
being used to filter out any news posts which haven't been published yet:</p>
<pre class="code sql literal-block">
<span class="k">SELECT</span> <span class="ss">"news_post"</span><span class="p">.</span><span class="ss">"id"</span><span class="p">,</span>
<span class="ss">"news_post"</span><span class="p">.</span><span class="ss">"title"</span><span class="p">,</span>
<span class="ss">"news_post"</span><span class="p">.</span><span class="ss">"content"</span><span class="p">,</span>
<span class="ss">"news_post"</span><span class="p">.</span><span class="ss">"published_at"</span>
<span class="k">FROM</span> <span class="ss">"news_post"</span>
<span class="k">WHERE</span> <span class="ss">"news_post"</span><span class="p">.</span><span class="ss">"published_at"</span> <span class="o"><=</span> <span class="p">(</span><span class="n">STATEMENT_TIMESTAMP</span><span class="p">())</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="ss">"news_post"</span><span class="p">.</span><span class="ss">"published_at"</span> <span class="k">DESC</span>
</pre>
<p>The same SQL query will be used for every request, which will now work as the current timestamp
gets evaluated by the database - and we don't need to create a <tt class="docutils literal">get_queryset</tt> method for every
generic view!</p>
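<p>The same idea can be tried without Django or Postgres. The sketch below uses Python's built-in
<tt class="docutils literal">sqlite3</tt> module, with SQLite's <tt class="docutils literal">CURRENT_TIMESTAMP</tt> as a stand-in for <tt class="docutils literal">STATEMENT_TIMESTAMP()</tt>;
the table layout and the two sample rows are assumptions made for the example:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news_post (title TEXT, published_at TEXT)")
conn.executemany(
    "INSERT INTO news_post VALUES (?, ?)",
    [
        ("already published", "2000-01-01 00:00:00"),
        ("scheduled", "2999-01-01 00:00:00"),
    ],
)

# The SQL text never changes; the database evaluates CURRENT_TIMESTAMP
# each time the statement runs.
QUERY = (
    "SELECT title FROM news_post "
    "WHERE published_at <= CURRENT_TIMESTAMP "
    "ORDER BY published_at DESC"
)


def visible_titles():
    return [row[0] for row in conn.execute(QUERY)]


print(visible_titles())  # ['already published']
```

<p>Because the comparison value comes from the database at execution time, the query string can be
built once and reused safely.</p>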
<h1>Fixing GDAL and GEOS for Django on macOS</h1>
<p>Alex Tomkins, 2017-08-06</p>
<p>I use <a class="reference external" href="https://www.macports.org/">MacPorts</a> for all the additional packages needed when working with Django on macOS, and a
recent upgrade to <a class="reference external" href="https://trac.osgeo.org/geos/">GEOS</a> managed to break all my projects which used GeoDjango:</p>
<div class="highlight"><pre><span></span><span class="gp">$</span> ./manage.py <span class="nb">help</span>
<span class="go">Traceback (most recent call last):</span>
<span class="go"> File "./manage.py", line 10, in <module></span>
<span class="go"> execute_from_command_line(sys.argv)</span>
<span class="go"> ...</span>
<span class="go"> File "/Users/tomkins/.virtualenvs/greendale/lib/python3.5/site-packages/django/contrib/gis/geos/libgeos.py", line 147, in geos_version_info</span>
<span class="go"> raise GEOSException('Could not parse version info string "%s"' % ver)</span>
<span class="go">django.contrib.gis.geos.error.GEOSException: Could not parse version info string "3.6.2-CAPI-1.10.2 4d2925d6"</span>
<span class="gp">$</span> port installed geos
<span class="go">The following ports are currently installed:</span>
<span class="go"> geos @3.6.2_0 (active)</span>
</pre></div>
<p>Highly annoying!</p>
<p>This will be fixed in <a class="reference external" href="https://github.com/django/django/pull/8817">PR #8817</a> for Django master, which will be released in Django 2.0 later
this year, and <a class="reference external" href="https://github.com/django/django/pull/8841">PR #8841</a> for the upcoming Django 1.11.5 release. However the fix won't be
backported to older versions of Django, such as the Django 1.8 LTS branch. So to continue using
GeoDjango on macOS, we need to use an older working version of GEOS.</p>
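<p>One plausible reading of the error is that the trailing commit hash in the new version string is
something the parser didn't expect. The sketch below illustrates that idea with two simplified
patterns written for this example; they are not copied from Django's source, and the older-format
string is an assumed example:</p>

```python
import re

# Illustrative patterns only, NOT copied from Django's source.
# STRICT allows nothing after the C API version except an optional " r<digits>".
STRICT = re.compile(
    r"^(?P<version>\d+\.\d+\.\d+)-CAPI-(?P<capi>\d+\.\d+\.\d+)( r\d+)?$"
)
# RELAXED additionally tolerates one trailing word, such as a commit hash.
RELAXED = re.compile(
    r"^(?P<version>\d+\.\d+\.\d+)-CAPI-(?P<capi>\d+\.\d+\.\d+)( r\d+)?( \w+)?$"
)

NEW_STYLE = "3.6.2-CAPI-1.10.2 4d2925d6"  # string from the traceback
OLD_STYLE = "3.6.1-CAPI-1.10.1 r0"        # assumed example of the older format


def parse(pattern, text):
    """Return (version, capi) or None when the string does not match."""
    match = pattern.match(text)
    return (match.group("version"), match.group("capi")) if match else None


print(parse(STRICT, OLD_STYLE))   # ('3.6.1', '1.10.1')
print(parse(STRICT, NEW_STYLE))   # None
print(parse(RELAXED, NEW_STYLE))  # ('3.6.2', '1.10.2')
```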
<div class="section" id="kyngchaos-packages">
<h2>KyngChaos packages</h2>
<p>Fortunately <a class="reference external" href="http://www.kyngchaos.com/">KyngChaos</a> has a variety of <a class="reference external" href="http://www.kyngchaos.com/software/frameworks">Unix Compatibility Frameworks</a> available for download,
including GDAL and GEOS. The bundled GEOS is an older version (3.6.1) which still works
with older versions of Django. The older GDAL (1.11) is the one to use with Django 1.8, as
newer versions of GDAL cause problems with Django 1.8 too.</p>
<p>Download and install:</p>
<ul class="simple">
<li>GDAL 1.11 Complete</li>
<li>GDAL 2.1 Complete</li>
</ul>
<p>Although we won't be using GDAL 2.1 in this example, you can easily switch to it if you're only
running Django 1.11.</p>
<p>Add the following to your <tt class="docutils literal">.bash_profile</tt>:</p>
<div class="highlight"><pre><span></span><span class="nb">export</span> <span class="nv">GDAL_LIBRARY_PATH</span><span class="o">=</span><span class="s2">"/Library/Frameworks/GDAL.framework/Versions/1.11/GDAL"</span>
<span class="nb">export</span> <span class="nv">GEOS_LIBRARY_PATH</span><span class="o">=</span><span class="s2">"/Library/Frameworks/GEOS.framework/Versions/3/GEOS"</span>
</pre></div>
<p>Then add the following to your Django settings file:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">os</span>

<span class="c1"># GeoDjango fixes</span>
<span class="n">GDAL_LIBRARY_PATH</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'GDAL_LIBRARY_PATH'</span><span class="p">)</span>
<span class="n">GEOS_LIBRARY_PATH</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'GEOS_LIBRARY_PATH'</span><span class="p">)</span>
</pre></div>
<p>On the server you're deploying to, these environment variables won't be set, so each setting will
default to <tt class="docutils literal">None</tt> - in which case Django will automatically find the installed versions of GDAL
and GEOS.</p>
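<p>The fallback relies on <tt class="docutils literal">os.environ.get</tt> returning <tt class="docutils literal">None</tt> for a missing variable. A minimal sketch of
that behaviour, using a hypothetical helper called <tt class="docutils literal">library_path</tt> and plain dictionaries in place of
the real environment:</p>

```python
import os


def library_path(name, environ=os.environ):
    """Return the configured path, or None when the variable is not set."""
    return environ.get(name)


# On the development Mac the variable is exported in .bash_profile.
mac_env = {"GDAL_LIBRARY_PATH": "/Library/Frameworks/GDAL.framework/Versions/1.11/GDAL"}
# On the server nothing is exported.
server_env = {}

print(library_path("GDAL_LIBRARY_PATH", mac_env))
print(library_path("GDAL_LIBRARY_PATH", server_env))  # None
```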
<p>Now you should have a fully functioning GeoDjango project on macOS!</p>
</div>
<h1>The cost of Dirty Fields</h1>
<p>Alex Tomkins, 2016-12-04</p>
<p>After installing <a class="reference external" href="https://github.com/romgar/django-dirtyfields">Django Dirty Fields</a> on projects a few
months ago and seeing a dramatic reduction in the number of writes to our main
Postgres database, everything seemed fine. However on a brand new project,
something wasn't quite right performance-wise:</p>
<pre class="code console literal-block">
<span class="gp">$</span> siege --concurrent<span class="o">=</span><span class="m">1</span> --reps<span class="o">=</span><span class="m">10</span> <span class="s2">"http://127.0.0.1:8000/map/?lat=51.4995&lng=0.1248"</span>
<span class="go">...
Transactions: 10 hits
Availability: 100.00 %
Elapsed time: 11.85 secs
Data transferred: 2.27 MB
Response time: 0.88 secs
Transaction rate: 0.84 trans/sec
Throughput: 0.19 MB/sec
Concurrency: 0.75
Successful transactions: 10
Failed transactions: 0
Longest transaction: 0.96
Shortest transaction: 0.83</span>
</pre>
<p>Painfully slow! Although it wasn't the most optimised code possible, an average
of 880ms per request wasn't acceptable.</p>
<div class="section" id="investigating-the-cause">
<h2>Investigating the cause</h2>
<p>As the view for this request generates a JSON response, using
<a class="reference external" href="https://github.com/jazzband/django-debug-toolbar">Django Debug Toolbar</a> wasn't a viable option - as all the debugging output
gets attached to HTML responses only. So instead I decided to run through the
code with <tt class="docutils literal">shell_plus</tt> from <a class="reference external" href="https://github.com/django-extensions/django-extensions">Django Extensions</a> and <a class="reference external" href="https://ipython.org/">IPython</a>.</p>
<p>After going through parts of the code, one of the querysets seemed slower than
expected:</p>
<pre class="code pycon literal-block">
<span class="o"></span><span class="gp">>>> </span><span class="o">%</span><span class="n">timeit</span> <span class="n">venue_list</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">Venue</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()[:</span><span class="mi">100</span><span class="p">])</span>
<span class="go">10 loops, best of 3: 101 ms per loop</span>
</pre>
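<p>Over 100ms to go through a fairly small queryset? This is far too slow. Just to
see if it's a database problem, we'll change it to use <tt class="docutils literal">values_list</tt> instead,
which yields plain tuples rather than model instances. (Strictly, a queryset is lazy, so without
a <tt class="docutils literal">list()</tt> call the timing below only covers building the queryset, not running the query.)</p>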
<pre class="code pycon literal-block">
<span class="o"></span><span class="gp">>>> </span><span class="o">%</span><span class="n">timeit</span> <span class="n">venue_list</span> <span class="o">=</span> <span class="n">Venue</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">values_list</span><span class="p">(</span><span class="s1">'id'</span><span class="p">,</span> <span class="s1">'name'</span><span class="p">,</span> <span class="s1">'location'</span><span class="p">)</span>
<span class="go">10000 loops, best of 3: 80.3 µs per loop</span>
</pre>
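<p>Because querysets are lazy, a timing only reflects the database work if something forces
evaluation. The sketch below shows the effect with a hypothetical <tt class="docutils literal">LazyRows</tt> class standing in for
a queryset, timed with the standard library's <tt class="docutils literal">timeit</tt> module rather than IPython's <tt class="docutils literal">%timeit</tt>:</p>

```python
import timeit


class LazyRows:
    """Hypothetical stand-in for a lazy queryset: work happens only on iteration."""

    def __init__(self, count):
        self.count = count
        self.fetches = 0  # how many times the "query" actually ran

    def __iter__(self):
        self.fetches += 1
        return iter([(i, "venue %d" % i) for i in range(self.count)])


rows = LazyRows(100)

# Building the lazy object does no fetching at all.
build_time = timeit.timeit(lambda: LazyRows(100), number=1000)
# Wrapping it in list() forces the fetch, which is the cost worth measuring.
fetch_time = timeit.timeit(lambda: list(rows), number=1000)

print("build: %.6fs, fetch: %.6fs" % (build_time, fetch_time))
```

<p>For a like-for-like comparison with the first measurement, the <tt class="docutils literal">values_list</tt> call would need to
be wrapped in <tt class="docutils literal">list()</tt> and sliced in the same way.</p>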
<p>And testing another model from another app as a quick sanity check to ensure
there's no problems with other models:</p>
<pre class="code pycon literal-block">
<span class="o"></span><span class="gp">>>> </span><span class="o">%</span><span class="n">timeit</span> <span class="n">permission_list</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">Permission</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">())</span>
<span class="go">100 loops, best of 3: 2.29 ms per loop</span>
</pre>
<p>So something is obviously wrong with the queryset/model.</p>
<p>After seeing that this model had <tt class="docutils literal">DirtyFieldsMixin</tt>, which was one obvious
difference between this model and all the others, the next test was to remove it
and see if that made any difference:</p>
<pre class="code pycon literal-block">
<span class="o"></span><span class="gp">>>> </span><span class="o">%</span><span class="n">timeit</span> <span class="n">venue_list</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">Venue</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()[:</span><span class="mi">100</span><span class="p">])</span>
<span class="go">100 loops, best of 3: 8.04 ms per loop</span>
</pre>
<p>From 101ms to 8ms.</p>
<p>After removing all instances of <tt class="docutils literal">DirtyFieldsMixin</tt>, another performance test
showed an improvement:</p>
<pre class="code pycon literal-block">
<span class="go">$ siege --concurrent=1 --reps=10 "http://127.0.0.1:8000/map/?lat=51.4995&lng=0.1248"
</span><span class="gp">...</span>
<span class="go">
Transactions: 10 hits
Availability: 100.00 %
Elapsed time: 7.98 secs
Data transferred: 2.27 MB
Response time: 0.40 secs
Transaction rate: 1.25 trans/sec
Throughput: 0.28 MB/sec
Concurrency: 0.50
Successful transactions: 10
Failed transactions: 0
Longest transaction: 0.44
Shortest transaction: 0.36</span>
</pre>
<p>From 880ms to 400ms - a big difference.</p>
</div>
<div class="section" id="testing-django-model-utils">
<h2>Testing django-model-utils</h2>
<p>As Django Dirty Fields wasn't great for performance, an alternative which seems
to offer similar functionality is the tracker field from <a class="reference external" href="https://github.com/carljm/django-model-utils">django-model-utils</a>.</p>
<p>Let's test by adding a <tt class="docutils literal">FieldTracker</tt> field to a model:</p>
<pre class="code pycon literal-block">
<span class="o"></span><span class="gp">>>> </span><span class="o">%</span><span class="n">timeit</span> <span class="n">venue_list</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">Venue</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()[:</span><span class="mi">100</span><span class="p">])</span>
<span class="go">10 loops, best of 3: 21.5 ms per loop</span>
</pre>
<p>Much faster than Django Dirty Fields!</p>
<p>Some of the code when saving/updating objects needs updating for
django-model-utils, as it doesn't have the same convenience methods:</p>
<pre class="code python literal-block">
<span class="k">if</span> <span class="n">venue</span><span class="o">.</span><span class="n">id</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
    <span class="c1"># New object: nothing to compare against, so save every field</span>
    <span class="n">venue</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
<span class="k">else</span><span class="p">:</span>
    <span class="c1"># Existing object: only write the fields the tracker reports as changed</span>
    <span class="n">changed_fields</span> <span class="o">=</span> <span class="n">venue</span><span class="o">.</span><span class="n">tracker</span><span class="o">.</span><span class="n">changed</span><span class="p">()</span><span class="o">.</span><span class="n">keys</span><span class="p">()</span>
    <span class="k">if</span> <span class="n">changed_fields</span><span class="p">:</span>
        <span class="n">venue</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">update_fields</span><span class="o">=</span><span class="n">changed_fields</span><span class="p">)</span>
</pre>
<p>One subtle difference between the two packages is that with django-model-utils you'll need to
ensure the value you assign has the same type as the field. It's possible to give an <tt class="docutils literal">IntegerField</tt>
a string value, and Django will still save it.</p>
<p>With Django Dirty Fields:</p>
<pre class="code pycon literal-block">
<span class="n"></span><span class="gp">>>> </span><span class="n">venue</span> <span class="o">=</span> <span class="n">Venue</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="mi">14132</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">venue</span><span class="o">.</span><span class="n">grid_ref_x</span>
<span class="go">442478
</span><span class="n"></span><span class="gp">>>> </span><span class="n">venue</span><span class="o">.</span><span class="n">grid_ref_x</span> <span class="o">=</span> <span class="s1">'442478'</span>
<span class="gp">>>> </span><span class="n">venue</span><span class="o">.</span><span class="n">is_dirty</span><span class="p">()</span>
<span class="go">False
</span><span class="n"></span><span class="gp">>>> </span><span class="n">venue</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">venue</span><span class="o">.</span><span class="n">grid_ref_x</span>
<span class="go">'442478'
</span><span class="n"></span><span class="gp">>>> </span><span class="n">venue</span><span class="o">.</span><span class="n">grid_ref_x</span> <span class="o">=</span> <span class="s1">'123'</span>
<span class="gp">>>> </span><span class="n">venue</span><span class="o">.</span><span class="n">is_dirty</span><span class="p">()</span>
<span class="go">True
</span><span class="n"></span><span class="gp">>>> </span><span class="n">venue</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">venue</span><span class="o">.</span><span class="n">grid_ref_x</span>
<span class="go">'123'
</span><span class="n"></span><span class="gp">>>> </span><span class="n">venue</span><span class="o">.</span><span class="n">refresh_from_db</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">venue</span><span class="o">.</span><span class="n">grid_ref_x</span>
<span class="go">123</span>
</pre>
<p>With django-model-utils:</p>
<pre class="code python literal-block">
<span class="o">>>></span> <span class="n">venue</span> <span class="o">=</span> <span class="n">Venue</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="mi">14132</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">venue</span><span class="o">.</span><span class="n">grid_ref_x</span>
<span class="mi">442478</span>
<span class="o">>>></span> <span class="n">venue</span><span class="o">.</span><span class="n">grid_ref_x</span> <span class="o">=</span> <span class="s1">'442478'</span>
<span class="o">>>></span> <span class="n">venue</span><span class="o">.</span><span class="n">tracker</span><span class="o">.</span><span class="n">changed</span><span class="p">()</span>
<span class="p">{</span><span class="s1">'grid_ref_x'</span><span class="p">:</span> <span class="mi">442478</span><span class="p">}</span>
</pre>
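<p>The tracker reports a change here because the string <tt class="docutils literal">'442478'</tt> isn't equal to the integer
<tt class="docutils literal">442478</tt>. This would result in a lot of additional saves, even though the saved
data will end up being the same.</p>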
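<p>One way to avoid the spurious changes is to cast incoming values to the field's Python type before
comparing or assigning them. The sketch below shows the idea with plain dictionaries; it is not how
either package is implemented, and the helper names and the <tt class="docutils literal">name</tt> value are made up for the example:</p>

```python
def changed(saved, current):
    """Naive comparison: report the saved value of every field that differs."""
    return {name: saved[name] for name in saved if current[name] != saved[name]}


def changed_normalised(saved, current, casts):
    """Cast each current value to the field's Python type before comparing."""
    normalised = {
        name: casts.get(name, lambda value: value)(value)
        for name, value in current.items()
    }
    return changed(saved, normalised)


saved = {"grid_ref_x": 442478, "name": "Town Hall"}
# A string has been assigned to what is an integer field.
current = {"grid_ref_x": "442478", "name": "Town Hall"}

print(changed(saved, current))                                  # {'grid_ref_x': 442478}
print(changed_normalised(saved, current, {"grid_ref_x": int}))  # {}
```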
</div>
<div class="section" id="using-proxy-models">
<h2>Using proxy models</h2>
<p>Given that none of the code in the Django views used the dirty fields methods,
and won't use the tracker field either - this seems like an ideal case for
proxy models in Django. Instead of adding the tracker to the <tt class="docutils literal">Venue</tt> model,
we'll create a proxy model instead:</p>
<pre class="code python literal-block">
<span class="kn">from</span> <span class="nn">model_utils</span> <span class="kn">import</span> <span class="n">FieldTracker</span>


<span class="k">class</span> <span class="nc">VenueTracker</span><span class="p">(</span><span class="n">Venue</span><span class="p">):</span>
    <span class="n">tracker</span> <span class="o">=</span> <span class="n">FieldTracker</span><span class="p">()</span>

    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">proxy</span> <span class="o">=</span> <span class="bp">True</span>
</pre>
<p>Now we've got two versions of the same model:</p>
<pre class="code pycon literal-block">
<span class="o"></span><span class="gp">>>> </span><span class="o">%</span><span class="n">timeit</span> <span class="n">venue_list</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">Venue</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()[:</span><span class="mi">100</span><span class="p">])</span>
<span class="go">100 loops, best of 3: 8.85 ms per loop
</span><span class="o"></span><span class="gp">>>> </span><span class="o">%</span><span class="n">timeit</span> <span class="n">venue_list</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">VenueTracker</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()[:</span><span class="mi">100</span><span class="p">])</span>
<span class="go">10 loops, best of 3: 21.2 ms per loop</span>
</pre>
<p>So by default we'll use the standard <tt class="docutils literal">Venue</tt> model in views for speed, however
if we need a version with tracking for our update scripts, we simply change the
imports to point to the model with a tracker:</p>
<pre class="code python literal-block">
<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">VenueTracker</span> <span class="k">as</span> <span class="n">Venue</span>
</pre>
<p>Now we can have fast views as normal, but with the option to switch to the
tracked version if needed.</p>
</div>
<h1>Modern Django with Ubuntu Trusty</h1>
<p>Alex Tomkins, 2016-10-02</p>
<p>At this point Ubuntu 14.04 Trusty Tahr is nearly 2.5 years old, with another 2.5
years of support left before it reaches end of life. However for those
of us still working with Trusty, it's often desirable to get a few
backported modern packages to make development and hosting of Django apps a bit
easier.</p>
<p>These days you could use containers to deploy bleeding edge applications with
all the new stuff bundled inside, maybe with either <a class="reference external" href="https://www.docker.com/">Docker</a> or <a class="reference external" href="https://coreos.com/rkt/">rkt</a>. However in
this post we'll be going through a couple of available APT repositories to help
modernise a slightly older distribution.</p>
<div class="section" id="python">
<h2>Python</h2>
<p>As of October 2016, Python 3.5 is the latest stable version of Python. It's
supported in Django 1.8 to 1.10, and for anyone wanting to develop an app for
the long term, going for Python 2.7 at this point in time is a dead end. Ubuntu
Trusty only has Python 2.7 and 3.4. Although 3.4 is well supported, we want to
go for the latest possible Python version.</p>
<div class="section" id="deadsnakes">
<h3>Deadsnakes</h3>
<p>The <a class="reference external" href="https://launchpad.net/~fkrull/+archive/ubuntu/deadsnakes">Old and New Python Versions</a> repository (or deadsnakes)
provides multiple Python versions which aren't included in a particular version
of Ubuntu. Consider the support of this repository carefully, as it isn't an
official repo (stick with Python 3.4 if this bothers you).</p>
<p>Installation is quick and easy:</p>
<div class="highlight"><pre><span></span><span class="gp">$</span> sudo add-apt-repository ppa:fkrull/deadsnakes
<span class="gp">$</span> sudo apt-get update
<span class="gp">$</span> sudo apt-get install python3.5
</pre></div>
<p>As deadsnakes provides even more versions - we could also install Python 3.3 as
well, which could help in testing code under multiple Python versions with <a class="reference external" href="http://tox.testrun.org/">tox</a>.</p>
</div>
</div>
<div class="section" id="postgresql">
<h2>PostgreSQL</h2>
<p>Ubuntu Trusty comes with PostgreSQL 9.3, which is missing JSONB support (added
in 9.4). So if we want the JSONField which was added in Django 1.9 - we'll need
a more modern version of Postgres. As of October 2016 the latest release of
Postgres is 9.6 - which is what we want to aim for.</p>
<div class="section" id="postgresql-apt-repository">
<h3>PostgreSQL Apt Repository</h3>
<p>Fortunately postgresql.org provides official packages of all the supported
Postgres versions for quite a few distributions. As these are provided
by the official Postgres site, packages are updated with every new release - so
support is good.</p>
<p>To install:</p>
<div class="highlight"><pre><span></span><span class="gp">$</span> sudo add-apt-repository <span class="s2">"deb http://apt.postgresql.org/pub/repos/apt/ trusty-pgdg main"</span>
<span class="gp">$</span> curl -sL https://www.postgresql.org/media/keys/ACCC4CF8.asc <span class="p">|</span> sudo apt-key add -
<span class="gp">$</span> sudo apt-get update
<span class="gp">$</span> sudo apt-get install postgresql-9.6
</pre></div>
<p>Now we can use JSONField thanks to the updated Postgres.</p>
</div>
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>Just because you're stuck on an older release of Ubuntu doesn't mean you're
stuck with all the old tools - go upgrade!</p>
</div>
Speed up Django static files2016-08-28T21:16:00+01:002016-08-28T21:16:00+01:00Alex Tomkinstag:www.alextomkins.com,2016-08-28:/2016/08/speed-up-django-static-files/<p>For a fairly easy performance win, and for something which makes dealing with
stale cached CSS a thing of the past - enable
<a class="reference external" href="https://docs.djangoproject.com/en/1.8/ref/contrib/staticfiles/#manifeststaticfilesstorage">ManifestStaticFilesStorage</a>.</p>
<p>First of all you'll need to edit <em>settings.py</em>:</p>
<div class="highlight"><pre><span></span><span class="c1"># Use manifest static storage</span>
<span class="n">STATICFILES_STORAGE</span> <span class="o">=</span> <span class="s1">'django.contrib.staticfiles.storage.ManifestStaticFilesStorage'</span>
</pre></div>
<p>Then you'll need to edit any templates which aren't using the static template
tag. Instead of using:</p>
<div class="highlight"><pre><span></span><span class="x"><img src="</span><span class="cp">{{</span> <span class="nv">STATIC_URL</span> <span class="cp">}}</span><span class="x">images/hello.jpg" alt="Hello"></span>
</pre></div>
<p>You'll need to use:</p>
<div class="highlight"><pre><span></span><span class="cp">{%</span> <span class="k">load</span> <span class="nv">static</span> <span class="nv">from</span> <span class="nv">staticfiles</span> <span class="cp">%}</span><span class="x"></span>
<span class="x"><img src="</span><span class="cp">{%</span> <span class="k">static</span> <span class="s1">'images/hello.jpg'</span> <span class="cp">%}</span><span class="x">" alt="Hello"></span>
</pre></div>
<p>Now when you run <tt class="docutils literal"><span class="pre">django-admin</span> collectstatic</tt>, Django will include an MD5
hash of the file's content as part of the file name. When you use the <tt class="docutils literal">{% static %}</tt>
tag you'll see the file name with the hash.</p>
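<p>As a rough illustration of the naming scheme, the sketch below derives a hashed file name in plain Python. It assumes the hash is the first 12 hex characters of the MD5 digest of the file's content, placed just before the extension. <tt class="docutils literal">hashed_name</tt> is a hypothetical helper for this post, not Django's own function.</p>

```python
import hashlib
import posixpath


def hashed_name(name, content):
    """Return `name` with a short MD5 hash of `content` before the extension."""
    # Assumption: a 12 character prefix of the hex digest is used.
    file_hash = hashlib.md5(content).hexdigest()[:12]
    root, ext = posixpath.splitext(name)
    return "{}.{}{}".format(root, file_hash, ext)


print(hashed_name("images/hello.jpg", b"hello"))
# images/hello.5d41402abc4b.jpg
```

<p>Because the name changes whenever the content changes, a browser can safely cache each version of the file for as long as it likes.</p>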
<p>If you're running nginx, update your site configuration so that browsers will
cache static files for as long as possible:</p>
<div class="highlight"><pre><span></span><span class="k">location</span> <span class="s">/static/</span> <span class="p">{</span>
    <span class="kn">expires</span> <span class="s">max</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>Then reload nginx.</p>
<p>Now you should be up and running with a good performance boost, and you won't
have to ask users to refresh a page to get updated static files.</p>
Be careful with Django's .update_or_create()!2016-08-21T18:44:00+01:002016-08-21T18:44:00+01:00Alex Tomkinstag:www.alextomkins.com,2016-08-21:/2016/08/be-careful-with-djangos-create-or-update/<p>Although using <tt class="docutils literal">Model.objects.update_or_create()</tt> with a Django model is
extremely convenient, it's worth using it carefully
with certain usage patterns.</p>
<div class="section" id="excessive-postgres-wal-files">
<h2>Excessive Postgres WAL files</h2>
<p>To allow point-in-time recovery (PITR), I usually set up <a class="reference external" href="http://www.pgbarman.org/">Barman</a>. Set it up on a
remote server, get your Postgres instance to rsync files over to the Barman
server - and you've got a set of backups which should allow you to recover your
database from an earlier point in time.</p>
<p>For every UPDATE/INSERT, Postgres will write data to the WAL (Write-Ahead Log).
This isn't a problem if you're not doing anything else with the finished WAL
files, although you'll have increased disk I/O - it's probably just a minor
increase you won't notice too much.</p>
<p>However if you're keeping the WAL and archiving it for backups, every update to
a table row will be backed up:</p>
<div class="highlight"><pre><span></span><span class="gp">$</span> barman list-backup all
<span class="go">golestandt 20160613T063007 - Mon Jun 13 06:36:06 2016 - Size: 4.5 GiB - WAL Size: 12.7 GiB</span>
</pre></div>
<p>Ouch.</p>
</div>
<div class="section" id="finding-the-problem-databases">
<h2>Finding the problem databases</h2>
<p>Assuming you've got statistics enabled, use <tt class="docutils literal">psql</tt> to show the statistics for
each database:</p>
<div class="highlight"><pre><span></span><span class="gp">postgres=#</span> <span class="k">SELECT</span> <span class="n">datname</span><span class="p">,</span><span class="n">tup_updated</span><span class="p">,</span><span class="n">tup_inserted</span><span class="p">,</span><span class="n">tup_deleted</span> <span class="k">FROM</span> <span class="n">pg_stat_database</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="n">tup_updated</span> <span class="k">DESC</span><span class="p">;</span>
<span class="go"> datname | tup_updated | tup_inserted | tup_deleted</span>
<span class="go">---------------------------------------+-------------+--------------+-------------</span>
<span class="go"> site1 | 1138475 | 191605 | 136569</span>
<span class="go"> site2 | 153224 | 46650 | 12385</span>
</pre></div>
<p>Now we've got a list of databases which are excessively updating and inserting
new rows.</p>
</div>
<div class="section" id="how-did-we-get-here">
<h2>How did we get here?</h2>
<p>By being too lazy with <tt class="docutils literal">.update_or_create()</tt> in applications doing regular
syncs.</p>
<p>If you've got Django code syncing with a third party service, it's easy to write
code similar to:</p>
<div class="highlight"><pre><span></span><span class="k">for</span> <span class="n">tweet</span> <span class="ow">in</span> <span class="n">tweet_list</span><span class="p">:</span>
    <span class="n">Tweet</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">update_or_create</span><span class="p">(</span><span class="n">tweet_id</span><span class="o">=</span><span class="n">tweet</span><span class="p">[</span><span class="s1">'id'</span><span class="p">],</span> <span class="n">defaults</span><span class="o">=</span><span class="p">{</span>
        <span class="s1">'user'</span><span class="p">:</span> <span class="n">tweet</span><span class="p">[</span><span class="s1">'user'</span><span class="p">][</span><span class="s1">'screen_name'</span><span class="p">],</span>
        <span class="s1">'text'</span><span class="p">:</span> <span class="n">tweet</span><span class="p">[</span><span class="s1">'text'</span><span class="p">],</span>
        <span class="s1">'retweet_count'</span><span class="p">:</span> <span class="n">tweet</span><span class="p">[</span><span class="s1">'retweet_count'</span><span class="p">],</span>
    <span class="p">})</span>
</pre></div>
<p>If you're only updating a few objects occasionally - this is fine and does the
job. However if you're updating hundreds of objects every hour - you could end
up with hundreds of thousands of rows being updated on a weekly basis, even
though most of the data is likely to stay the same.</p>
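<p>To get a feel for the scale, here's a quick back-of-the-envelope calculation. The numbers are made up for illustration, assuming a sync which runs every hour and touches 600 objects each time.</p>

```python
# Hypothetical numbers, purely for illustration.
objects_per_sync = 600  # objects touched by each sync
syncs_per_day = 24      # one sync every hour
days_per_week = 7

updates_per_week = objects_per_sync * syncs_per_day * days_per_week
print(updates_per_week)  # 100800
```

<p>That's over a hundred thousand row updates a week from a single sync job, regardless of whether anything actually changed.</p>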
<p>Every field for the object gets updated every time the object gets saved. If
you've got a model with more fields, this will add up very quickly to additional
WAL data.</p>
</div>
<div class="section" id="avoiding-updates">
<h2>Avoiding updates</h2>
<p>If only some of the synced data ever changes - just update the
fields which have actually changed:</p>
<div class="highlight"><pre><span></span><span class="k">for</span> <span class="n">tweet</span> <span class="ow">in</span> <span class="n">tweet_list</span><span class="p">:</span>
    <span class="n">obj</span><span class="p">,</span> <span class="n">created</span> <span class="o">=</span> <span class="n">Tweet</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get_or_create</span><span class="p">(</span><span class="n">tweet_id</span><span class="o">=</span><span class="n">tweet</span><span class="p">[</span><span class="s1">'id'</span><span class="p">],</span> <span class="n">defaults</span><span class="o">=</span><span class="p">{</span>
        <span class="s1">'user'</span><span class="p">:</span> <span class="n">tweet</span><span class="p">[</span><span class="s1">'user'</span><span class="p">][</span><span class="s1">'screen_name'</span><span class="p">],</span>
        <span class="s1">'text'</span><span class="p">:</span> <span class="n">tweet</span><span class="p">[</span><span class="s1">'text'</span><span class="p">],</span>
        <span class="s1">'retweet_count'</span><span class="p">:</span> <span class="n">tweet</span><span class="p">[</span><span class="s1">'retweet_count'</span><span class="p">],</span>
    <span class="p">})</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">created</span><span class="p">:</span>
        <span class="c1"># Update counts, but try to avoid excessive updates</span>
        <span class="n">update_fields</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="k">if</span> <span class="n">obj</span><span class="o">.</span><span class="n">retweet_count</span> <span class="o">!=</span> <span class="n">tweet</span><span class="p">[</span><span class="s1">'retweet_count'</span><span class="p">]:</span>
            <span class="n">obj</span><span class="o">.</span><span class="n">retweet_count</span> <span class="o">=</span> <span class="n">tweet</span><span class="p">[</span><span class="s1">'retweet_count'</span><span class="p">]</span>
            <span class="n">update_fields</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s1">'retweet_count'</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">update_fields</span><span class="p">:</span>
            <span class="n">obj</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">update_fields</span><span class="o">=</span><span class="n">update_fields</span><span class="p">)</span>
</pre></div>
<p>In this example we're assuming tweet IDs, text and usernames won't ever change,
so only the retweet count is checked. This is probably a safe assumption for
IDs, and slightly less so for usernames.</p>
<p>By switching to <tt class="docutils literal">.get_or_create()</tt>, and using <tt class="docutils literal">.save(update_fields=list)</tt> -
we've significantly reduced the number of updates. If the object already exists
and none of the fields have been changed - <tt class="docutils literal">update_fields</tt> will be an empty
list and we don't even bother with <tt class="docutils literal">.save()</tt>.</p>
<p>The only downside of this method is that it can be tedious having to check
every single field for updates. We could probably improve this with a more
generic solution - but at this point it's easier to use a third party package
which solves the problem.</p>
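<p>For reference, a more generic version might look roughly like the sketch below. <tt class="docutils literal">apply_changes</tt> and <tt class="docutils literal">FakeTweet</tt> are hypothetical names made up for this example, and a plain Python object stands in for the model instance so the snippet runs without Django.</p>

```python
def apply_changes(obj, new_values):
    """Set changed attributes on `obj` and return the names which changed."""
    changed = []
    for name, value in new_values.items():
        if getattr(obj, name) != value:
            setattr(obj, name, value)
            changed.append(name)
    return changed


class FakeTweet:
    """Stand-in for a model instance (assumption: not a real Django model)."""

    def __init__(self, user, text, retweet_count):
        self.user = user
        self.text = text
        self.retweet_count = retweet_count


tweet = FakeTweet("alex", "Hello", 3)
changed = apply_changes(tweet, {
    "user": "alex",
    "text": "Hello",
    "retweet_count": 5,
})
print(changed)  # ['retweet_count']
```

<p>With a real model instance, the returned list could be passed to <tt class="docutils literal">.save(update_fields=changed)</tt>, skipping the save entirely when the list is empty.</p>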
</div>
<div class="section" id="django-dirty-fields">
<h2>Django Dirty Fields</h2>
<p><a class="reference external" href="https://github.com/romgar/django-dirtyfields">Django Dirty Fields</a> is an easy alternative to writing
your own code to check all of the fields of an object. Just add a
<tt class="docutils literal">DirtyFieldsMixin</tt> to your model and you can simplify your code:</p>
<div class="highlight"><pre><span></span><span class="k">for</span> <span class="n">tweet</span> <span class="ow">in</span> <span class="n">tweet_list</span><span class="p">:</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">obj</span> <span class="o">=</span> <span class="n">Tweet</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">tweet_id</span><span class="o">=</span><span class="n">tweet</span><span class="p">[</span><span class="s1">'id'</span><span class="p">])</span>
    <span class="k">except</span> <span class="n">Tweet</span><span class="o">.</span><span class="n">DoesNotExist</span><span class="p">:</span>
        <span class="n">obj</span> <span class="o">=</span> <span class="n">Tweet</span><span class="p">(</span><span class="n">tweet_id</span><span class="o">=</span><span class="n">tweet</span><span class="p">[</span><span class="s1">'id'</span><span class="p">])</span>
    <span class="n">obj</span><span class="o">.</span><span class="n">user</span> <span class="o">=</span> <span class="n">tweet</span><span class="p">[</span><span class="s1">'user'</span><span class="p">][</span><span class="s1">'screen_name'</span><span class="p">]</span>
    <span class="n">obj</span><span class="o">.</span><span class="n">text</span> <span class="o">=</span> <span class="n">tweet</span><span class="p">[</span><span class="s1">'text'</span><span class="p">]</span>
    <span class="n">obj</span><span class="o">.</span><span class="n">retweet_count</span> <span class="o">=</span> <span class="n">tweet</span><span class="p">[</span><span class="s1">'retweet_count'</span><span class="p">]</span>
    <span class="c1"># Only save if needed</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">obj</span><span class="o">.</span><span class="n">id</span><span class="p">:</span>
        <span class="n">obj</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
    <span class="k">elif</span> <span class="n">obj</span><span class="o">.</span><span class="n">is_dirty</span><span class="p">():</span>
        <span class="n">obj</span><span class="o">.</span><span class="n">save_dirty_fields</span><span class="p">()</span>
</pre></div>
<p>Considerably easier to deal with!</p>
<p>The one huge advantage of this approach is that you can add as many fields as
you want, and Django Dirty Fields will do the job of figuring out which fields
to update for you. As before if none of the fields have been updated - the
object won't be saved.</p>
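<p>Conceptually, dirty tracking boils down to remembering the original values and comparing against them later. The sketch below shows that idea in plain Python. It is not the package's actual implementation, and <tt class="docutils literal">TrackedTweet</tt> is a hypothetical class made up for illustration.</p>

```python
class TrackedTweet:
    """Remembers its initial field values so changes can be detected."""

    tracked_fields = ("user", "text", "retweet_count")

    def __init__(self, **values):
        for name in self.tracked_fields:
            setattr(self, name, values.get(name))
        # Snapshot of the values as they were when the object was loaded.
        self._original = {
            name: getattr(self, name) for name in self.tracked_fields
        }

    def get_dirty_fields(self):
        """Return {field name: original value} for every changed field."""
        return {
            name: old
            for name, old in self._original.items()
            if getattr(self, name) != old
        }

    def is_dirty(self):
        return bool(self.get_dirty_fields())


obj = TrackedTweet(user="alex", text="Hello", retweet_count=3)
obj.text = "Hello"         # same value, so not dirty
obj.retweet_count = 5      # changed value
print(obj.is_dirty())          # True
print(obj.get_dirty_fields())  # {'retweet_count': 3}
```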
</div>
<div class="section" id="the-results">
<h2>The Results</h2>
<p>After trying to fix several projects:</p>
<div class="highlight"><pre><span></span><span class="gp">$</span> barman list-backup all
<span class="go">golestandt 20160704T063009 - Mon Jul 4 06:33:29 2016 - Size: 4.6 GiB - WAL Size: 822.0 MiB</span>
</pre></div>
<p>There are still a few old projects which haven't been optimised - however we've
now managed to reduce the size of the weekly WAL to be smaller than the weekly
snapshot.</p>
</div>