<p><em><strong>True Concurrency with <code>asyncio</code></strong>, by Lynn Root (<a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-1/">roguelynn.com</a>), published 2018-07-26.</em></p>
<p><em>Foreword: This is <strong>part 1</strong> of a 5-part series titled</em> “<code>asyncio</code>: We Did It Wrong.” <em>Once done, follow along with <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-2/">Part 2: Graceful Shutdowns</a>, <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-3/">Part 3: Exception Handling</a>, <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-4/">Part 4: Working with Synchronous & Threaded Code</a>, and Part 5: Testing <code>asyncio</code> Code (coming soon!).</em></p>
<p><em>Example code can be found on <a href="https://github.com/econchick/mayhem">GitHub</a>. All code on this post is licensed under <a href="https://github.com/econchick/mayhem/blob/master/LICENSE">MIT</a>.</em></p>
<hr>
<h3 id="goal-mayhem-mandrill">Goal: Mayhem Mandrill</h3>
<p>To recap from <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong/">the intro</a>, we are building a mock chaos monkey-like service called “Mayhem Mandrill”. This is an event-driven service that consumes from a pub/sub, and initiates a mock restart of a host. We could get thousands of messages in seconds, so as we get a message, we shouldn’t block the handling of the next message we receive.</p>
<h3 id="initial-setup">Initial Setup</h3>
<p>There are a lot of choices for pub/sub-like technologies out there; I’m most familiar with Google Cloud Pub/Sub. But for our purposes, we’ll simulate a pub/sub with <code>asyncio</code>, inspired by <a href="http://asyncio.readthedocs.io/en/latest/producer_consumer.html">this official-looking tutorial</a> using <code>asyncio.Queue</code>s.</p>
<p><em>Side note:</em> Using f-strings like I do within log messages may not be ideal: no matter what the log level is set at, f-strings will always be evaluated eagerly; whereas the old printf-style form (<code>logging.info('foo %s', 'bar')</code>) is lazily evaluated, interpolated only if the record will actually be emitted. But I just love f-strings.</p>
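<p>For reference, here’s a minimal sketch of that difference (not from the Mayhem code). With the lazy form, the arguments are only formatted into the message if the record is actually emitted:</p>
<pre><code data-lang="py3">import logging

calls = []

class Probe:
    """Records whenever it's actually formatted into a string."""
    def __str__(self):
        calls.append(1)
        return 'bar'

logging.basicConfig(level=logging.INFO)

# lazy %-style: DEBUG is below the INFO threshold, so Probe() is never formatted
logging.debug('foo %s', Probe())
assert calls == []

# emitted at INFO, so the argument is formatted exactly once
logging.info('foo %s', Probe())
assert calls == [1]

# an f-string argument would have been evaluated eagerly in both cases
</code></pre>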
<pre><code data-lang="py3">#!/usr/bin/env python3.7
"""
Notice! This requires:
  - attrs==18.1.0
"""
import asyncio
import logging
import random
import string

import attr

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s,%(msecs)d %(levelname)s: %(message)s',
    datefmt='%H:%M:%S',
)


@attr.s
class PubSubMessage:
    instance_name = attr.ib()
    message_id = attr.ib(repr=False)
    hostname = attr.ib(repr=False, init=False)

    def __attrs_post_init__(self):
        self.hostname = f'{self.instance_name}.example.net'


# simulating an external publisher of events
async def publish(queue, n):
    choices = string.ascii_lowercase + string.digits
    for x in range(1, n + 1):
        host_id = ''.join(random.choices(choices, k=4))
        instance_name = f'cattle-{host_id}'
        msg = PubSubMessage(message_id=x, instance_name=instance_name)
        await queue.put(msg)
        logging.info(f'Published {x} of {n} messages')

    await queue.put(None)  # publisher is done


async def consume(queue):
    while True:
        # wait for an item from the publisher
        msg = await queue.get()
        if msg is None:  # publisher is done
            break

        # process the msg
        logging.info(f'Consumed {msg}')
        # unhelpful simulation of i/o work
        await asyncio.sleep(random.random())


if __name__ == '__main__':
    queue = asyncio.Queue()
    publisher_coro = publish(queue, 5)
    consumer_coro = consume(queue)
    asyncio.run(publisher_coro)
    asyncio.run(consumer_coro)
</code></pre>
<p>When we run this, we see:</p>
<pre><code data-lang="console">$ python mandrill/mayhem_1.py
18:38:02,124 INFO: Published 1 of 5 messages
18:38:02,124 INFO: Published 2 of 5 messages
18:38:02,124 INFO: Published 3 of 5 messages
18:38:02,124 INFO: Published 4 of 5 messages
18:38:02,124 INFO: Published 5 of 5 messages
18:38:02,124 INFO: Consumed PubSubMessage(instance_name='cattle-pdcg')
18:38:02,188 INFO: Consumed PubSubMessage(instance_name='cattle-nbs9')
18:38:02,952 INFO: Consumed PubSubMessage(instance_name='cattle-hw4f')
18:38:03,075 INFO: Consumed PubSubMessage(instance_name='cattle-bza9')
18:38:03,522 INFO: Consumed PubSubMessage(instance_name='cattle-gjl4')
</code></pre>
<p>We’ll use this as the starting point for a pub/sub simulator. </p>
<h3 id="running-an-asyncio-based-service">Running an <code>asyncio</code>-based Service</h3>
<p>So far, we don’t have a running service; it’s merely a pipeline or batch job right now. To run continuously, we need to use <code>loop.run_forever</code>. For this, we have to create and schedule <a href="https://docs.python.org/3/library/asyncio-task.html#task">tasks</a> out of the <a href="https://docs.python.org/3/library/asyncio-task.html#coroutines">coroutines</a>, then start the loop:</p>
<pre><code data-lang="py3">if __name__ == '__main__':
    queue = asyncio.Queue()
    loop = asyncio.get_event_loop()

    loop.create_task(publish(queue, 5))
    loop.create_task(consume(queue))
    loop.run_forever()

    logging.info('Cleaning up')
    loop.close()
</code></pre>
<p>When running with this updated code, we see that all messages are published and then consumed. Then we hang because there is no more work to be done; we only published 5 messages, after all. To stop the “hanging” process, we must interrupt it (via <code>^C</code> or by sending a signal like <code>kill -15 &lt;pid&gt;</code>):</p>
<pre><code data-lang="console">$ python mandrill/mayhem_3.py
19:45:17,540 INFO: Published 1 of 5 messages
19:45:17,540 INFO: Published 2 of 5 messages
19:45:17,541 INFO: Published 3 of 5 messages
19:45:17,541 INFO: Published 4 of 5 messages
19:45:17,541 INFO: Published 5 of 5 messages
19:45:17,541 INFO: Consumed PubSubMessage(instance_name='cattle-ms1t')
19:45:17,749 INFO: Consumed PubSubMessage(instance_name='cattle-p6l9')
^CTraceback (most recent call last):
File "mandrill/mayhem_3.py", line 68, in <module>
loop.run_forever()
File "/Users/lynn/.pyenv/versions/3.7.0/lib/python3.7/asyncio/base_events.py", line 523, in run_forever
self._run_once()
File "/Users/lynn/.pyenv/versions/3.7.0/lib/python3.7/asyncio/base_events.py", line 1722, in _run_once
event_list = self._selector.select(timeout)
File "/Users/lynn/.pyenv/versions/3.7.0/lib/python3.7/selectors.py", line 558, in select
kev_list = self._selector.control(None, max_ev, timeout)
KeyboardInterrupt
</code></pre>
<p>That’s nice and …ugly. You may notice that we never get to the <code>'Cleaning up'</code> log line, nor do we close the loop. We’re also not handling any exceptions that may be raised from awaiting <code>publish</code> and <code>consume</code>. Let’s fix that a bit.</p>
<h3 id="running-the-event-loop-defensively">Running the event loop defensively</h3>
<p>We’ll first address the catching of exceptions that arise from coroutines. To illustrate how we’re not handling exceptions, I’ll fake an error in the <code>consume</code> coroutine:</p>
<pre><code data-lang="py3">async def consume(queue):
    while True:
        msg = await queue.get()
        if msg is None:  # publisher is done
            break

        # super-realistic simulation of an exception
        if msg.message_id == 4:
            raise Exception('an exception happened!')

        # process the msg
        logging.info(f'Consumed {msg}')
        # simulate i/o operation using sleep
        await asyncio.sleep(random.random())
</code></pre>
<p>If we run it as is:</p>
<pre><code data-lang="console">$ python mandrill/mayhem_3.py
17:39:52,933 INFO: Published 1 of 5 messages
17:39:52,933 INFO: Published 2 of 5 messages
17:39:52,933 INFO: Published 3 of 5 messages
17:39:52,933 INFO: Published 4 of 5 messages
17:39:52,933 INFO: Published 5 of 5 messages
17:39:52,933 INFO: Consumed PubSubMessage(instance_name='cattle-cu7f')
17:39:53,876 INFO: Consumed PubSubMessage(instance_name='cattle-xihm')
17:39:54,599 INFO: Consumed PubSubMessage(instance_name='cattle-clnn')
17:39:55,51 ERROR: Task exception was never retrieved
future: <Task finished coro=<consume() done, defined at mandrill/mayhem_3.py:45> exception=Exception('an exception happened!')>
Traceback (most recent call last):
File "mandrill/mayhem_3.py", line 52, in consume
raise Exception('an exception happened!')
Exception: an exception happened!
^CTraceback (most recent call last):
File "mandrill/mayhem_3.py", line 72, in <module>
loop.run_forever()
File "/Users/lynn/.pyenv/versions/3.7.0/lib/python3.7/asyncio/base_events.py", line 523, in run_forever
self._run_once()
File "/Users/lynn/.pyenv/versions/3.7.0/lib/python3.7/asyncio/base_events.py", line 1722, in _run_once
event_list = self._selector.select(timeout)
File "/Users/lynn/.pyenv/versions/3.7.0/lib/python3.7/selectors.py", line 558, in select
kev_list = self._selector.control(None, max_ev, timeout)
KeyboardInterrupt
</code></pre>
<p>We get an error saying “exception was never retrieved.” This is admittedly a part of the <code>asyncio</code> API that’s not that friendly. If this were synchronous code, we’d simply see the error that we raised and error out. But here it gets swallowed up into an unretrieved task exception.</p>
<p>So to deal with this, as <a href="https://docs.python.org/3/library/asyncio-dev.html#detect-exceptions-never-consumed">advised in the asyncio documentation</a>, we need a wrapper coroutine to consume the exception. </p>
<p>Since we’re wrapping top-level coroutines (<code>publish</code> and <code>consume</code>), we’ll probably want to stop the loop, at least for now. If we can’t publish or consume, then we should probably investigate.</p>
<pre><code data-lang="py3"># <--snip-->

async def handle_exception(coro, loop):
    try:
        await coro
    except Exception:
        logging.error('Caught exception')
        loop.stop()


if __name__ == '__main__':
    queue = asyncio.Queue()
    loop = asyncio.get_event_loop()

    wrapped_publisher = handle_exception(publish(queue, 5), loop)
    wrapped_consumer = handle_exception(consume(queue), loop)
    loop.create_task(wrapped_publisher)
    loop.create_task(wrapped_consumer)

    try:
        loop.run_forever()
    finally:
        logging.info('Cleaning up')
        loop.close()
</code></pre>
<p>Now we get something a little cleaner:</p>
<pre><code data-lang="console">$ python mandrill/mayhem_4.py
17:46:01,208 INFO: Published 1 of 5 messages
17:46:01,208 INFO: Published 2 of 5 messages
17:46:01,208 INFO: Published 3 of 5 messages
17:46:01,208 INFO: Published 4 of 5 messages
17:46:01,209 INFO: Published 5 of 5 messages
17:46:01,209 INFO: Consumed PubSubMessage(instance_name='cattle-hotv')
17:46:01,824 INFO: Consumed PubSubMessage(instance_name='cattle-un2v')
17:46:02,139 INFO: Consumed PubSubMessage(instance_name='cattle-0qe3')
17:46:02,671 ERROR: Caught exception
17:46:02,672 INFO: Cleaning up
</code></pre>
<p>This is clean enough for now, but later on we’ll build off of this by adding a <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-2/">graceful shutdown in the next part of the series</a>.</p>
<h3 id="we39re-still-blocking">We’re still blocking</h3>
<p>I’ve seen quite a few tutorials that make use of <code>async</code> and <code>await</code> in a way that, while not blocking the event loop, still iterates through tasks serially, effectively not adding any concurrency. </p>
<p>Taking a look at where our script is now:</p>
<pre><code data-lang="py3">#!/usr/bin/env python3.7
"""
Notice! This requires:
  - attrs==18.1.0
"""
import asyncio
import logging
import random
import string

import attr

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s,%(msecs)d %(levelname)s: %(message)s',
    datefmt='%H:%M:%S',
)


@attr.s
class PubSubMessage:
    instance_name = attr.ib()
    message_id = attr.ib(repr=False)
    hostname = attr.ib(repr=False, init=False)

    def __attrs_post_init__(self):
        self.hostname = f'{self.instance_name}.example.net'


# simulating an external publisher of events
async def publish(queue, n):
    choices = string.ascii_lowercase + string.digits
    for x in range(1, n + 1):
        host_id = ''.join(random.choices(choices, k=4))
        instance_name = f'cattle-{host_id}'
        msg = PubSubMessage(message_id=x, instance_name=instance_name)
        await queue.put(msg)
        logging.info(f'Published {x} of {n} messages')

    await queue.put(None)  # publisher is done


async def consume(queue):
    while True:
        # wait for an item from the publisher
        msg = await queue.get()
        if msg is None:  # publisher is done
            break

        if msg.message_id == 4:  # super-realistic simulation of an exception
            raise Exception('an exception happened!')

        # process the msg
        logging.info(f'Consumed {msg}')
        # simulate i/o operation using sleep
        await asyncio.sleep(random.random())


async def handle_exception(coro, loop):
    try:
        await coro
    except Exception:
        logging.error('Caught exception')
        loop.stop()


if __name__ == '__main__':
    queue = asyncio.Queue()
    loop = asyncio.get_event_loop()

    wrapped_publisher = handle_exception(publish(queue, 5), loop)
    wrapped_consumer = handle_exception(consume(queue), loop)
    loop.create_task(wrapped_publisher)
    loop.create_task(wrapped_consumer)

    try:
        loop.run_forever()
    finally:
        logging.info('Cleaning up')
        loop.close()
</code></pre>
<p>As this was adapted from <a href="http://asyncio.readthedocs.io/en/latest/producer_consumer.html">this popular tutorial</a>, we are still sequentially processing each item we produce and consume. The event loop itself isn’t blocked; if we had other tasks/coroutines going on, they of course wouldn’t be blocked. </p>
<p>This might seem obvious to some, but it definitely isn’t to all. We <strong>are</strong> essentially blocking ourselves; first we produce all the messages, one by one. Then we consume them, one by one. The loops we have (<code>for x in range(1, n+1)</code> in <code>publish()</code>, and <code>while True</code> in <code>consume()</code>) keep us from moving on to the next message while we await each operation. </p>
<p>While this is technically a working example of a pub/sub-like queue with <code>asyncio</code>, it’s not what we want. Whether we are building an event-driven service (like this walk through), or a pipeline/batch job, we’re not taking advantage of the concurrency that <code>asyncio</code> can provide. </p>
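<p>To make that difference concrete, here’s a standalone timing sketch (not part of the Mayhem code): five awaits run back-to-back take roughly the sum of their sleeps, while the same awaits run through <code>asyncio.gather</code> overlap and take roughly the longest single sleep:</p>
<pre><code data-lang="py3">import asyncio
import time

async def work(i):
    await asyncio.sleep(0.1)  # stand-in for an i/o call
    return i

async def serial():
    # one await after another: ~0.5 seconds total
    return [await work(i) for i in range(5)]

async def concurrent():
    # overlapped on the loop: ~0.1 seconds total
    return await asyncio.gather(*(work(i) for i in range(5)))

start = time.perf_counter()
results = asyncio.run(serial())
serial_secs = time.perf_counter() - start

start = time.perf_counter()
assert asyncio.run(concurrent()) == results  # same results, much faster
concurrent_secs = time.perf_counter() - start
</code></pre>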
<h5 id="aside-compare-to-synchronous-code">Aside: Compare to synchronous code</h5>
<p>As I confessed earlier, I find <code>asyncio</code>’s API to be quite user-friendly (although some <a href="https://veriny.tf/asyncio-a-dumpster-fire-of-bad-design/">disagree</a> with valid reasons). It’s very easy to get up and running with the event loop. When first picking up concurrency, this <code>async</code> and <code>await</code> syntax is a low hurdle to start using since it makes it very similar to writing synchronous code. </p>
<p>But again, when first picking up concurrency, this API is deceptive and misleading. Yes, we are using the event loop and <code>asyncio</code> primitives. Yes it does work. Yes it seems faster – but that’s probably because you just came from 2.7 (welcome to 2014, by the way).</p>
<p>To illustrate how it’s no different than synchronous code, here’s the same script with all <code>asyncio</code>-related primitives removed:</p>
<pre><code data-lang="py3">#!/usr/bin/env python3.7
"""
Notice! This requires:
  - attrs==18.1.0
"""
import logging
import queue
import random
import string
import time

import attr

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s,%(msecs)d %(levelname)s: %(message)s',
    datefmt='%H:%M:%S',
)


@attr.s
class PubSubMessage:
    instance_name = attr.ib()
    message_id = attr.ib(repr=False)
    hostname = attr.ib(repr=False, init=False)

    def __attrs_post_init__(self):
        self.hostname = f'{self.instance_name}.example.net'


# simulating an external publisher of events
def publish(queue, n):
    choices = string.ascii_lowercase + string.digits
    for x in range(1, n + 1):
        host_id = ''.join(random.choices(choices, k=4))
        instance_name = f'cattle-{host_id}'
        msg = PubSubMessage(message_id=x, instance_name=instance_name)
        queue.put(msg)
        logging.info(f'Published {x} of {n} messages')

    queue.put(None)  # publisher is done


def consume(queue):
    while True:
        # wait for an item from the publisher
        msg = queue.get()
        if msg is None:  # publisher is done
            break

        # process the msg
        logging.info(f'Consumed {msg}')
        # simulate i/o operation using sleep
        time.sleep(random.random())


if __name__ == '__main__':
    queue = queue.Queue()
    publish(queue, 5)
    consume(queue)
</code></pre>
<p>And running it shows no real difference (apart from the “randomness” of <code>random.random</code>) compared to the <code>asyncio</code>-enabled approach:</p>
<pre><code data-lang="console">$ python mandrill/mayhem_5.py
17:56:46,947 INFO: Published 1 of 5 messages
17:56:46,947 INFO: Published 2 of 5 messages
17:56:46,947 INFO: Published 3 of 5 messages
17:56:46,947 INFO: Published 4 of 5 messages
17:56:46,947 INFO: Published 5 of 5 messages
17:56:46,947 INFO: Consumed PubSubMessage(instance_name='cattle-q10b')
17:56:47,318 INFO: Consumed PubSubMessage(instance_name='cattle-n7eg')
17:56:48,204 INFO: Consumed PubSubMessage(instance_name='cattle-mrij')
17:56:48,899 INFO: Consumed PubSubMessage(instance_name='cattle-se82')
17:56:49,726 INFO: Consumed PubSubMessage(instance_name='cattle-rkst')
</code></pre>
<p>Part of the problem could be that tutorial writers are presuming knowledge and the ability to extrapolate over-simplified examples. But it’s mainly because concurrency is just a difficult paradigm to grasp in general. We write code as we read anything: left-to-right, top-to-bottom. Most of us are just not used to the multitasking and context switching within our own programs that modern computers allow. Hell, even if we are familiar with concurrent programming, <a href="https://glyph.twistedmatrix.com/2014/02/unyielding.html">as Glyph would know</a>, understanding a concurrent system is hard.</p>
<p>But we’re not in over our heads yet. We can still make this simulated chaos monkey service <em>actually</em> concurrent in a rather simple way.</p>
<h3 id="actually-being-concurrent"><em>Actually</em> being concurrent</h3>
<p>To reiterate our goal here: we want to build an event-driven service that consumes from a pub/sub, and processes messages as they come in. We could get thousands of messages in seconds, so as we get a message, we shouldn’t block the handling of the next message we receive.</p>
<p>To help facilitate this, we’ll also need to build a service that actually runs forever. We’re not going to have a preset number of messages; we need to react whenever we’re told to restart an instance. The triggering event to publish a restart request message could be an on-demand request from a service owner, or a scheduled gradually rolling restart of the fleet.</p>
<h4 id="concurrent-publisher">Concurrent publisher</h4>
<p>Let’s first create a mock publisher that will always be publishing restart request messages, and therefore never indicate that it’s done. This also means we’re not providing a set number of messages to publish, so we have to rework that a bit, too. Here I’m just adding the creation of a unique ID for each message produced:</p>
<pre><code data-lang="py3"># <-- snip -->
import uuid
# <-- snip -->

async def publish(queue):
    choices = string.ascii_lowercase + string.digits

    while True:
        msg_id = str(uuid.uuid4())
        host_id = ''.join(random.choices(choices, k=4))
        instance_name = f'cattle-{host_id}'
        msg = PubSubMessage(message_id=msg_id, instance_name=instance_name)
        # put the item in the queue
        await queue.put(msg)
        logging.info(f'Published message {msg}')
        # simulate randomness of publishing messages
        await asyncio.sleep(random.random())

# <-- snip -->

async def handle_exception(coro, loop):
    try:
        await coro
    except Exception:
        logging.error('Caught exception')
        loop.stop()


if __name__ == '__main__':
    queue = asyncio.Queue()
    loop = asyncio.get_event_loop()
    publisher_coro = handle_exception(publish(queue), loop)

    try:
        loop.create_task(publisher_coro)
        loop.run_forever()
    finally:
        logging.info('Cleaning up')
        loop.stop()
</code></pre>
<p>Running for a few messages, then killing it, we see:</p>
<pre><code data-lang="console">$ python mandrill/mayhem_6.py
18:08:02,995 INFO: Published message PubSubMessage(instance_name='cattle-w8kz')
18:08:03,988 INFO: Published message PubSubMessage(instance_name='cattle-fr4o')
18:08:04,587 INFO: Published message PubSubMessage(instance_name='cattle-vlyg')
18:08:05,270 INFO: Published message PubSubMessage(instance_name='cattle-v6zu')
18:08:05,558 INFO: Published message PubSubMessage(instance_name='cattle-mws2')
^C18:08:05,903 INFO: Cleaning up
Traceback (most recent call last):
File "mandrill/mayhem_6.py", line 60, in <module>
loop.run_forever()
File "/Users/lynn/.pyenv/versions/3.7.0/lib/python3.7/asyncio/base_events.py", line 523, in run_forever
self._run_once()
File "/Users/lynn/.pyenv/versions/3.7.0/lib/python3.7/asyncio/base_events.py", line 1722, in _run_once
event_list = self._selector.select(timeout)
File "/Users/lynn/.pyenv/versions/3.7.0/lib/python3.7/selectors.py", line 558, in select
kev_list = self._selector.control(None, max_ev, timeout)
KeyboardInterrupt
</code></pre>
<p>We’re happily creating and publishing messages, but you’ll notice that the <code>KeyboardInterrupt</code> – triggered by the <code>^C</code> – is not actually caught. Let’s quickly clean up that traceback from the <code>KeyboardInterrupt</code>; it’s a quick band-aid, as further explained later on.</p>
<pre><code data-lang="py3"># <--snip-->

if __name__ == '__main__':
    queue = asyncio.Queue()
    loop = asyncio.get_event_loop()
    publisher_coro = handle_exception(publish(queue), loop)

    try:
        loop.create_task(publisher_coro)
        loop.run_forever()
    except KeyboardInterrupt:
        logging.info('Interrupted')
    finally:
        logging.info('Cleaning up')
        loop.stop()
</code></pre>
<p>Now we see:</p>
<pre><code data-lang="console">$ python mandrill/mayhem_6.py
18:09:48,337 INFO: Published message PubSubMessage(instance_name='cattle-s8x2')
18:09:48,643 INFO: Published message PubSubMessage(instance_name='cattle-4aat')
^C18:09:49,83 INFO: Interrupted
18:09:49,83 INFO: Cleaning up
</code></pre>
<p>Fantastic! Much cleaner. <strong>Note:</strong> Catching <code>KeyboardInterrupt</code> isn’t enough; follow <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-2/">Part 2</a> of this series for a better approach.</p>
<p>It’s probably hard to see how this is concurrent right now. Let’s add multiple publishers to help illustrate it. I’ll temporarily update the <code>publish</code> coroutine function to take in a <code>publisher_id</code> to make it clear that we have multiple publishers:</p>
<pre><code data-lang="py3">async def publish(queue, publisher_id):
    choices = string.ascii_lowercase + string.digits

    while True:
        msg_id = str(uuid.uuid4())
        host_id = ''.join(random.choices(choices, k=4))
        instance_name = f'cattle-{host_id}'
        msg = PubSubMessage(message_id=msg_id, instance_name=instance_name)
        # put the item in the queue
        await queue.put(msg)
        logging.info(f'[{publisher_id}] Published message {msg}')
        # simulate randomness of publishing messages
        await asyncio.sleep(random.random())
</code></pre>
<p>and then create multiple coroutines:</p>
<pre><code data-lang="py3">if __name__ == '__main__':
    queue = asyncio.Queue()
    loop = asyncio.get_event_loop()

    # not that readable - sorry!
    coros = [handle_exception(publish(queue, i), loop) for i in range(1, 4)]

    try:
        [loop.create_task(coro) for coro in coros]
        loop.run_forever()
    except KeyboardInterrupt:
        logging.info('Interrupted')
    finally:
        logging.info('Cleaning up')
        loop.stop()
</code></pre>
<p>So now it’s a bit easier to see the concurrency with the out-of-order publisher IDs:</p>
<pre><code data-lang="console">$ python mandrill/mayhem_7.py
18:15:38,838 INFO: [1] Published message PubSubMessage(instance_name='cattle-tnh8')
18:15:38,838 INFO: [2] Published message PubSubMessage(instance_name='cattle-wyt2')
18:15:38,838 INFO: [3] Published message PubSubMessage(instance_name='cattle-kh0l')
18:15:39,119 INFO: [1] Published message PubSubMessage(instance_name='cattle-5u61')
18:15:39,615 INFO: [3] Published message PubSubMessage(instance_name='cattle-mbvw')
18:15:39,689 INFO: [1] Published message PubSubMessage(instance_name='cattle-80ro')
18:15:39,774 INFO: [2] Published message PubSubMessage(instance_name='cattle-xlm4')
18:15:39,865 INFO: [1] Published message PubSubMessage(instance_name='cattle-hlwx')
18:15:39,872 INFO: [2] Published message PubSubMessage(instance_name='cattle-7l1v')
18:15:40,273 INFO: [3] Published message PubSubMessage(instance_name='cattle-gf6k')
18:15:40,294 INFO: [1] Published message PubSubMessage(instance_name='cattle-iq3r')
^C18:15:40,637 INFO: Interrupted
18:15:40,637 INFO: Cleaning up
</code></pre>
<p>Huzzah! </p>
<p>For the rest of the walk through, I’ll remove the multiple publishers; I just wanted to easily convey that it’s now concurrent, not just non-blocking.</p>
<p>I will also switch the log level of the publisher logs to <code>debug</code> so we can focus on the meat of the service, since the <code>publish</code> coroutine function is merely meant to simulate an external pub/sub-like system.</p>
<h4 id="concurrent-consumer">Concurrent consumer</h4>
<p>Now comes the time to add concurrency to the consumer bit. For this, the goal is to constantly consume messages from the queue and create non-blocking work based off of a newly-consumed message; in this case, to restart an instance.</p>
<p>The tricky part is that the consumer needs to be written in a way that the consumption of a new message from the queue is separate from the work on the message itself. In other words, we have to simulate being “event-driven” by regularly pulling for a message from the queue, since there’s no way to have work triggered by a newly-available message (a.k.a. push-based). </p>
<p>Let’s first mock the restart work that needs to be done on any consumed message:</p>
<pre><code data-lang="py3">async def restart_host(msg):
    # unhelpful simulation of i/o work
    await asyncio.sleep(random.random())
    logging.info(f'Restarted {msg.hostname}')
</code></pre>
<p>We’ll stick with our <code>while True</code> loop, await the next message on the queue, and then create a task out of <code>restart_host</code> (which – not obviously – also schedules it on the loop) rather than just <code>await</code>ing it.</p>
<pre><code data-lang="py3">async def consume(queue):
    while True:
        msg = await queue.get()
        logging.info(f'Pulled {msg}')

        asyncio.create_task(restart_host(msg))
</code></pre>
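<p>One caveat with this fire-and-forget pattern: the current <code>asyncio</code> docs advise saving a reference to the result of <code>create_task</code>, since the loop only keeps weak references to tasks and an unreferenced task can, in rare cases, be garbage-collected before it finishes. A common sketch of that bookkeeping (the names here are mine, not part of the walk through):</p>
<pre><code data-lang="py3">import asyncio

background_tasks = set()

async def restart_host(msg):
    await asyncio.sleep(0)  # stand-in for the real restart work

async def consume_one(msg):
    task = asyncio.create_task(restart_host(msg))
    # hold a strong reference until the task completes, then drop it
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    await task              # awaited here only so this demo exits cleanly
    await asyncio.sleep(0)  # give the done callback a chance to run

asyncio.run(consume_one('fake-message'))
</code></pre>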
<p>Then adding it to our main section:</p>
<pre><code data-lang="py3">if __name__ == '__main__':
    queue = asyncio.Queue()
    loop = asyncio.get_event_loop()
    publisher_coro = handle_exception(publish(queue), loop)
    consumer_coro = handle_exception(consume(queue), loop)

    try:
        loop.create_task(publisher_coro)
        loop.create_task(consumer_coro)
        loop.run_forever()
    except KeyboardInterrupt:
        logging.info('Interrupted')
    finally:
        logging.info('Cleaning up')
        loop.stop()
</code></pre>
<p>Running this, we see:</p>
<pre><code data-lang="console">$ python mandrill/mayhem_8.py
16:32:20,639 INFO: Pulled PubSubMessage(instance_name='cattle-dhln')
16:32:20,639 INFO: Pulled PubSubMessage(instance_name='cattle-xp42')
16:32:20,639 INFO: Pulled PubSubMessage(instance_name='cattle-3v98')
16:32:20,673 INFO: Restarted cattle-3v98.example.net
16:32:20,786 INFO: Pulled PubSubMessage(instance_name='cattle-du7r')
16:32:20,882 INFO: Pulled PubSubMessage(instance_name='cattle-bcur')
16:32:21,108 INFO: Restarted cattle-xp42.example.net
16:32:21,112 INFO: Restarted cattle-dhln.example.net
16:32:21,205 INFO: Restarted cattle-bcur.example.net
16:32:21,415 INFO: Pulled PubSubMessage(instance_name='cattle-bd2z')
16:32:21,434 INFO: Pulled PubSubMessage(instance_name='cattle-680o')
16:32:21,477 INFO: Restarted cattle-bd2z.example.net
16:32:21,550 INFO: Pulled PubSubMessage(instance_name='cattle-94cd')
16:32:21,679 INFO: Restarted cattle-680o.example.net
16:32:21,766 INFO: Restarted cattle-du7r.example.net
16:32:21,887 INFO: Pulled PubSubMessage(instance_name='cattle-z70b')
16:32:21,998 INFO: Restarted cattle-z70b.example.net
16:32:22,25 INFO: Pulled PubSubMessage(instance_name='cattle-ploc')
^C16:32:22,86 INFO: Interrupted
16:32:22,86 INFO: Cleaning up
</code></pre>
<p>Nice. We’re now pulling for messages whenever they’re available.</p>
<h4 id="concurrent-work">Concurrent work</h4>
<p>We may want to do more than one thing per message. For example, we’d like to store the message in a database for potentially replaying later as well as initiate a restart of the given host:</p>
<pre><code data-lang="py3">async def restart_host(msg):
    # unhelpful simulation of i/o work
    await asyncio.sleep(random.random())
    logging.info(f'Restarted {msg.hostname}')


async def save(msg):
    # unhelpful simulation of i/o work
    await asyncio.sleep(random.random())
    logging.info(f'Saved {msg} into database')
</code></pre>
<p>Within the <code>consume</code> coroutine function, we <em>could</em> just <code>await</code> on both coroutines serially:</p>
<pre><code data-lang="py3">async def consume(queue):
    while True:
        msg = await queue.get()
        logging.info(f'Pulled {msg}')

        # sequential awaits may not be what you want
        await save(msg)
        await restart_host(msg)
</code></pre>
<p>And running the script with this looks like:</p>
<pre><code data-lang="console">$ python mandrill/mayhem_9.py
16:34:11,754 INFO: Pulled PubSubMessage(instance_name='cattle-ppki')
16:34:12,304 INFO: Saved PubSubMessage(instance_name='cattle-ppki') into database
16:34:12,340 INFO: Restarted cattle-ppki.example.net
16:34:12,340 INFO: Pulled PubSubMessage(instance_name='cattle-dl3k')
16:34:12,647 INFO: Saved PubSubMessage(instance_name='cattle-dl3k') into database
16:34:13,583 INFO: Restarted cattle-dl3k.example.net
16:34:13,583 INFO: Pulled PubSubMessage(instance_name='cattle-8is1')
16:34:14,318 INFO: Saved PubSubMessage(instance_name='cattle-8is1') into database
16:34:14,757 INFO: Restarted cattle-8is1.example.net
16:34:14,757 INFO: Pulled PubSubMessage(instance_name='cattle-51fk')
16:34:15,205 INFO: Saved PubSubMessage(instance_name='cattle-51fk') into database
16:34:15,302 INFO: Restarted cattle-51fk.example.net
16:34:15,303 INFO: Pulled PubSubMessage(instance_name='cattle-nv87')
16:34:15,844 INFO: Saved PubSubMessage(instance_name='cattle-nv87') into database
16:34:15,913 INFO: Restarted cattle-nv87.example.net
16:34:15,913 INFO: Pulled PubSubMessage(instance_name='cattle-f88i')
^C16:34:16,66 INFO: Interrupted
16:34:16,66 INFO: Cleaning up
</code></pre>
<p>We can see that although it doesn’t block the event loop, <code>await save(msg)</code> blocks <code>await restart_host(msg)</code>, which blocks the consumption of future messages. But, perhaps we don’t <em>need</em> to await these two coroutines one right after another. These two tasks don’t necessarily need to depend on one another – completely side-stepping the potential concern/complexity of “should we restart a host if we fail to add the message to the database”. </p>
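<p>As a middle ground: if we wanted <code>save</code> and <code>restart_host</code> to run concurrently but still wait for both to finish before pulling the next message (say, to act on their results), <code>asyncio.gather</code> is one option. A quick sketch, with made-up return values for illustration:</p>
<pre><code data-lang="py3">import asyncio

async def save(msg):
    await asyncio.sleep(0.1)  # stand-in i/o
    return f'saved {msg}'

async def restart_host(msg):
    await asyncio.sleep(0.1)  # stand-in i/o
    return f'restarted {msg}'

async def handle(msg):
    # both coroutines run concurrently; gather returns their
    # results in argument order once both are done
    return await asyncio.gather(save(msg), restart_host(msg))

results = asyncio.run(handle('cattle-demo'))
</code></pre>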
<p>So let’s treat them as such. Instead of awaiting them, we can make use of <code>asyncio.create_task</code> again to have them scheduled on the loop, basically chucking them over to the loop to execute when it next can.</p>
<pre><code data-lang="py3">async def consume(queue):
    while True:
        msg = await queue.get()
        logging.info(f'Pulled {msg}')

        asyncio.create_task(save(msg))
        asyncio.create_task(restart_host(msg))
</code></pre>
<p>Running with this approach, we can see <code>save</code> doesn’t unnecessarily block <code>restart_host</code>:</p>
<pre><code data-lang="console">$ python mandrill/mayhem_10.py
18:49:22,114 INFO: Pulled PubSubMessage(instance_name='cattle-7tsz')
18:49:22,219 INFO: Pulled PubSubMessage(instance_name='cattle-1kgp')
18:49:22,272 INFO: Saved PubSubMessage(instance_name='cattle-7tsz') into database
18:49:22,512 INFO: Restarted cattle-1kgp.example.net
18:49:22,640 INFO: Restarted cattle-7tsz.example.net
18:49:22,716 INFO: Saved PubSubMessage(instance_name='cattle-1kgp') into database
18:49:22,998 INFO: Pulled PubSubMessage(instance_name='cattle-1wdy')
18:49:23,043 INFO: Saved PubSubMessage(instance_name='cattle-1wdy') into database
18:49:23,279 INFO: Pulled PubSubMessage(instance_name='cattle-e9rl')
18:49:23,370 INFO: Restarted cattle-1wdy.example.net
18:49:23,479 INFO: Pulled PubSubMessage(instance_name='cattle-crnh')
18:49:23,612 INFO: Saved PubSubMessage(instance_name='cattle-crnh') into database
18:49:24,155 INFO: Restarted cattle-e9rl.example.net
18:49:24,173 INFO: Saved PubSubMessage(instance_name='cattle-e9rl') into database
18:49:24,259 INFO: Pulled PubSubMessage(instance_name='cattle-hbbd')
18:49:24,279 INFO: Restarted cattle-crnh.example.net
18:49:24,292 INFO: Pulled PubSubMessage(instance_name='cattle-8mg0')
18:49:24,324 INFO: Saved PubSubMessage(instance_name='cattle-hbbd') into database
18:49:24,550 INFO: Saved PubSubMessage(instance_name='cattle-8mg0') into database
18:49:24,716 INFO: Pulled PubSubMessage(instance_name='cattle-hyv1')
18:49:24,817 INFO: Saved PubSubMessage(instance_name='cattle-hyv1') into database
^C18:49:25,017 INFO: Interrupted
18:49:25,018 INFO: Cleaning up
</code></pre>
<p>Yay!</p>
<h5 id="aside-when-you-want-sequential-work">Aside: When you want sequential work</h5>
<p>As an aside, sometimes you want your work to happen serially. </p>
<p>For instance, maybe you only want to restart hosts that have an uptime of more than 7 days, so you await another coroutine to check a host’s last restart date:</p>
<pre><code data-lang="py3">async def consume(queue):
    while True:
        msg = await queue.get()
        logging.info(f'Pulled {msg}')
        # potentially what you want
        last_restart = await last_restart_date(msg)
        if today - last_restart > max_days:
            await restart_host(msg)
</code></pre>
<p>Needing code to be sequential, to have steps or dependencies, doesn’t mean that it can’t still be asynchronous. The <code>await last_restart_date(msg)</code> will yield to the loop, but that doesn’t mean <code>restart_host</code> of that <code>msg</code> will be the next thing the loop executes. It just allows other work that has been scheduled on the loop to happen.</p>
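<p>To make that concrete, here’s a minimal, standalone sketch (separate from the Mayhem service; the names and sleep times are illustrative) showing that two sequential <code>await</code>s inside one coroutine still let other scheduled work run in between them:</p>
<pre><code data-lang="py3">import asyncio

order = []

async def step(name, delay):
    # simulated i/o work; yields control to the loop while sleeping
    await asyncio.sleep(delay)
    order.append(name)

async def sequential():
    # "first A, then B" -- sequential within this coroutine...
    await step('A', 0.1)
    await step('B', 0.1)

async def main():
    # ...but other scheduled work can still run between A and B
    await asyncio.gather(sequential(), step('other', 0.15))

asyncio.run(main())
print(order)  # ['A', 'other', 'B']
</code></pre>
<p><code>B</code> only starts once <code>A</code> is done, yet <code>other</code> completes in the gap, because each <code>await asyncio.sleep</code> hands control back to the loop.</p>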
<h4 id="message-cleanup">Message Cleanup</h4>
<p>We’ve pulled a message from the queue, and fanned out work based off of that message. Now we need to perform any finalizing work on that message; for example, we need to acknowledge that we’re done with the message so it isn’t re-delivered by mistake. </p>
<p>We’ll separate out the pulling of the message from the creation of work based off of it. Then we can make use of <code>asyncio.gather</code>, adding a callback to the future it returns:</p>
<pre><code data-lang="py3"># <--snip-->
import functools
# <--snip-->

def cleanup(msg, fut):
    logging.info(f'Done. Acked {msg}')


async def handle_message(msg):
    g_future = asyncio.gather(save(msg), restart_host(msg))
    callback = functools.partial(cleanup, msg)
    g_future.add_done_callback(callback)
    await g_future


async def consume(queue):
    while True:
        msg = await queue.get()
        logging.info(f'Pulled {msg}')
        asyncio.create_task(handle_message(msg))

# <--snip-->
</code></pre>
<p>So once both the <code>save(msg)</code> and <code>restart_host(msg)</code> coroutines are complete, <code>cleanup</code> will be called:</p>
<pre><code data-lang="console">$ python mandrill/mayhem_11.py
19:00:27,747 INFO: Pulled PubSubMessage(instance_name='cattle-xuf1')
19:00:27,848 INFO: Pulled PubSubMessage(instance_name='cattle-kk87')
19:00:27,861 INFO: Restarted cattle-xuf1.example.net
19:00:28,61 INFO: Saved PubSubMessage(instance_name='cattle-kk87') into database
19:00:28,244 INFO: Restarted cattle-kk87.example.net
19:00:28,245 INFO: Done. Acked PubSubMessage(instance_name='cattle-kk87')
19:00:28,572 INFO: Pulled PubSubMessage(instance_name='cattle-pdej')
19:00:28,659 INFO: Saved PubSubMessage(instance_name='cattle-xuf1') into database
19:00:28,659 INFO: Done. Acked PubSubMessage(instance_name='cattle-xuf1')
19:00:28,831 INFO: Saved PubSubMessage(instance_name='cattle-pdej') into database
19:00:29,333 INFO: Pulled PubSubMessage(instance_name='cattle-x9kz')
19:00:29,339 INFO: Pulled PubSubMessage(instance_name='cattle-sicp')
19:00:29,455 INFO: Restarted cattle-pdej.example.net
19:00:29,455 INFO: Done. Acked PubSubMessage(instance_name='cattle-pdej')
19:00:29,506 INFO: Saved PubSubMessage(instance_name='cattle-sicp') into database
19:00:29,617 INFO: Restarted cattle-sicp.example.net
19:00:29,617 INFO: Done. Acked PubSubMessage(instance_name='cattle-sicp')
19:00:29,795 INFO: Restarted cattle-x9kz.example.net
19:00:29,914 INFO: Saved PubSubMessage(instance_name='cattle-x9kz') into database
19:00:29,914 INFO: Done. Acked PubSubMessage(instance_name='cattle-x9kz')
19:00:30,195 INFO: Pulled PubSubMessage(instance_name='cattle-o501')
^C19:00:30,305 INFO: Interrupted
19:00:30,305 INFO: Cleaning up
</code></pre>
<p>I personally have an allergy to callbacks. As well, perhaps we need <code>cleanup</code> to be non-blocking. Another approach could be just to <code>await</code> it:</p>
<pre><code data-lang="py3">async def cleanup(msg):
    logging.info(f'Done. Acked {msg}')
    # unhelpful simulation of i/o work
    await asyncio.sleep(0)


async def handle_message(msg):
    await asyncio.gather(save(msg), restart_host(msg))
    await cleanup(msg)
</code></pre>
<p>Great!</p>
<h4 id="task-monitoring-other-tasks">Task monitoring other tasks</h4>
<p>Now, much like Google’s Pub/Sub, let’s say that the publisher will redeliver a message after 10 seconds if it has not been acknowledged. We are able to extend that “timeout” period or acknowledgment deadline for a message. </p>
<p>In order to do that, we now need a coroutine that, in essence, monitors all the other worker tasks. While they’re still working, this coroutine will extend the message acknowledgment deadline; then once <code>save</code> and <code>restart_host</code> are done, it should stop extending and clean up the message.</p>
<p>One approach is to make use of <a href="https://docs.python.org/3/library/asyncio-sync.html#event"><code>asyncio.Event</code></a> primitives. Let’s also increase the <code>asyncio.sleep</code> time inside of <code>restart_host</code> to help illustrate extending the deadline:</p>
<pre><code data-lang="py3">async def restart_host(msg):
    # unhelpful simulation of i/o work
    await asyncio.sleep(random.randrange(1, 3))
    logging.info(f'Restarted {msg.hostname}')

# <--snip-->

async def extend(msg, event):
    while not event.is_set():
        logging.info(f'Extended deadline by 3 seconds for {msg}')
        # want to sleep for less than the deadline amount
        await asyncio.sleep(2)
    else:
        await cleanup(msg)


async def handle_message(msg):
    event = asyncio.Event()
    asyncio.create_task(extend(msg, event))
    await asyncio.gather(save(msg), restart_host(msg))
    event.set()
</code></pre>
<p>Running this, we can see we’re extending while work continues, and cleaning up once done:</p>
<pre><code data-lang="console">$ python mandrill/mayhem_12.py
19:04:29,602 INFO: Pulled PubSubMessage(instance_name='cattle-g7hy')
19:04:29,603 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-g7hy')
19:04:29,692 INFO: Saved PubSubMessage(instance_name='cattle-g7hy') into database
19:04:30,439 INFO: Pulled PubSubMessage(instance_name='cattle-wv21')
19:04:30,440 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-wv21')
19:04:30,605 INFO: Restarted cattle-g7hy.example.net
19:04:31,100 INFO: Saved PubSubMessage(instance_name='cattle-wv21') into database
19:04:31,203 INFO: Pulled PubSubMessage(instance_name='cattle-40w2')
19:04:31,203 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-40w2')
19:04:31,350 INFO: Pulled PubSubMessage(instance_name='cattle-ouqk')
19:04:31,350 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-ouqk')
19:04:31,445 INFO: Saved PubSubMessage(instance_name='cattle-40w2') into database
19:04:31,775 INFO: Done. Acked PubSubMessage(instance_name='cattle-g7hy')
19:04:31,919 INFO: Saved PubSubMessage(instance_name='cattle-ouqk') into database
19:04:32,184 INFO: Pulled PubSubMessage(instance_name='cattle-oqxz')
19:04:32,184 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-oqxz')
19:04:32,207 INFO: Restarted cattle-40w2.example.net
19:04:32,356 INFO: Restarted cattle-ouqk.example.net
19:04:32,441 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-wv21')
19:04:32,441 INFO: Restarted cattle-wv21.example.net
19:04:32,559 INFO: Saved PubSubMessage(instance_name='cattle-oqxz') into database
^C19:04:32,697 INFO: Interrupted
19:04:32,698 INFO: Cleaning up
</code></pre>
<p>If you <em>love</em> events, you could even make use of <code>event.wait</code>:</p>
<pre><code data-lang="py3">async def cleanup(msg, event):
    # this will block the rest of the coro until `event.set` is called
    await event.wait()
    # unhelpful simulation of i/o work
    await asyncio.sleep(random.random())
    logging.info(f'Done. Acked {msg}')


async def extend(msg, event):
    while not event.is_set():
        logging.info(f'Extended deadline by 3 seconds for {msg}')
        # want to sleep for less than the deadline amount
        await asyncio.sleep(2)


async def handle_message(msg):
    event = asyncio.Event()
    asyncio.create_task(extend(msg, event))
    asyncio.create_task(cleanup(msg, event))
    await asyncio.gather(save(msg), restart_host(msg))
    event.set()
</code></pre>
<p>Well, alright then! We got some concurrency!</p>
<h3 id="recap">Recap</h3>
<p><code>asyncio</code> is pretty easy to use, but being easy to use doesn’t automatically mean you’re using it correctly. You can’t just sprinkle <code>async</code> and <code>await</code> keywords around blocking code. It’s a shift in mental paradigm: you need to think both about what work can be queued up and fired off, and about where your code still needs to be sequential.</p>
<p>Having sequential code – “first A, then B, then C” – may seem like it’s blocking when it’s not. Sequential code can still be asynchronous. I might have to call customer service for something, and wait to be taken off hold to talk to them, but while I wait, I can put the phone on speaker and pet <a href="https://www.instagram.com/p/BSpX3JihnXz/">my super needy cat</a>. I might be single-threaded as a person, but I can multi-task like CPUs, just like my code.</p>
<hr>
<p>Follow the next part of this series for <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-2/">adding a graceful shutdown</a>, <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-3/">exception handling</a>, <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-4/">working with synchronous and threaded code</a>, and testing <code>asyncio</code> code (coming soon!).</p>
Exception Handling in asynciohttp://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-3/2018-07-26T13:43:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p><em>Foreword: This is <strong>part 3</strong> of a 5-part series titled</em> “<code>asyncio</code>: We Did It Wrong.” <em>Take a look at <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-1/">Part 1: True Concurrency</a> and <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-2/">Part 2: Graceful Shutdowns</a> for where we are in the tutorial now. Once done, follow along with <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-4/">Part 4: Working with Synchronous & Threaded Code</a>, and Part 5: Testing <code>asyncio</code> Code (coming soon!).</em></p>
<p><em>Example code can be found on <a href="https://github.com/econchick/mayhem">GitHub</a>. All code on this post is licensed under <a href="https://github.com/econchick/mayhem/blob/master/LICENSE">MIT</a>.</em></p>
<hr>
<h3 id="mayhem-mandrill-recap">Mayhem Mandrill Recap</h3>
<p>The goal for this 5-part series is to build a mock chaos monkey-like service called “Mayhem Mandrill”. This is an event-driven service that consumes from a pub/sub, and initiates a mock restart of a host. We could get thousands of messages in seconds, so as we get a message, we shouldn’t block the handling of the next message we receive.</p>
<p>At the end of <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-2/">part 2</a>, our service looked like this:</p>
<pre><code data-lang="py3">#!/usr/bin/env python3.7
"""
Notice! This requires:
  - attrs==18.1.0
"""

import asyncio
import functools
import logging
import random
import signal
import string
import uuid

import attr

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s,%(msecs)d %(levelname)s: %(message)s',
    datefmt='%H:%M:%S',
)


@attr.s
class PubSubMessage:
    instance_name = attr.ib()
    message_id = attr.ib(repr=False)
    hostname = attr.ib(repr=False, init=False)

    def __attrs_post_init__(self):
        self.hostname = f'{self.instance_name}.example.net'


async def publish(queue):
    choices = string.ascii_lowercase + string.digits
    while True:
        msg_id = str(uuid.uuid4())
        host_id = ''.join(random.choices(choices, k=4))
        instance_name = f'cattle-{host_id}'
        msg = PubSubMessage(message_id=msg_id, instance_name=instance_name)
        # put the item in the queue
        await queue.put(msg)
        logging.debug(f'Published message {msg}')
        # simulate randomness of publishing messages
        await asyncio.sleep(random.random())


async def restart_host(msg):
    # unhelpful simulation of i/o work
    await asyncio.sleep(random.randrange(1, 3))
    logging.info(f'Restarted {msg.hostname}')


async def save(msg):
    # unhelpful simulation of i/o work
    await asyncio.sleep(random.random())
    logging.info(f'Saved {msg} into database')


async def cleanup(msg, event):
    # this will block the rest of the coro until `event.set` is called
    await event.wait()
    # unhelpful simulation of i/o work
    await asyncio.sleep(random.random())
    logging.info(f'Done. Acked {msg}')


async def extend(msg, event):
    while not event.is_set():
        logging.info(f'Extended deadline by 3 seconds for {msg}')
        # want to sleep for less than the deadline amount
        await asyncio.sleep(2)


async def handle_message(msg):
    event = asyncio.Event()
    asyncio.create_task(extend(msg, event))
    asyncio.create_task(cleanup(msg, event))
    await asyncio.gather(save(msg), restart_host(msg))
    event.set()


async def consume(queue):
    while True:
        msg = await queue.get()
        logging.info(f'Pulled {msg}')
        asyncio.create_task(handle_message(msg))


async def handle_exception(coro, loop):
    try:
        await coro
    except asyncio.CancelledError:
        logging.info('Coroutine canceled')
    except Exception:
        logging.error('Caught exception')
    finally:
        loop.stop()


async def shutdown(signal, loop):
    logging.info(f'Received exit signal {signal.name}...')
    logging.info('Closing database connections')
    logging.info('Nacking outstanding messages')
    tasks = [t for t in asyncio.all_tasks() if t is not
             asyncio.current_task()]

    [task.cancel() for task in tasks]

    logging.info('Canceling outstanding tasks')
    await asyncio.gather(*tasks)
    loop.stop()
    logging.info('Shutdown complete.')


if __name__ == '__main__':
    loop = asyncio.get_event_loop()

    # May want to catch other signals too
    signals = (signal.SIGHUP, signal.SIGTERM, signal.SIGINT)
    for s in signals:
        loop.add_signal_handler(
            s, lambda s=s: asyncio.create_task(shutdown(s, loop)))

    queue = asyncio.Queue()
    publisher_coro = handle_exception(publish(queue), loop)
    consumer_coro = handle_exception(consume(queue), loop)

    try:
        loop.create_task(publisher_coro)
        loop.create_task(consumer_coro)
        loop.run_forever()
    finally:
        logging.info('Cleaning up')
        loop.stop()
</code></pre><h3 id="exception-handling">Exception Handling</h3>
<p>You may have noticed that, while we’re catching exceptions on the top level, we’re not paying any mind to exceptions that could be raised from within coroutines like <code>restart_host</code>, <code>save</code>, etc. To show you what I mean, let’s fake an error where we can’t restart a host:</p>
<pre><code data-lang="py3">async def restart_host(msg):
    # faked error
    rand_int = random.randrange(1, 4)
    if rand_int == 3:
        raise Exception(f'Could not restart {msg.hostname}')
    # unhelpful simulation of i/o work
    await asyncio.sleep(random.random())
    logging.info(f'Restarted {msg.hostname}')
</code></pre>
<p>Running it, we see (limiting it to one message to shorten logs):</p>
<pre><code data-lang="console">$ python mandrill/mayhem_15.py
08:55:58,122 INFO: Pulled PubSubMessage(instance_name='cattle-tx09')
08:55:58,122 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-tx09')
08:55:58,123 ERROR: Could not restart cattle-tx09.example.net
08:55:58,123 ERROR: Task exception was never retrieved
future: <Task finished coro=<handle_message() done, defined at mandrill/mayhem_15.py:72> exception=Exception('Could not restart cattle-tx09.example.net')>
Traceback (most recent call last):
File "mandrill/mayhem_15.py", line 82, in handle_message
await asyncio.gather(save_coro, restart_coro)
File "mandrill/mayhem_15.py", line 49, in restart_host
raise Exception(f'Could not restart {msg.hostname}')
Exception: Could not restart cattle-tx09.example.net
08:55:58,904 INFO: Saved PubSubMessage(instance_name='cattle-tx09') into database
08:56:00,127 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-tx09')
</code></pre>
<p>We see that <code>cattle-tx09.example.net</code> could not be restarted. While the service doesn’t crash and the message <strong>was saved</strong> to the database, it will never get cleaned up and <code>ack</code>ed. The <code>extend</code> on the message deadline will also keep spinning. This is because the exception raised was never returned, so we never hit the <code>event.set()</code> line. We’ve essentially deadlocked ourselves on the message. </p>
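<p>To see the mechanics in isolation, here’s a minimal standalone sketch (the <code>save</code> and <code>restart_host</code> coroutines below are simplified stand-ins, not the service code). A plain <code>asyncio.gather</code> re-raises the first exception back to its awaiter, so any line after the <code>await</code> (like our <code>event.set()</code>) is never reached:</p>
<pre><code data-lang="py3">import asyncio

event_was_set = False

async def save():
    # stand-in for the real save coroutine; succeeds
    await asyncio.sleep(0)
    return 'saved'

async def restart_host():
    # stand-in that always fails
    raise Exception('Could not restart cattle-1234.example.net')

async def handle_message():
    global event_was_set
    # without return_exceptions=True, gather re-raises the first exception
    await asyncio.gather(save(), restart_host())
    event_was_set = True  # never reached

try:
    asyncio.run(handle_message())
except Exception as e:
    print(f'Raised out of gather: {e}')

print(event_was_set)  # False
</code></pre>
<p>The line after the <code>await asyncio.gather(...)</code> is skipped entirely, which is exactly how our <code>event.set()</code> was skipped above.</p>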
<p>The simple thing to do is add <code>return_exceptions=True</code> to <code>asyncio.gather</code>, so rather than completely dropping an exception, it’s returned along with the successful results:</p>
<pre><code data-lang="py3">async def handle_message(msg):
    event = asyncio.Event()
    asyncio.create_task(extend(msg, event))
    asyncio.create_task(cleanup(msg, event))
    await asyncio.gather(save(msg), restart_host(msg), return_exceptions=True)
    event.set()
</code></pre>
<p>We don’t see any tracebacks anymore in the output and messages are now being cleaned up and ack'ed; however, it’s still not that helpful since we don’t have <em>any</em> insight into if <code>restart_host</code> raised or not:</p>
<pre><code data-lang="console">$ python mandrill/mayhem_15.py
09:08:50,658 INFO: Pulled PubSubMessage(instance_name='cattle-4f52')
09:08:50,659 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-4f52')
09:08:51,25 INFO: Pulled PubSubMessage(instance_name='cattle-orj0')
09:08:51,25 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-orj0')
09:08:51,497 INFO: Pulled PubSubMessage(instance_name='cattle-f4nw')
09:08:51,497 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-f4nw')
09:08:51,626 INFO: Saved PubSubMessage(instance_name='cattle-4f52') into database
09:08:51,706 INFO: Saved PubSubMessage(instance_name='cattle-orj0') into database
09:08:51,723 INFO: Done. Acked PubSubMessage(instance_name='cattle-4f52')
09:08:52,9 INFO: Saved PubSubMessage(instance_name='cattle-f4nw') into database
09:08:52,409 INFO: Pulled PubSubMessage(instance_name='cattle-dft2')
09:08:52,410 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-dft2')
09:08:52,444 INFO: Saved PubSubMessage(instance_name='cattle-dft2') into database
09:08:52,929 INFO: Done. Acked PubSubMessage(instance_name='cattle-dft2')
09:08:52,930 INFO: Pulled PubSubMessage(instance_name='cattle-ft4h')
09:08:52,930 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-ft4h')
09:08:53,29 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-orj0')
09:08:53,30 INFO: Restarted cattle-orj0.example.net
</code></pre>
<p>We <em>could</em> add a callback via <code>add_done_callback</code> to the <code>asyncio.gather</code> future, but as I said in <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-1/">part 1</a> of this series, I’m allergic to callbacks. We can just process the results afterwards:</p>
<pre><code data-lang="py3">def handle_results(results):
    for result in results:
        if isinstance(result, Exception):
            logging.error(f'Caught exception: {result}')


async def handle_message(msg):
    event = asyncio.Event()
    asyncio.create_task(extend(msg, event))
    asyncio.create_task(cleanup(msg, event))
    results = await asyncio.gather(
        save(msg), restart_host(msg), return_exceptions=True
    )
    handle_results(results)
    event.set()
</code></pre>
<p><code>handle_results</code> would be a good place for any retry logic, or logic dependent on whether a result was successful or not. </p>
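<p>For instance, here’s a hedged sketch of what bounded retry logic could look like. The <code>with_retries</code> helper, its linear backoff, and the <code>flaky_restart_host</code> stand-in are all hypothetical, not part of the service above:</p>
<pre><code data-lang="py3">import asyncio
import logging

logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')

async def with_retries(coro_func, *args, max_attempts=3):
    # hypothetical helper: re-run a coroutine function a bounded number of times
    for attempt in range(1, max_attempts + 1):
        try:
            return await coro_func(*args)
        except Exception as e:
            logging.error(f'Attempt {attempt}/{max_attempts} failed: {e}')
            if attempt == max_attempts:
                raise
            # simple linear backoff between attempts
            await asyncio.sleep(0.01 * attempt)

attempts = {'count': 0}

async def flaky_restart_host(hostname):
    # stand-in for restart_host: fails twice, then succeeds
    attempts['count'] += 1
    if attempts['count'] < 3:
        raise Exception(f'Could not restart {hostname}')
    return f'Restarted {hostname}'

result = asyncio.run(with_retries(flaky_restart_host, 'cattle-abcd.example.net'))
print(result)  # Restarted cattle-abcd.example.net
</code></pre>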
<p>Running the service with <code>handle_results</code> in place, we now see:</p>
<pre><code data-lang="console">$ python mandrill/mayhem_15.py
09:27:48,143 INFO: Pulled PubSubMessage(instance_name='cattle-gas8')
09:27:48,144 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-gas8')
09:27:48,644 INFO: Pulled PubSubMessage(instance_name='cattle-arpg')
09:27:48,645 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-arpg')
09:27:48,880 INFO: Saved PubSubMessage(instance_name='cattle-gas8') into database
09:27:48,880 ERROR: Caught exception: Could not restart cattle-gas8.example.net
09:27:49,385 INFO: Pulled PubSubMessage(instance_name='cattle-4nl3')
09:27:49,385 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-4nl3')
09:27:49,503 INFO: Saved PubSubMessage(instance_name='cattle-arpg') into database
09:27:49,504 ERROR: Caught exception: Could not restart cattle-arpg.example.net
09:27:49,656 INFO: Pulled PubSubMessage(instance_name='cattle-4713')
09:27:49,656 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-4713')
09:27:49,734 INFO: Saved PubSubMessage(instance_name='cattle-4nl3') into database
09:27:49,734 ERROR: Caught exception: Could not restart cattle-4nl3.example.net
09:27:49,747 INFO: Done. Acked PubSubMessage(instance_name='cattle-gas8')
</code></pre><h3 id="recap">Recap</h3>
<p>Unlike in non-<code>asyncio</code> programs, exceptions within tasks will not crash the system, and they might go completely unnoticed. So we need to account for that.</p>
<p>I personally like using <code>asyncio.gather</code> because the order of the returned results is deterministic, but it’s easy to get tripped up with it. By default, it raises the first exception back to its awaiter while happily continuing to work on the other tasks it was given. If that exception is never retrieved, weird behavior can happen, like spinning around an event.</p>
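<p>A quick standalone illustration of both properties, deterministic ordering and returned exceptions (the coroutines here are illustrative, not the service code):</p>
<pre><code data-lang="py3">import asyncio

async def work(i):
    # earlier arguments sleep longer, to show results are ordered
    # by argument position, not by completion time
    await asyncio.sleep(0.05 / (i + 1))
    return i

async def fail():
    raise ValueError('boom')

async def main():
    # with return_exceptions=True, the exception comes back as a value,
    # in the same position as the coroutine that raised it
    return await asyncio.gather(work(0), fail(), work(1), return_exceptions=True)

results = asyncio.run(main())
print(results)
</code></pre>
<p>Even though <code>work(1)</code> finishes before <code>work(0)</code>, the results list matches the argument order, with the <code>ValueError</code> sitting in the middle slot.</p>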
<hr>
<p>Follow the next part of this series for <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-4/">working with synchronous and threaded code</a>, and testing <code>asyncio</code> code (coming soon!).</p>
Graceful Shutdowns with asynciohttp://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-2/2018-07-26T13:43:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p><em>Foreword: This <strong>part 2</strong> of a 5-part series titled</em> “<code>asyncio</code>: We Did It Wrong.” <em>Take a look at <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-1/">Part 1: True Concurrency</a> for where we are in the tutorial now. Once done, follow along with <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-3/">Part 3: Exception Handling</a>, <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-4/">Part 4: Working with Synchronous & Threaded Code</a>, and Part 5: Testing <code>asyncio</code> Code (coming soon!).</em></p>
<p><em>Example code can be found on <a href="https://github.com/econchick/mayhem">GitHub</a>. All code on this post is licensed under <a href="https://github.com/econchick/mayhem/blob/master/LICENSE">MIT</a>.</em></p>
<hr>
<h3 id="mayhem-mandrill-recap">Mayhem Mandrill Recap</h3>
<p>The goal for this 5-part series is to build a mock chaos monkey-like service called “Mayhem Mandrill”. This is an event-driven service that consumes from a pub/sub, and initiates a mock restart of a host. We could get thousands of messages in seconds, so as we get a message, we shouldn’t block the handling of the next message we receive.</p>
<p>At the end of <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-1/">part 1</a>, our service looked like this:</p>
<pre><code data-lang="py3">#!/usr/bin/env python3.7
"""
Notice! This requires:
  - attrs==18.1.0
"""

import asyncio
import functools
import logging
import random
import string
import uuid

import attr

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s,%(msecs)d %(levelname)s: %(message)s',
    datefmt='%H:%M:%S',
)


@attr.s
class PubSubMessage:
    instance_name = attr.ib()
    message_id = attr.ib(repr=False)
    hostname = attr.ib(repr=False, init=False)

    def __attrs_post_init__(self):
        self.hostname = f'{self.instance_name}.example.net'


async def publish(queue):
    choices = string.ascii_lowercase + string.digits
    while True:
        msg_id = str(uuid.uuid4())
        host_id = ''.join(random.choices(choices, k=4))
        instance_name = f'cattle-{host_id}'
        msg = PubSubMessage(message_id=msg_id, instance_name=instance_name)
        # put the item in the queue
        await queue.put(msg)
        logging.debug(f'Published message {msg}')
        # simulate randomness of publishing messages
        await asyncio.sleep(random.random())


async def restart_host(msg):
    # unhelpful simulation of i/o work
    await asyncio.sleep(random.randrange(1, 3))
    logging.info(f'Restarted {msg.hostname}')


async def save(msg):
    # unhelpful simulation of i/o work
    await asyncio.sleep(random.random())
    logging.info(f'Saved {msg} into database')


async def cleanup(msg, event):
    # this will block the rest of the coro until `event.set` is called
    await event.wait()
    # unhelpful simulation of i/o work
    await asyncio.sleep(random.random())
    logging.info(f'Done. Acked {msg}')


async def extend(msg, event):
    while not event.is_set():
        logging.info(f'Extended deadline by 3 seconds for {msg}')
        # want to sleep for less than the deadline amount
        await asyncio.sleep(2)


async def handle_message(msg):
    event = asyncio.Event()
    asyncio.create_task(extend(msg, event))
    asyncio.create_task(cleanup(msg, event))
    await asyncio.gather(save(msg), restart_host(msg))
    event.set()


async def consume(queue):
    while True:
        msg = await queue.get()
        logging.info(f'Pulled {msg}')
        asyncio.create_task(handle_message(msg))


async def handle_exception(coro, loop):
    try:
        await coro
    except Exception:
        logging.error('Caught exception')
        loop.stop()


if __name__ == '__main__':
    queue = asyncio.Queue()
    loop = asyncio.get_event_loop()

    publisher_coro = handle_exception(publish(queue), loop)
    consumer_coro = handle_exception(consume(queue), loop)

    try:
        loop.create_task(publisher_coro)
        loop.create_task(consumer_coro)
        loop.run_forever()
    except KeyboardInterrupt:
        logging.info('Interrupted')
    finally:
        logging.info('Cleaning up')
        loop.stop()
</code></pre><h3 id="graceful-shutdown">Graceful shutdown</h3>
<p>Often, you’ll want your service to gracefully shutdown if it receives a POSIX signal of some sort, e.g. clean up open database connections, stop consuming messages, finish responding to current requests while not accepting new requests, etc. So, if we happen to restart an instance of our <em>own</em> service, we should clean up the “mess” we’ve made before exiting out.</p>
<p>We’ve been catching the commonly-known <code>KeyboardInterrupt</code> exception like many other tutorials and libraries. But there are many common signals that a service should expect and handle. A few typical ones are (descriptions from <a href="http://man7.org/linux/man-pages/man7/signal.7.html"><code>man signal</code></a>):</p>
<ul>
<li><code>SIGHUP</code> - Hangup detected on controlling terminal or death of controlling process</li>
<li><code>SIGQUIT</code> - Quit from keyboard (via <code>^\</code>)</li>
<li><code>SIGTERM</code> - Termination signal</li>
<li><code>SIGINT</code> - Interrupt program</li>
</ul>
<p>There’s also <code>SIGKILL</code> (i.e. the familiar <code>kill -9</code>) and <code>SIGSTOP</code>, although the standard is that they can’t be caught, blocked, or ignored.</p>
<p>Currently, if we quit our service via <code>^\</code> or send a signal via something like <code>pkill -TERM -f <script path></code>, our service doesn’t get a chance to clean up:</p>
<pre><code data-lang="console">$ python mandrill/mayhem_13.py
19:08:25,553 INFO: Pulled PubSubMessage(instance_name='cattle-npww')
19:08:25,554 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-npww')
19:08:25,655 INFO: Pulled PubSubMessage(instance_name='cattle-rm7n')
19:08:25,655 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-rm7n')
19:08:25,790 INFO: Saved PubSubMessage(instance_name='cattle-rm7n') into database
19:08:25,831 INFO: Saved PubSubMessage(instance_name='cattle-npww') into database
[1] 78851 terminated python mandrill/mayhem_13.py
</code></pre>
<p>We see that we don’t reach the <code>finally</code> clause.</p>
<h3 id="using-a-signal-handler">Using a Signal Handler</h3>
<p>It should also be pointed out that – even <em>if</em> we were to only ever expect a <code>KeyboardInterrupt</code> / <code>SIGINT</code> signal – it could happen outside the catching of the exception, potentially causing the service to end up in an incomplete or otherwise unknown state:</p>
<pre><code data-lang="py3">if __name__ == '__main__':
    queue = asyncio.Queue()
    publisher_coro = handle_exception(publish(queue))
    consumer_coro = handle_exception(consume(queue))

    loop = asyncio.get_event_loop()  # <-- could happen here or earlier
    try:
        loop.create_task(publisher_coro)
        loop.create_task(consumer_coro)
        loop.run_forever()
    except Exception:
        logging.error('Caught exception')  # <-- could happen here
    except KeyboardInterrupt:
        logging.info('Process interrupted')  # <-- could happen here
    finally:
        logging.info('Cleaning up')  # <-- could happen here
        loop.stop()  # <-- could happen here
</code></pre>
<p>So, instead of catching <code>KeyboardInterrupt</code>, let’s attach some signal handlers to the loop. First, we should define the shutdown behavior we want when a signal is caught:</p>
<pre><code data-lang="py3">async def shutdown(signal, loop):
    logging.info(f'Received exit signal {signal.name}...')
    logging.info('Closing database connections')
    logging.info('Nacking outstanding messages')
    tasks = [t for t in asyncio.all_tasks() if t is not
             asyncio.current_task()]

    [task.cancel() for task in tasks]

    logging.info('Canceling outstanding tasks')
    await asyncio.gather(*tasks)
    loop.stop()
    logging.info('Shutdown complete.')
</code></pre>
<p>Here I’m just closing the simulated database connections, returning messages to the pub/sub as not acknowledged (so they can be redelivered and not dropped), and finally canceling the tasks. We don’t necessarily need to cancel pending tasks; we could just collect them and allow them to finish. We may also want to take this opportunity to flush any collected metrics so they’re not lost.</p>
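<p>If we did want to let in-flight work finish rather than cancel it, the body of <code>shutdown</code> could simply gather the outstanding tasks. A minimal sketch of that alternative (the <code>worker</code> and <code>drain</code> names are illustrative, not part of the service):</p>
<pre><code data-lang="py3">import asyncio

async def worker(i):
    # stand-in for in-flight work we'd rather finish than cancel
    await asyncio.sleep(0.01 * i)
    return f'task-{i} done'

async def drain(tasks):
    # alternative to task.cancel(): wait for tasks to run to completion;
    # return_exceptions=True so one failure doesn't abort the drain
    return await asyncio.gather(*tasks, return_exceptions=True)

async def main():
    tasks = [asyncio.create_task(worker(i)) for i in range(3)]
    return await drain(tasks)

results = asyncio.run(main())
print(results)  # ['task-0 done', 'task-1 done', 'task-2 done']
</code></pre>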
<p>Let’s hook the <code>shutdown</code> coroutine up to the main event loop now. We can also remove the <code>KeyboardInterrupt</code> catch since that’s now taken care of by adding <code>signal.SIGINT</code> as a handled signal.</p>
<pre><code data-lang="py3"># <-- snip -->
import signal
# <-- snip -->

if __name__ == '__main__':
    loop = asyncio.get_event_loop()

    # May want to catch other signals too
    signals = (signal.SIGHUP, signal.SIGTERM, signal.SIGINT)
    for s in signals:
        loop.add_signal_handler(
            s, lambda s=s: asyncio.create_task(shutdown(s, loop)))

    queue = asyncio.Queue()
    publisher_coro = handle_exception(publish(queue), loop)
    consumer_coro = handle_exception(consume(queue), loop)

    try:
        loop.create_task(publisher_coro)
        loop.create_task(consumer_coro)
        loop.run_forever()
    finally:
        logging.info('Cleaning up')
        loop.stop()
</code></pre>
<p>You might have noticed that within the <code>lambda</code> closure, I bound <code>s</code> immediately. Without that, we end up running into an apparently common gotcha in Python-land: <a href="https://docs.python-guide.org/writing/gotchas/#late-binding-closures">late binding closures</a>. </p>
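<p>The gotcha is easy to reproduce outside of <code>asyncio</code>. Closures look up free variables when they are called, not when they are defined, so by the time any of these lambdas runs, the loop variable already holds its final value:</p>
<pre><code data-lang="py3">signals = ('SIGHUP', 'SIGTERM', 'SIGINT')

# late binding: every lambda closes over the same `s`,
# which is 'SIGINT' by the time any of them is called
late = [lambda: s for s in signals]
print([f() for f in late])   # ['SIGINT', 'SIGINT', 'SIGINT']

# the default-argument trick: `s=s` evaluates and stores
# each value at definition time
bound = [lambda s=s: s for s in signals]
print([f() for f in bound])  # ['SIGHUP', 'SIGTERM', 'SIGINT']
</code></pre>
<p>The <code>lambda s=s:</code> in our signal-handler setup is exactly the second form.</p>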
<p>So now when I run the script, and in another terminal, run <code>pkill -TERM -f "python mandrill/mayhem_14.py"</code>, (or <code>-HUP</code> or <code>-INT</code>), we see the following:</p>
<pre><code data-lang="console">$ python mandrill/mayhem_14.py
19:11:25,321 INFO: Pulled PubSubMessage(instance_name='cattle-lrnm')
19:11:25,321 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-lrnm')
19:11:25,700 INFO: Pulled PubSubMessage(instance_name='cattle-m0f6')
19:11:25,700 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-m0f6')
19:11:25,740 INFO: Saved PubSubMessage(instance_name='cattle-m0f6') into database
19:11:25,840 INFO: Saved PubSubMessage(instance_name='cattle-lrnm') into database
19:11:26,143 INFO: Received exit signal SIGTERM...
19:11:26,143 INFO: Closing database connections
19:11:26,144 INFO: Canceling outstanding tasks
19:11:26,144 ERROR: Caught exception
19:11:26,144 ERROR: Caught exception
19:11:26,144 INFO: Cleaning up
</code></pre>
<p>It looks like we hit <code>'Caught exception'</code> twice. This is because awaiting on canceled tasks will raise <code>asyncio.CancelledError</code>, which is to be expected. We can add that to <code>handle_exception</code> as well:</p>
<pre><code data-lang="py3">async def handle_exception(coro, loop):
    try:
        await coro
    except asyncio.CancelledError:
        logging.info('Coroutine canceled')
    except Exception:
        logging.error('Caught exception')
    finally:
        loop.stop()
</code></pre>
<p>Smoother sailing now:</p>
<pre><code data-lang="console">$ python mandrill/mayhem_14.py
19:22:10,47 INFO: Pulled PubSubMessage(instance_name='cattle-1zsx')
19:22:10,47 INFO: Extended deadline by 3 seconds for PubSubMessage(instance_name='cattle-1zsx')
^C19:22:10,541 INFO: Received exit signal SIGINT...
19:22:10,541 INFO: Closing database connections
19:22:10,541 INFO: Canceling outstanding tasks
19:22:10,541 INFO: Coroutine canceled
19:22:10,541 INFO: Coroutine canceled
19:22:10,541 INFO: Cleaning up
</code></pre>
<p>We now see our coroutines are canceled and not some random exception.</p>
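<p>An alternative to catching <code>asyncio.CancelledError</code> here is to pass <code>return_exceptions=True</code> to the <code>gather</code> call inside <code>shutdown</code>, which returns the cancellations as results instead of raising. A minimal sketch of that behavior (standalone, with made-up names):</p>

```python
import asyncio

async def forever():
    await asyncio.sleep(3600)

async def main():
    task = asyncio.create_task(forever())
    await asyncio.sleep(0)  # let the task start running
    task.cancel()
    # the CancelledError is returned as a result instead of raised
    return await asyncio.gather(task, return_exceptions=True)

results = asyncio.run(main())
print(results)
```

<p>Either approach works; <code>return_exceptions=True</code> just moves the decision of what to do with a cancellation from an <code>except</code> clause into the results list.</p>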
<h3 id="which-signals-to-care-about">Which signals to care about</h3>
<p>You might be asking which signals you should care about. And apparently, there is no standard:</p>
<table class='table table-striped'> <thead> <tr> <th></th> <th>Hard Exit</th> <th>Graceful</th> <th>Reload/Restart</th> </tr> </thead> <tbody> <tr> <th scope='row'><a href="http://nginx.org/en/docs/control.html">nginx</a></th> <td><code>TERM</code>, <code>INT</code></td> <td><code>QUIT</code></td> <td><code>HUP</code></td> </tr> <tr> <th scope='row'><a href="http://httpd.apache.org/docs/2.2/stopping.html">Apache</a></th> <td><code>TERM</code></td> <td><code>WINCH</code></td> <td><code>HUP</code></td> </tr> <tr> <th scope='row'><a href="https://uwsgi-docs.readthedocs.io/en/latest/Management.html#signals-for-controlling-uwsgi">uWSGI</a></th> <td><code>INT</code>, <code>QUIT</code></td> <td></td> <td><code>HUP</code>, <code>TERM</code></td> </tr> <tr><th scope='row'><a href="http://docs.gunicorn.org/en/stable/signals.html">Gunicorn</a></th> <td><code>INT</code>, <code>QUIT</code></td> <td><code>TERM</code></td> <td><code>HUP</code></td> </tr> <tr><th scope='row'><a href="https://docs.docker.com/engine/reference/commandline/kill/">Docker</a></th> <td><code>KILL</code></td> <td><code>TERM</code></td> <td></td> </tr> </tbody> </table>
<p>For a deeper understanding of how to properly handle signals with Docker, I highly recommend reading <a href="https://hynek.me/articles/docker-signals/">Why Your Dockerized Application Isn’t Receiving Signals</a>.</p>
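<p>Whichever signals you settle on, it’s worth a quick sanity check that your handler actually fires. Here’s a minimal synchronous example (POSIX only, using <code>SIGUSR1</code> so as not to kill the process):</p>

```python
import os
import signal

received = []

def handler(signum, frame):
    # record the name of the delivered signal
    received.append(signal.Signals(signum).name)

signal.signal(signal.SIGUSR1, handler)
os.kill(os.getpid(), signal.SIGUSR1)  # send ourselves a signal
print(received)  # ['SIGUSR1']
```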
<h4 id="heads-up-asyncio.shield-isn39t-graceful">Heads up: <code>asyncio.shield</code> isn’t graceful</h4>
<p>As I discovered when reading about this handy little library, <a href="https://github.com/cjrh/aiorun#%EF%B8%8F-smart-shield-for-shutdown"><code>aiorun</code></a>, another misleading API is <code>asyncio.shield</code>. The <a href="https://docs.python.org/3/library/asyncio-task.html#asyncio.shield">docs</a> say it’s a means to shield a future from cancellation. But if you have a coroutine that must not be canceled during shutdown, <code>asyncio.shield</code> will not help you.</p>
<p>This is because the task that <code>asyncio.shield</code> creates gets included in <code>asyncio.all_tasks</code>, and therefore receives the cancellation signal like the rest of them.</p>
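<p>You can verify this with a small toy: the coroutine handed to <code>shield</code> shows up as a real task in <code>all_tasks</code>, so a cancel-everything sweep reaches it despite the shield (<code>inner</code> is a made-up stand-in):</p>

```python
import asyncio

async def inner():
    await asyncio.sleep(3600)

async def main():
    before = len(asyncio.all_tasks())   # just this coroutine's task
    shielded = asyncio.shield(inner())
    after = len(asyncio.all_tasks())    # shield created a real Task
    # cancel everything else, like a shutdown sweep would
    for t in asyncio.all_tasks():
        if t is not asyncio.current_task():
            t.cancel()
    try:
        await shielded
    except asyncio.CancelledError:
        pass  # the "shielded" coroutine was canceled anyway
    return before, after

before, after = asyncio.run(main())
print(before, after)  # 1 2
```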
<p>To help illustrate, here’s a simple async function with a long sleep that we want to shield from cancellation:</p>
<pre><code data-lang="py3">#!/usr/bin/env python3.7
import asyncio
import logging
import signal

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s,%(msecs)d %(levelname)s: %(message)s',
    datefmt='%H:%M:%S',
)

async def cant_stop_me():
    logging.info('Hold on...')
    await asyncio.sleep(60)
    logging.info('Done!')

async def shutdown(signal, loop):
    logging.info(f'Received exit signal {signal.name}...')
    tasks = [t for t in asyncio.all_tasks() if t is not
             asyncio.current_task()]
    [task.cancel() for task in tasks]

    logging.info('Canceling outstanding tasks')
    await asyncio.gather(*tasks)
    logging.info('Outstanding tasks canceled')
    loop.stop()
    logging.info('Shutdown complete.')

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    signals = (signal.SIGHUP, signal.SIGTERM, signal.SIGINT)
    for s in signals:
        loop.add_signal_handler(
            s, lambda s=s: asyncio.create_task(shutdown(s, loop)))

    shielded_coro = asyncio.shield(cant_stop_me())
    try:
        loop.run_until_complete(shielded_coro)
    finally:
        logging.info('Cleaning up')
        loop.stop()
</code></pre>
<p>When we run this and send a <code>SIGINT</code>, the coroutine is immediately canceled despite the shield. And while we see some logs related to the <code>shutdown</code> coroutine function and cleanup, we don’t see <code>'Outstanding tasks canceled'</code> or <code>'Shutdown complete'</code>:</p>
<pre><code>14:09:16,461 INFO: Hold on...
^C14:09:17,349 INFO: Received exit signal SIGINT...
14:09:17,349 INFO: Canceling outstanding tasks
14:09:17,349 INFO: Cleaning up
Traceback (most recent call last):
File "shield_test_2.py", line 48, in <module>
loop.run_until_complete(shielded_coro)
File "/Users/lynn/.pyenv/versions/3.7.0/lib/python3.7/asyncio/base_events.py", line 568, in run_until_complete
return future.result()
concurrent.futures._base.CancelledError
</code></pre><h3 id="recap">Recap</h3>
<p>Unfortunately, we don’t have any <a href="https://trio.readthedocs.io/en/latest/">nurseries</a> in <code>asyncio</code> core to clean ourselves up; it’s up to us to be responsible, close up the connections and files we opened, respond to outstanding requests, and basically leave things how we found them. </p>
<p>Doing our cleanup in a <code>finally</code> clause isn’t enough, though, since a signal could be sent outside of the <code>try</code>/<code>except</code> clause.</p>
<p>So, as we construct the event loop, we should tell it how to deconstruct itself as early as possible in the program. This ensures that “all our bases are covered,” and that we’re not leaving artifacts anywhere.</p>
<p>And finally, we need to be aware of when our program should shut down, which is closely tied to how we run our program. If it’s a manually run script, then <code>SIGINT</code> is fine. But if it’s within a daemonized Docker container, then <code>SIGTERM</code> is more appropriate.</p>
<hr>
<p>Follow the next part of this series for <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-3/">exception handling</a>, <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-4/">working with synchronous and threaded code</a>, and testing <code>asyncio</code> code (coming soon!).</p>
asyncio: We Did It Wrong
http://www.roguelynn.com/words/asyncio-we-did-it-wrong/
2018-07-26T13:43:00Z
Lynn Root
lynn[at]lynnroot[dot]com
http://www.roguelynn.com/
<p>This post is an accompaniment to my <a href="https://ep2018.europython.eu/conference/talks/asyncio-in-practice-we-did-it-wrong"><code>asyncio</code> in Practice: We Did It Wrong</a> talk at <a href="https://ep2018.europython.eu">EuroPython</a> in Edinburgh, Scotland in July 2018, and for <a href="https://piterpy.com/en">PiterPy</a> in St. Petersburg, Russia in November 2018.</p>
<p>Slides made from Jupyter notebook can be found <a href="https://asyncio-wrong.herokuapp.com">here</a>, and all the code with the examples and the notebook itself can be found on <a href="https://github.com/econchick/mayhem">GitHub</a>.</p>
<hr>
<p><code>asyncio</code>. “<a href="https://medium.com/python-pandemonium/asyncio-coroutine-patterns-beyond-await-a6121486656f">The concurrent Python programmer’s dream</a>”, the answer to everyone’s asynchronous prayers. The <code>asyncio</code> module has various layers of abstraction allowing developers as much control as they need and are comfortable with. </p>
<p>Simple “Hello, World”-like examples show how it can be so effortless; look at that!</p>
<pre><code data-lang="pycon">Python 3.7.0 (default, Jul 6 2018, 11:30:06)
[Clang 9.1.0 (clang-902.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import asyncio, datetime
>>> async def hello():
...     print(f'[{datetime.datetime.now()}] Hello...')
...     await asyncio.sleep(1)  # some I/O-intensive work
...     print(f'[{datetime.datetime.now()}] ...World!')
...
>>> asyncio.run(hello())
[2018-07-07 10:45:55.559856] Hello...
[2018-07-07 10:45:56.568737] ...World!
</code></pre>
<p>But it’s easy to get lulled into a false sense of security. This ain’t helpful. We’re led to believe that we’re able to do a lot with the structured <code>async</code>/<code>await</code> API layer. Some tutorials, while great for the developer getting their toes wet, try to illustrate <a href="https://medium.com/archsaber/a-simple-introduction-to-pythons-asyncio-595d9c9ecf8c#f57b">real</a> <a href="http://markuseliasson.se/article/introduction-to-asyncio/">world</a> <a href="https://www.blog.pythonlibrary.org/2016/07/26/python-3-an-intro-to-asyncio/">examples</a>, but are actually just beefed-up “Hello, World”s. Some <a href="https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html">even</a> <a href="http://stackabuse.com/python-async-await-tutorial/">misuse</a> <a href="https://pymotw.com/3/asyncio/futures.html">parts</a> of <code>asyncio</code>’s interface, allowing one to <a href="https://www.transceptor.technology/single-post/2016/08/19/Part-1-Avoiding-callback-hell-in-Python">easily fall</a> into the depths of <a href="http://callbackhell.com/">callback hell</a>. Some get you easily up and running with <code>asyncio</code>, but then you may not realize that it’s not correct, or not exactly what you want, or that it only gets you part of the way there. While <a href="https://medium.com/python-pandemonium/asyncio-coroutine-patterns-beyond-await-a6121486656f">there are tutorials</a> that do improve upon the basic “Hello, World” use case, often they don’t go far enough. It’s often <em>still</em> just a web crawler. I’m not sure about others, but I’m not building web crawlers at Spotify.</p>
<p>It’s not the fault of anyone, though; asynchronous programming is difficult. Whether you use <code>asyncio</code>, Twisted, Tornado, or Golang, Erlang, Haskell, whatever, it’s just difficult. I myself have fallen into the false sense of ease that the <code>asyncio</code> community builds, where once I import it, <a href="https://imgs.xkcd.com/comics/python.png">everything will just fall into place as it should</a>. I do believe <code>asyncio</code> is quite user-friendly, but I did underestimate the inherent complexity that concurrent programming brings.</p>
<p>The past couple of services we built at Spotify were perfect use cases for <code>asyncio</code>: a chaos-monkey-like service for restarting instances in Google Cloud, and an <a href="https://github.com/spotify/gordon">event-driven host name generation service for DNS</a>. Sure, we needed to make a lot of HTTP requests that should be non-blocking much like web crawlers. But these services also had to react to messages from a pub/sub, measure the progress of actions initiated from those messages, handle any incomplete actions or other external errors, take care of pub/sub message lease management, measure <a href="https://en.wikipedia.org/wiki/Service_level_indicator">SLIs</a>, and send metrics. And we needed to use non-<code>asyncio</code>-friendly dependencies as well. This quickly got difficult.</p>
<h2 id="the-tutorial-mayhem-mandrill">The Tutorial: Mayhem Mandrill</h2>
<p>So, having lived through that and survived, allow me to provide you a real-world example that actually comes from the real world. As I mentioned, one of the <code>asyncio</code> services we built is similar to a <a href="https://github.com/Netflix/chaosmonkey">chaos monkey</a>: it does periodic hard restarts of our entire fleet of instances. We’ll build a simplified version, dubbed “Mayhem Mandrill,” which will listen for a pub/sub message as a trigger to restart a host based on that message. As we build this service, I’ll point out potential traps to avoid. This will essentially become the type of resource that past Lynn would have wanted a year or two ago. </p>
<h3 id="tutorial-contents">Tutorial Contents</h3>
<p>Part 1: <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-1/">True Concurrency</a><br/>
Part 2: <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-2/">Graceful Shutdowns</a><br/>
Part 3: <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-3/">Exception Handling</a><br/>
Part 4: <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-4/">Working with Synchronous & Threaded Code</a><br/>
Part 5: Testing <code>asyncio</code> Code - coming soon</p>
Synchronous & threaded code in asyncio
http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-4/
2018-07-26T13:43:00Z
Lynn Root
lynn[at]lynnroot[dot]com
http://www.roguelynn.com/
<p><em>Foreword: This is <strong>part 4</strong> of a 5-part series titled</em> “<code>asyncio</code>: We Did It Wrong.” <em>Take a look at <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-1/">Part 1: True Concurrency</a>, <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-2/">Part 2: Graceful Shutdowns</a>, and <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-3/">Part 3: Exception Handling</a> for where we are in the tutorial now. Once done, follow along with Part 5: Testing <code>asyncio</code> Code (coming soon!).</em></p>
<p><em>Example code can be found on <a href="https://github.com/econchick/mayhem">GitHub</a>. All code on this post is licensed under <a href="https://github.com/econchick/mayhem/blob/master/LICENSE">MIT</a>.</em></p>
<hr>
<h3 id="mayhem-mandrill-recap">Mayhem Mandrill Recap</h3>
<p>The goal for this 5-part series is to build a mock chaos monkey-like service called “Mayhem Mandrill”. This is an event-driven service that consumes from a pub/sub, and initiates a mock restart of a host. We could get thousands of messages in seconds, so as we get a message, we shouldn’t block the handling of the next message we receive.</p>
<p>At the end of <a href="http://www.roguelynn.com/words/asyncio-we-did-it-wrong-pt-3/">part 3</a>, our service looked like this:</p>
<pre><code data-lang="py3">#!/usr/bin/env python3.7
"""
Notice! This requires:
 - attrs==18.1.0
"""
import asyncio
import functools
import logging
import random
import signal
import string
import uuid

import attr

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s,%(msecs)d %(levelname)s: %(message)s',
    datefmt='%H:%M:%S',
)

@attr.s
class PubSubMessage:
    instance_name = attr.ib()
    message_id = attr.ib(repr=False)
    hostname = attr.ib(repr=False, init=False)

    def __attrs_post_init__(self):
        self.hostname = f'{self.instance_name}.example.net'

async def publish(queue):
    choices = string.ascii_lowercase + string.digits
    while True:
        msg_id = str(uuid.uuid4())
        host_id = ''.join(random.choices(choices, k=4))
        instance_name = f'cattle-{host_id}'
        msg = PubSubMessage(message_id=msg_id, instance_name=instance_name)
        # put the item in the queue
        await queue.put(msg)
        logging.debug(f'Published message {msg}')
        # simulate randomness of publishing messages
        await asyncio.sleep(random.random())

async def restart_host(msg):
    # unhelpful simulation of i/o work
    await asyncio.sleep(random.randrange(1, 3))
    logging.info(f'Restarted {msg.hostname}')

async def save(msg):
    # unhelpful simulation of i/o work
    await asyncio.sleep(random.random())
    logging.info(f'Saved {msg} into database')

async def cleanup(msg, event):
    # this will block the rest of the coro until `event.set` is called
    await event.wait()
    # unhelpful simulation of i/o work
    await asyncio.sleep(random.random())
    logging.info(f'Done. Acked {msg}')

async def extend(msg, event):
    while not event.is_set():
        logging.info(f'Extended deadline by 3 seconds for {msg}')
        # want to sleep for less than the deadline amount
        await asyncio.sleep(2)

def handle_results(results):
    for result in results:
        if isinstance(result, Exception):
            logging.error(f'Caught exception: {result}')

async def handle_message(msg):
    event = asyncio.Event()

    asyncio.create_task(extend(msg, event))
    asyncio.create_task(cleanup(msg, event))

    results = await asyncio.gather(
        save(msg), restart_host(msg), return_exceptions=True)
    handle_results(results)
    event.set()

async def consume(queue):
    while True:
        msg = await queue.get()
        logging.info(f'Pulled {msg}')
        asyncio.create_task(handle_message(msg))

async def handle_exception(coro, loop):
    try:
        await coro
    except asyncio.CancelledError:
        logging.info('Coroutine canceled')
    except Exception:
        logging.error('Caught exception')
    finally:
        loop.stop()

async def shutdown(signal, loop):
    logging.info(f'Received exit signal {signal.name}...')
    logging.info('Closing database connections')
    logging.info('Nacking outstanding messages')
    tasks = [t for t in asyncio.all_tasks() if t is not
             asyncio.current_task()]
    [task.cancel() for task in tasks]

    logging.info('Canceling outstanding tasks')
    await asyncio.gather(*tasks)
    loop.stop()
    logging.info('Shutdown complete.')

if __name__ == '__main__':
    loop = asyncio.get_event_loop()

    # May want to catch other signals too
    signals = (signal.SIGHUP, signal.SIGTERM, signal.SIGINT)
    for s in signals:
        loop.add_signal_handler(
            s, lambda s=s: asyncio.create_task(shutdown(s, loop)))

    queue = asyncio.Queue()
    publisher_coro = handle_exception(publish(queue), loop)
    consumer_coro = handle_exception(consume(queue), loop)

    try:
        loop.create_task(publisher_coro)
        loop.create_task(consumer_coro)
        loop.run_forever()
    finally:
        logging.info('Cleaning up')
        loop.stop()
</code></pre><h3 id="making-synchronous-code-asyncio-friendly">Making synchronous code <code>asyncio</code>-friendly</h3>
<p>I’m sure that as folks have started to use <code>asyncio</code>, they’ve realized that <code>async</code>/<code>await</code> starts permeating everything in the code base; everything needs to be async. This isn’t necessarily a bad thing; it just forces a shift in perspective.</p>
<p>Sometimes, though, you’ll need to call synchronous code from within your beautiful, asynchronous monster. To make it non-blocking, it may be as easy as using a threadpool executor:</p>
<pre><code data-lang="py3"># <-- snip -->
import concurrent.futures
import time
# <-- snip -->

def save_sync(msg):
    # unhelpful simulation of blocking i/o work
    time.sleep(random.random())
    logging.info(f'[blocking] Saved {msg} into database')

# <-- snip -->

async def handle_message(msg, executor, loop):
    event = asyncio.Event()
    save_coro = loop.run_in_executor(executor, save_sync, msg)

    asyncio.create_task(extend(msg, event))
    asyncio.create_task(cleanup(msg, event))

    results = await asyncio.gather(
        save_coro, restart_host(msg), return_exceptions=True
    )
    handle_results(results)
    event.set()

async def consume(queue):
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=5)
    loop = asyncio.get_running_loop()
    while True:
        msg = await queue.get()
        logging.info(f'Pulled {msg}')
        asyncio.create_task(handle_message(msg, executor, loop))
</code></pre>
<p>But if you’re not so lucky, you’ll need to use third-party code that blocks. To simulate this, I’ve made a synchronous consumer client to mimic a third-party blocking dependency. </p>
<p>This also requires a blocking publisher, too (reminder: this is a stand-in for some external pub/sub technology). I’m not going to focus on the publisher portion at all, but for transparency:</p>
<pre><code data-lang="py3"># <-- snip -->
import queue
# <-- snip -->

# sync publisher for help in simulating a blocking,
# third-party consumer client
def publish_sync(queue_sync):
    msg_id = str(uuid.uuid4())
    choices = string.ascii_lowercase + string.digits
    host_id = ''.join(random.choices(choices, k=4))
    instance_name = f'cattle-{host_id}'
    msg = PubSubMessage(message_id=msg_id, instance_name=instance_name)
    # put the item in the queue
    queue_sync.put(msg)
    logging.debug(f'Published message {msg}')

async def publish(executor, queue_sync):
    loop = asyncio.get_running_loop()
    while True:
        await loop.run_in_executor(executor, publish_sync, queue_sync)
        await asyncio.sleep(0.5)

# simulates a blocking, third-party consumer client
def consume_sync(queue_sync):
    try:
        msg = queue_sync.get(block=False)
        logging.info(f'Pulled {msg}')
        return msg
    except queue.Empty:
        return
</code></pre>
<p>So for our code to work with this, we need to rework our asynchronous consumer and our <code>__main__</code> scope:</p>
<pre><code data-lang="py3">async def consume(executor, queue_sync):
    loop = asyncio.get_running_loop()
    while True:
        msg = await loop.run_in_executor(executor, consume_sync, queue_sync)
        if not msg:  # could be None
            continue
        asyncio.create_task(handle_message(msg))

# <-- snip -->

if __name__ == '__main__':
    loop = asyncio.get_event_loop()

    # May want to catch other signals too
    signals = (signal.SIGHUP, signal.SIGTERM, signal.SIGINT)
    for s in signals:
        loop.add_signal_handler(
            s, lambda s=s: asyncio.create_task(shutdown(s, loop)))

    queue_sync = queue.Queue()
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=5)
    publisher_coro = handle_exception(publish(executor, queue_sync), loop)
    consumer_coro = handle_exception(consume(executor, queue_sync), loop)

    try:
        loop.create_task(publisher_coro)
        loop.create_task(consumer_coro)
        loop.run_forever()
    finally:
        logging.info('Cleaning up')
        loop.stop()
</code></pre>
<p>Pretty easy actually; very similar to before with the <code>save_sync</code> example.</p>
<p><em>Aside:</em> There’s a handy little package called <a href="https://asyncio-extras.readthedocs.io/en/latest/">asyncio-extras</a> which provides a decorator for synchronous functions/methods. You can avoid the boilerplate of setting up an executor and just <code>await</code> the decorated function.</p>
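<p>The idea behind such a decorator is small enough to sketch ourselves; this is a homemade version (not the actual <code>asyncio-extras</code> API, and <code>threadpool</code> and <code>save_sync</code> are illustrative names):</p>

```python
import asyncio
import functools
import time

def threadpool(func):
    """Run a blocking function in an executor when awaited."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        # None means the event loop's default ThreadPoolExecutor
        return await loop.run_in_executor(
            None, functools.partial(func, *args, **kwargs))
    return wrapper

@threadpool
def save_sync(msg):
    time.sleep(0.01)  # blocking i/o stand-in
    return f'saved {msg}'

result = asyncio.run(save_sync('cattle-1234'))
print(result)  # saved cattle-1234
```

<p>Passing <code>None</code> as the executor uses the event loop’s default <code>ThreadPoolExecutor</code>; pass your own if you want to control the worker count or thread names.</p>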
<p>But sometimes, third-party code throws a wrench at you…</p>
<h3 id="making-threaded-code-asyncio-friendly">Making threaded code <code>asyncio</code>-friendly</h3>
<p>If you’re particularly unlucky, you’ll be faced with a third-party library that is multi-threaded <em>and</em> blocking. For example, Google’s Python library for Cloud Pub/Sub makes use of gRPC under the hood, which is implemented with threading, but <a href="https://googlecloudplatform.github.io/google-cloud-python/latest/pubsub/subscriber/index.html#subscription-callbacks">also blocks</a> when we’re consuming from a publisher. The library also requires a non-asynchronous callback for when a message is received. To visualize, here’s a simple script which uses this library (if you’re running this yourself, be sure to use their local <a href="https://cloud.google.com/pubsub/docs/emulator">emulator</a>):</p>
<pre><code data-lang="py3">#!/usr/bin/env python3
"""
Notice! This requires: google-cloud-pubsub==0.35.4
(latest at the time of writing)
"""
import json
import logging
import os
import random
import string

from google.cloud import pubsub

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s,%(msecs)d %(levelname)s: %(message)s',
    datefmt='%H:%M:%S',
)

TOPIC = 'projects/europython18/topics/ep18-topic'
SUBSCRIPTION = 'projects/europython18/subscriptions/ep18-sub'
PROJECT = 'europython18'
CHOICES = string.ascii_lowercase + string.digits

def get_publisher():
    client = pubsub.PublisherClient()
    try:
        client.create_topic(TOPIC)
    except Exception:
        pass  # already created
    return client

def get_subscriber():
    client = pubsub.SubscriberClient()
    try:
        client.create_subscription(SUBSCRIPTION, TOPIC)
    except Exception:
        pass  # already created
    return client

def publish_sync():
    publisher = get_publisher()
    for msg in range(1, 6):
        msg_data = {'msg_id': ''.join(random.choices(CHOICES, k=4))}
        bytes_message = bytes(json.dumps(msg_data), encoding='utf-8')
        publisher.publish(TOPIC, bytes_message)
        logging.debug(f'Published {msg_data["msg_id"]}')

def consume_sync():
    client = get_subscriber()

    def callback(msg):
        msg.ack()
        data = json.loads(msg.data.decode('utf-8'))
        logging.info(f'Consumed {data["msg_id"]}')

    future = client.subscribe(SUBSCRIPTION, callback)
    try:
        future.result()  # blocking
    except Exception as e:
        logging.error(f'Caught exception: {e}')

if __name__ == '__main__':
    # safety net; wouldn't want to do this in prod
    assert os.environ.get('PUBSUB_EMULATOR_HOST'), 'You should be running the emulator'

    publish_sync()
    consume_sync()
</code></pre>
<p>In particular, looking at <code>consume_sync</code>, the returned future is a <a href="https://googlecloudplatform.github.io/google-cloud-python/latest/pubsub/subscriber/api/futures.html#google.cloud.pubsub_v1.subscriber.futures.StreamingPullFuture"><code>StreamingPullFuture</code></a>. It’s pretty handy: it will asynchronously pull for messages from the publisher, allowing us to forgo the <code>while True</code> loop to periodically pull ourselves. The <code>StreamingPullFuture</code> also <a href="https://cloud.google.com/pubsub/docs/pull#streamingpull">makes use of some convenient features</a>, including managing the message deadlines.</p>
<p>To illustrate, here’s how we can use <code>loop.run_in_executor</code> for this blocking code. I’ve made a helper coroutine function (<code>run_pubsub</code>) to set up an executor, use it to kick off the synchronous consumer, and pass it off to my async publisher to use for its non-async work:</p>
<pre><code data-lang="py3"># <-- snip -->
import asyncio
import concurrent.futures
import signal
# <-- snip -->

# updated func to take in the loop as an argument
async def publish(executor, loop):
    publisher = get_publisher()
    while True:
        await loop.run_in_executor(executor, publish_sync, publisher)
        await asyncio.sleep(.1)

def callback(msg):
    msg.ack()
    data = json.loads(msg.data.decode('utf-8'))
    logging.info(f'Consumed {data["msg_id"]}')

def consume_sync():
    client = get_subscriber()
    # remove the try/except around the returned future for now
    client.subscribe(SUBSCRIPTION, callback)

async def run_pubsub():
    loop = asyncio.get_running_loop()
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=5)

    consume_coro = loop.run_in_executor(executor, consume_sync)

    asyncio.ensure_future(consume_coro)
    loop.create_task(publish(executor, loop))

async def shutdown(signal, loop):
    logging.info(f'Received exit signal {signal.name}...')
    loop.stop()
    logging.info('Shutdown complete.')

if __name__ == '__main__':
    # safety net; wouldn't want to do this in prod
    assert os.environ.get('PUBSUB_EMULATOR_HOST'), 'You should be running the emulator'

    loop = asyncio.get_event_loop()

    # one signal for simplicity
    loop.add_signal_handler(
        signal.SIGINT,
        lambda: asyncio.create_task(shutdown(signal.SIGINT, loop))
    )

    try:
        loop.create_task(run_pubsub())
        loop.run_forever()
    finally:
        logging.info('Cleaning up')
        loop.stop()
</code></pre>
<p>I’d also like to prove that this is now non-blocking, so let’s add a dummy coroutine function, <code>run_something_else</code>, to run alongside <code>run_pubsub</code>. We’ll add the two coroutine functions to a general <code>run</code> helper, and update the <code>__main__</code> section:</p>
<pre><code data-lang="py3"># snip
async def run_something_else():
    while True:
        logging.info('Running something else')
        await asyncio.sleep(random.random())

async def run():
    coros = [run_pubsub(), run_something_else()]
    await asyncio.gather(*coros)

if __name__ == '__main__':
    assert os.environ.get('PUBSUB_EMULATOR_HOST'), 'You should be running the emulator'

    loop = asyncio.get_event_loop()

    # for simplicity
    loop.add_signal_handler(
        signal.SIGINT,
        lambda: asyncio.create_task(shutdown(signal.SIGINT, loop))
    )

    try:
        loop.create_task(run())
        loop.run_forever()
    finally:
        logging.info('Cleaning up')
        loop.stop()
</code></pre>
<p>Now running it will show:</p>
<pre><code data-lang="console">$ python examples/mandrill/mayhem_19.py
15:18:27,722 INFO: Running something else
15:18:27,842 INFO: Consumed 3y41
15:18:27,842 INFO: Consumed bt72
15:18:27,843 INFO: Consumed txea
15:18:27,844 INFO: Consumed qmk2
15:18:27,845 INFO: Consumed 1zjo
15:18:28,108 INFO: Consumed 3dz6
15:18:28,109 INFO: Consumed zca8
15:18:28,109 INFO: Consumed 7yaz
15:18:28,110 INFO: Consumed e7rt
15:18:28,110 INFO: Consumed jgla
15:18:28,371 INFO: Consumed 4ucy
15:18:28,371 INFO: Consumed zev4
15:18:28,371 INFO: Consumed rrme
15:18:28,371 INFO: Consumed fk0b
15:18:28,372 INFO: Consumed npws
15:18:28,582 INFO: Running something else
^C15:18:28,825 INFO: Received exit signal SIGINT...
15:18:28,825 INFO: Shutdown complete.
15:18:28,826 INFO: Cleaning up
</code></pre>
<p>As I forewarned: although the library handles the message leasing for us, there are threads going on in the background; in fact, it introduces at least 15 of them… </p>
<p>Let’s update that <code>run_something_else</code> coroutine to be a little thread watcher in order to see all that’s going on:</p>
<pre><code data-lang="py3"># snip
import threading
# snip

async def watch_threads():
    while True:
        threads = threading.enumerate()
        logging.info(f'Current thread count: {len(threads)}')
        logging.info('Current threads:')
        for thread in threads:
            logging.info(f'-- {thread.name}')
        logging.info('Sleeping for 5 seconds...')
        await asyncio.sleep(5)

async def run():
    coros = [run_pubsub(), watch_threads()]
    await asyncio.gather(*coros)
</code></pre>
<p>Putting the consumer logging on debug level, we now have output looking like:</p>
<pre><code data-lang="console">$ python examples/mandrill/mayhem_20.py
15:31:14,711 INFO: Current thread count: 2
15:31:14,711 INFO: Current threads:
15:31:14,711 INFO: -- MainThread
15:31:14,711 INFO: -- ThreadPoolExecutor-0_0
15:31:14,711 INFO: Sleeping for 5 seconds...
15:31:19,715 INFO: Current thread count: 22
15:31:19,716 INFO: Current threads:
15:31:19,716 INFO: -- MainThread
15:31:19,716 INFO: -- ThreadPoolExecutor-0_0
15:31:19,716 INFO: -- ThreadPoolExecutor-0_1
15:31:19,716 INFO: -- Thread-CallbackRequestDispatcher
15:31:19,716 INFO: -- Thread-ConsumeBidirectionalStream
15:31:19,716 INFO: -- Thread-LeaseMaintainer
15:31:19,716 INFO: -- Thread-1
15:31:19,716 INFO: -- Thread-Heartbeater
15:31:19,717 INFO: -- Thread-2
15:31:19,717 INFO: -- ThreadPoolExecutor-ThreadScheduler_0
15:31:19,717 INFO: -- ThreadPoolExecutor-ThreadScheduler_1
15:31:19,717 INFO: -- ThreadPoolExecutor-ThreadScheduler_2
15:31:19,717 INFO: -- ThreadPoolExecutor-ThreadScheduler_3
15:31:19,717 INFO: -- ThreadPoolExecutor-ThreadScheduler_4
15:31:19,717 INFO: -- ThreadPoolExecutor-0_2
15:31:19,717 INFO: -- ThreadPoolExecutor-ThreadScheduler_5
15:31:19,717 INFO: -- ThreadPoolExecutor-ThreadScheduler_6
15:31:19,717 INFO: -- ThreadPoolExecutor-ThreadScheduler_7
15:31:19,717 INFO: -- ThreadPoolExecutor-ThreadScheduler_8
15:31:19,717 INFO: -- ThreadPoolExecutor-ThreadScheduler_9
15:31:19,717 INFO: -- ThreadPoolExecutor-0_3
15:31:19,717 INFO: -- ThreadPoolExecutor-0_4
15:31:19,717 INFO: Sleeping for 5 seconds...
15:31:24,723 INFO: Current thread count: 22
^C15:31:25,273 INFO: Received exit signal SIGINT...
15:31:25,273 INFO: Shutdown complete.
15:31:25,273 INFO: Cleaning up
</code></pre>
<p>Ooph! Lots of threads. We can help ourselves out a little bit by making use of the <code>thread_name_prefix</code> argument in <code>ThreadPoolExecutor</code>:</p>
<pre><code data-lang="py3">async def run_pubsub():
    loop = asyncio.get_running_loop()
    executor = concurrent.futures.ThreadPoolExecutor(
        max_workers=5, thread_name_prefix='Mandrill')

    consume_coro = loop.run_in_executor(executor, consume_sync)
    asyncio.ensure_future(consume_coro)
    loop.create_task(publish(executor, loop))
</code></pre>
<p>Running it for a few seconds (snipped a bit to avoid the length):</p>
<pre><code data-lang="console">$ python examples/mandrill/mayhem_20.py
15:16:34,537 INFO: Current thread count: 2
15:16:34,537 INFO: Current threads:
15:16:34,537 INFO: -- MainThread
15:16:34,538 INFO: -- Mandrill_0
15:16:34,538 INFO: Sleeping for 5 seconds...
15:16:39,542 INFO: Current thread count: 22
15:16:39,542 INFO: Current threads:
15:16:39,542 INFO: -- MainThread
15:16:39,543 INFO: -- Mandrill_0
15:16:39,543 INFO: -- Thread-CallbackRequestDispatcher
15:16:39,543 INFO: -- Mandrill_1
15:16:39,543 INFO: -- Thread-ConsumeBidirectionalStream
15:16:39,543 INFO: -- Thread-LeaseMaintainer
15:16:39,543 INFO: -- Thread-1
15:16:39,543 INFO: -- Thread-Heartbeater
15:16:39,543 INFO: -- Thread-2
15:16:39,543 INFO: -- ThreadPoolExecutor-ThreadScheduler_0
15:16:39,543 INFO: -- ThreadPoolExecutor-ThreadScheduler_1
15:16:39,543 INFO: -- ThreadPoolExecutor-ThreadScheduler_2
15:16:39,543 INFO: -- ThreadPoolExecutor-ThreadScheduler_3
15:16:39,543 INFO: -- ThreadPoolExecutor-ThreadScheduler_4
15:16:39,544 INFO: -- Mandrill_2
15:16:39,544 INFO: -- ThreadPoolExecutor-ThreadScheduler_5
15:16:39,544 INFO: -- ThreadPoolExecutor-ThreadScheduler_6
15:16:39,544 INFO: -- ThreadPoolExecutor-ThreadScheduler_7
15:16:39,544 INFO: -- ThreadPoolExecutor-ThreadScheduler_8
15:16:39,544 INFO: -- ThreadPoolExecutor-ThreadScheduler_9
15:16:39,544 INFO: -- Mandrill_3
15:16:39,544 INFO: -- Mandrill_4
15:16:39,544 INFO: Sleeping for 5 seconds...
15:16:44,546 INFO: Current thread count: 22
15:16:44,547 INFO: Current threads:
15:16:44,547 INFO: -- MainThread
15:16:44,547 INFO: -- Mandrill_0
15:16:44,547 INFO: -- Thread-CallbackRequestDispatcher
15:16:44,548 INFO: -- Mandrill_1
15:16:44,548 INFO: -- Thread-ConsumeBidirectionalStream
15:16:44,548 INFO: -- Thread-LeaseMaintainer
15:16:44,548 INFO: -- Thread-1
15:16:44,548 INFO: -- Thread-Heartbeater
15:16:44,548 INFO: -- Thread-2
15:16:44,548 INFO: -- ThreadPoolExecutor-ThreadScheduler_0
15:16:44,548 INFO: -- ThreadPoolExecutor-ThreadScheduler_1
15:16:44,548 INFO: -- ThreadPoolExecutor-ThreadScheduler_2
15:16:44,548 INFO: -- ThreadPoolExecutor-ThreadScheduler_3
15:16:44,548 INFO: -- ThreadPoolExecutor-ThreadScheduler_4
15:16:44,548 INFO: -- Mandrill_2
15:16:44,548 INFO: -- ThreadPoolExecutor-ThreadScheduler_5
15:16:44,548 INFO: -- ThreadPoolExecutor-ThreadScheduler_6
15:16:44,548 INFO: -- ThreadPoolExecutor-ThreadScheduler_7
15:16:44,548 INFO: -- ThreadPoolExecutor-ThreadScheduler_8
15:16:44,549 INFO: -- ThreadPoolExecutor-ThreadScheduler_9
15:16:44,549 INFO: -- Mandrill_3
15:16:44,549 INFO: -- Mandrill_4
15:16:44,549 INFO: Sleeping for 5 seconds...
^C15:16:46,821 INFO: Received exit signal SIGINT...
15:16:46,821 INFO: Shutdown complete.
15:16:46,821 INFO: Cleaning up
</code></pre>
<p>We see we have the <code>MainThread</code>, which is where the <code>asyncio</code> event loop runs. There are also five <code>Mandrill_</code>-prefixed threads created by our thread pool executor; there are five because we limited the number of workers when creating the executor. It looks as if the subscription client has its own thread pool executor, named <code>ThreadPoolExecutor-ThreadScheduler</code>; <code>Thread-MonitorBatchPublisher</code> comes from the publisher; and the rest of the threads (heartbeater, lease maintainer, etc.) handle the gRPC bidirectional streaming used to consume from Pub/Sub. </p>
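<p>For the curious, the thread inventory in the logs above comes from simply asking the <code>threading</code> module what’s alive. Here’s a minimal sketch of such a monitor coroutine (the helper in the example repo may differ slightly):</p>
<pre><code data-lang="py3">import asyncio
import logging
import threading

logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s: %(message)s')


async def log_threads(interval=5, cycles=1):
    """Periodically log the names of all live threads."""
    for _ in range(cycles):
        threads = threading.enumerate()
        logging.info(f'Current thread count: {len(threads)}')
        logging.info('Current threads:')
        for thread in threads:
            logging.info(f'-- {thread.name}')
        await asyncio.sleep(interval)


asyncio.run(log_threads(interval=0))
</code></pre>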
<p>All in all, though, the approach to threaded code isn’t any different than the non-async code. </p>
<p>Until you realize you need to call asynchronous code from a non-async function that’s running within another thread.</p>
<h3>Making threaded code <code>asyncio</code>-<strike>friendly</strike> tolerable</h3>
<p>Obviously we can’t just <code>ack</code> a message once we receive it. We need to restart the required host and save the message in our database. </p>
<p>We’re unable to simply call <code>asyncio.create_task</code> in our <code>consume_sync</code> function:</p>
<pre><code data-lang="py3">def consume_sync():
    client = get_subscriber()

    def callback(msg):
        data = json.loads(msg.data.decode('utf-8'))
        logging.info(f'Consumed {data["msg_id"]}')
        # can't do this!
        asyncio.create_task(handle_message(data))

    client.subscribe(SUBSCRIPTION, callback)
</code></pre>
<p>It errors like so:</p>
<pre><code data-lang="console">16:45:36,709 INFO: Running something else
16:45:36,833 INFO: Consumed es7s
16:45:36,833 ERROR: Top-level exception occurred in callback while processing a message
Traceback (most recent call last):
File "/Users/lynn/.pyenv/versions/ep18-37/lib/python3.7/site-packages/google/cloud/pubsub_v1/subscriber/_protocol/streaming_pull_manager.py", line 63, in _wrap_callback_errors
callback(message)
File "examples/mandrill/mayhem_21.py", line 115, in callback
asyncio.create_task(handle_message(data))
File "/Users/lynn/.pyenv/versions/3.7.0/lib/python3.7/asyncio/tasks.py", line 320, in create_task
loop = events.get_running_loop()
</code></pre>
<p>We <em>could</em> give it the currently-running loop to add tasks to via <code>loop.create_task</code>:</p>
<pre><code data-lang="py3">def consume_sync(loop):
    client = get_subscriber()

    def callback(pubsub_msg):
        logging.info(f'Consumed {pubsub_msg.message_id}')
        loop.create_task(handle_message(pubsub_msg))

    client.subscribe(SUBSCRIPTION, callback)


async def run_pubsub():
    loop = asyncio.get_running_loop()
    executor = concurrent.futures.ThreadPoolExecutor(
        max_workers=5, thread_name_prefix='Mandrill')
    consume_coro = loop.run_in_executor(executor, consume_sync, loop)
    asyncio.ensure_future(consume_coro)
    loop.create_task(publish(executor, loop))
</code></pre>
<p>Running with this, it seems to work:</p>
<pre><code data-lang="console">$ python examples/mandrill/mayhem_23.py
18:08:09,761 INFO: Running something else
18:08:09,826 INFO: Consumed 5236
18:08:09,826 INFO: Consumed 5237
18:08:09,827 INFO: Consumed 5238
18:08:09,827 INFO: Consumed 5239
18:08:09,828 INFO: Consumed 5240
18:08:10,543 INFO: Handling PubSubMessage(instance_name='xbci')
18:08:10,543 INFO: Handling PubSubMessage(instance_name='e8x5')
18:08:10,544 INFO: Handling PubSubMessage(instance_name='shti')
18:08:10,544 INFO: Handling PubSubMessage(instance_name='9yne')
18:08:10,544 INFO: Handling PubSubMessage(instance_name='qgor')
18:08:10,544 INFO: Running something else
18:08:10,601 INFO: Saved PubSubMessage(instance_name='shti') into database
18:08:10,721 INFO: Saved PubSubMessage(instance_name='e8x5') into database
18:08:10,828 INFO: Saved PubSubMessage(instance_name='xbci') into database
18:08:10,828 WARNING: Caught exception: Could not restart xbci.example.net
18:08:11,162 INFO: Saved PubSubMessage(instance_name='9yne') into database
18:08:11,167 INFO: Running something else
18:08:11,481 INFO: Saved PubSubMessage(instance_name='qgor') into database
18:08:11,549 INFO: Restarted e8x5.example.net
18:08:11,550 INFO: Restarted 9yne.example.net
18:08:11,550 INFO: Restarted qgor.example.net
18:08:11,674 INFO: Done. Acked 5240
18:08:11,821 INFO: Done. Acked 5236
18:08:12,108 INFO: Running something else
18:08:12,276 INFO: Done. Acked 5237
18:08:12,322 INFO: Running something else
18:08:12,510 INFO: Done. Acked 5239
18:08:12,549 INFO: Restarted shti.example.net
18:08:12,839 INFO: Running something else
18:08:12,841 INFO: Consumed 5241
18:08:12,842 INFO: Consumed 5242
18:08:12,842 INFO: Consumed 5243
18:08:12,843 INFO: Consumed 5244
18:08:12,843 INFO: Consumed 5245
18:08:13,153 INFO: Handling PubSubMessage(instance_name='udtv')
18:08:13,154 INFO: Handling PubSubMessage(instance_name='a75e')
18:08:13,154 INFO: Handling PubSubMessage(instance_name='rvxb')
18:08:13,154 INFO: Handling PubSubMessage(instance_name='ka9a')
18:08:13,154 INFO: Handling PubSubMessage(instance_name='o7f2')
18:08:13,155 INFO: Done. Acked 5238
18:08:13,322 INFO: Saved PubSubMessage(instance_name='rvxb') into database
18:08:13,477 INFO: Saved PubSubMessage(instance_name='ka9a') into database
18:08:13,478 WARNING: Caught exception: Could not restart ka9a.example.net
^C18:08:13,506 INFO: Received exit signal SIGINT...
18:08:13,506 INFO: Shutdown complete.
18:08:13,506 INFO: Cleaning up
</code></pre>
<p>This is deceptive. We’re lucky it works. Once we share some data between the threaded code in the callback and the asynchronous code handling the message, we’ll see that it only works by happenstance.</p>
<p>To illustrate what I mean, let’s share a simple intermediary queue between the threaded code and the event loop, and try to cancel the task that <code>loop.create_task</code> returns.</p>
<pre><code data-lang="py3">GLOBAL_QUEUE = asyncio.Queue()


async def get_from_queue():
    while True:
        pubsub_msg = await GLOBAL_QUEUE.get()
        logging.info(f'Got {pubsub_msg.message_id} from queue')
        asyncio.create_task(handle_message(pubsub_msg))


async def add_to_queue(msg):
    logging.info(f'Adding {msg.message_id} to queue')
    await GLOBAL_QUEUE.put(msg)


def consume_sync(loop):
    client = get_subscriber()

    def callback(pubsub_msg):
        logging.info(f'Consumed {pubsub_msg.message_id}')
        task = loop.create_task(add_to_queue(pubsub_msg))
        task.cancel()  # attempt to cancel the task given to another thread

    client.subscribe(SUBSCRIPTION, callback)
</code></pre>
<p>Running it, we see something funky:</p>
<pre><code data-lang="console">$ python examples/mandrill/mayhem_24.py
18:12:08,359 INFO: Consumed 5241
18:12:08,359 INFO: Consumed 5243
18:12:08,359 INFO: Consumed 5244
18:12:08,360 INFO: Consumed 5245
18:12:08,360 INFO: Consumed 5242
18:12:08,414 INFO: Consumed 5246
18:12:08,415 INFO: Consumed 5247
18:12:08,415 INFO: Consumed 5248
18:12:08,415 INFO: Consumed 5249
18:12:08,416 INFO: Consumed 5250
18:12:08,821 INFO: Adding 5241 to queue
18:12:08,821 INFO: Adding 5243 to queue
18:12:08,822 INFO: Adding 5244 to queue
18:12:08,822 INFO: Adding 5245 to queue
18:12:08,822 INFO: Adding 5242 to queue
18:12:08,822 INFO: Adding 5246 to queue
18:12:08,822 INFO: Adding 5247 to queue
18:12:08,822 INFO: Adding 5248 to queue
18:12:08,822 INFO: Adding 5249 to queue
18:12:08,822 INFO: Adding 5250 to queue
18:12:13,403 INFO: Consumed 5251
18:12:13,404 INFO: Consumed 5252
18:12:13,404 INFO: Consumed 5253
18:12:13,404 INFO: Consumed 5254
18:12:13,404 INFO: Consumed 5255
18:12:13,875 INFO: Adding 5251 to queue
18:12:13,876 INFO: Adding 5252 to queue
18:12:13,876 INFO: Adding 5253 to queue
18:12:13,876 INFO: Adding 5254 to queue
18:12:13,876 INFO: Adding 5255 to queue
^C18:12:14,896 INFO: Received exit signal SIGINT...
18:12:14,896 INFO: Shutdown complete.
18:12:14,896 INFO: Cleaning up
</code></pre>
<p>We don’t ever consume from our intermediary queue. If we add a line in our <code>add_to_queue</code> coroutine to see the queue size:</p>
<pre><code data-lang="py3">async def add_to_queue(msg):
    logging.info(f'Adding {msg.message_id} to queue')
    await GLOBAL_QUEUE.put(msg)
    logging.info(f'Current queue size: {GLOBAL_QUEUE.qsize()}')
</code></pre>
<p>We can see that the queue is ever-growing; in fact, we’re never reading from it:</p>
<pre><code data-lang="console">$ python examples/mandrill/mayhem_24.py
18:17:09,537 INFO: Adding 5271 to queue
18:17:09,537 INFO: Current queue size: 1
18:17:09,537 INFO: Adding 5272 to queue
18:17:09,537 INFO: Current queue size: 2
18:17:09,537 INFO: Adding 5273 to queue
18:17:09,537 INFO: Current queue size: 3
18:17:09,537 INFO: Adding 5274 to queue
18:17:09,537 INFO: Current queue size: 4
18:17:09,537 INFO: Adding 5275 to queue
18:17:09,537 INFO: Current queue size: 5
18:17:14,572 INFO: Adding 5276 to queue
18:17:14,572 INFO: Current queue size: 6
18:17:14,572 INFO: Adding 5277 to queue
18:17:14,572 INFO: Current queue size: 7
18:17:14,572 INFO: Adding 5278 to queue
18:17:14,572 INFO: Current queue size: 8
18:17:14,572 INFO: Adding 5279 to queue
18:17:14,572 INFO: Current queue size: 9
18:17:14,572 INFO: Adding 5280 to queue
18:17:14,572 INFO: Current queue size: 10
^C18:17:16,899 INFO: Received exit signal SIGINT...
18:17:16,899 INFO: Shutdown complete.
18:17:16,899 INFO: Cleaning up
</code></pre>
<p>Perhaps some of you already see what’s going on here: we’re not thread-safe. It even says it <a href="https://docs.python.org/3/library/asyncio-dev.html#asyncio-multithreading">right in the docs</a>. <em>facepalm</em></p>
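<p>This is precisely what the <code>*_threadsafe</code> APIs are for: unlike plain <code>loop.call_soon</code>, <code>loop.call_soon_threadsafe</code> may be called from another thread and will wake the event loop up. A minimal, self-contained sketch:</p>
<pre><code data-lang="py3">import asyncio
import threading


async def main():
    loop = asyncio.get_running_loop()
    done = asyncio.Event()
    # Schedule `done.set` onto the loop from a different thread;
    # `call_soon_threadsafe` is the safe way to do this.
    worker = threading.Thread(
        target=lambda: loop.call_soon_threadsafe(done.set))
    worker.start()
    await asyncio.wait_for(done.wait(), timeout=5)
    worker.join()
    return 'event was set across threads'


print(asyncio.run(main()))
</code></pre>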
<p>Lucky for us: we can make use of <code>asyncio.run_coroutine_threadsafe</code>:</p>
<pre><code data-lang="py3">def consume_sync(loop):
    client = get_subscriber()

    def callback(pubsub_msg):
        logging.info(f'Consumed {pubsub_msg.message_id}')
        asyncio.run_coroutine_threadsafe(add_to_queue(pubsub_msg), loop)

    client.subscribe(SUBSCRIPTION, callback)
</code></pre>
<p>Yes! This now works:</p>
<pre><code data-lang="console">$ python examples/mandrill/mayhem_25.py
20:46:59,144 INFO: Running something else
20:46:59,209 INFO: Consumed 6806
20:46:59,210 INFO: Consumed 6835
20:46:59,210 INFO: Adding 6806 to queue
20:46:59,210 INFO: Current queue size: 1
20:46:59,210 INFO: Adding 6835 to queue
20:46:59,210 INFO: Current queue size: 2
20:46:59,211 INFO: Got 6806 from queue
20:46:59,211 INFO: Got 6835 from queue
20:46:59,211 INFO: Consumed 6834
20:46:59,211 INFO: Handling PubSubMessage(instance_name='mbab')
20:46:59,212 INFO: Consumed 6823
20:46:59,212 INFO: Handling PubSubMessage(instance_name='tekn')
20:46:59,212 INFO: Consumed 6822
20:46:59,212 INFO: Adding 6834 to queue
20:46:59,213 INFO: Consumed 6825
20:46:59,213 INFO: Current queue size: 1
20:46:59,213 INFO: Consumed 6828
20:46:59,214 INFO: Adding 6823 to queue
20:46:59,214 INFO: Consumed 6829
20:46:59,214 INFO: Current queue size: 2
20:46:59,214 INFO: Consumed 6826
20:46:59,215 INFO: Got 6834 from queue
20:46:59,215 INFO: Got 6823 from queue
20:46:59,215 INFO: Adding 6822 to queue
20:46:59,215 INFO: Current queue size: 1
20:46:59,215 INFO: Handling PubSubMessage(instance_name='prgs')
20:46:59,216 INFO: Handling PubSubMessage(instance_name='ifoc')
20:46:59,216 INFO: Adding 6825 to queue
20:46:59,216 INFO: Current queue size: 2
20:46:59,216 INFO: Consumed 6832
20:46:59,216 INFO: Adding 6828 to queue
20:46:59,216 INFO: Consumed 6833
20:46:59,216 INFO: Consumed 6830
20:46:59,216 INFO: Current queue size: 3
20:46:59,216 INFO: Consumed 6831
20:46:59,217 INFO: Adding 6829 to queue
20:46:59,217 INFO: Current queue size: 4
20:46:59,217 INFO: Got 6822 from queue
20:46:59,217 INFO: Got 6825 from queue
20:46:59,226 INFO: Got 6828 from queue
20:46:59,226 INFO: Got 6829 from queue
20:46:59,226 INFO: Adding 6826 to queue
20:46:59,226 INFO: Current queue size: 1
20:46:59,226 INFO: Handling PubSubMessage(instance_name='cnv6')
20:46:59,227 INFO: Handling PubSubMessage(instance_name='ahj9')
20:46:59,227 INFO: Handling PubSubMessage(instance_name='cfrs')
20:46:59,227 INFO: Handling PubSubMessage(instance_name='u6nl')
20:46:59,227 INFO: Got 6826 from queue
20:46:59,227 INFO: Adding 6832 to queue
20:46:59,227 INFO: Current queue size: 1
20:46:59,227 INFO: Adding 6833 to queue
20:46:59,227 INFO: Current queue size: 2
20:46:59,227 INFO: Adding 6830 to queue
20:46:59,227 INFO: Current queue size: 3
20:46:59,227 INFO: Adding 6831 to queue
20:46:59,227 INFO: Current queue size: 4
20:46:59,227 INFO: Saved PubSubMessage(instance_name='tekn') into database
20:46:59,228 INFO: Handling PubSubMessage(instance_name='efec')
20:46:59,228 INFO: Got 6832 from queue
20:46:59,228 INFO: Got 6833 from queue
20:46:59,228 INFO: Got 6830 from queue
20:46:59,228 INFO: Got 6831 from queue
20:46:59,228 INFO: Handling PubSubMessage(instance_name='ibrp')
20:46:59,228 INFO: Handling PubSubMessage(instance_name='op3r')
20:46:59,228 INFO: Handling PubSubMessage(instance_name='oi9j')
20:46:59,229 INFO: Handling PubSubMessage(instance_name='dw58')
20:46:59,243 INFO: Saved PubSubMessage(instance_name='ahj9') into database
20:46:59,243 WARNING: Caught exception: Could not restart ahj9.example.net
20:46:59,244 INFO: Running something else
</code></pre>
<p>It may look like things are processed serially, but that’s just the streaming pull future that the Google Pub/Sub library returns at work (just take a look at the milliseconds!).</p>
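<p>As a bonus, <code>asyncio.run_coroutine_threadsafe</code> returns a <code>concurrent.futures.Future</code>, so the calling thread can block on (or attach callbacks to) the coroutine’s result. A standalone sketch using a throwaway loop in a background thread:</p>
<pre><code data-lang="py3">import asyncio
import threading


async def double(x):
    await asyncio.sleep(0.01)
    return x * 2


# Run an event loop in a background thread, similar to our executor setup.
loop = asyncio.new_event_loop()
thread = threading.Thread(target=loop.run_forever, daemon=True)
thread.start()

# Submit from the main (non-loop) thread; we get a concurrent.futures.Future.
future = asyncio.run_coroutine_threadsafe(double(21), loop)
print(future.result(timeout=5))  # 42

loop.call_soon_threadsafe(loop.stop)
</code></pre>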
<h3 id="recap">Recap</h3>
<p>It’s pretty simple to work around synchronous code using a <code>ThreadPoolExecutor</code> and <code>loop.run_in_executor</code>. However, one can easily get tripped up when combining threads with <code>asyncio</code>, so it’s good to get familiar with the few <code>_threadsafe</code> APIs within the <code>asyncio</code> library.</p>
<p>Here’s what our code looks like now:</p>
<pre><code data-lang="py3">#!/usr/bin/env python3
"""
Notice! This requires:
  - attrs==18.1.0
  - google-cloud-pubsub==0.35.4
"""

import asyncio
import concurrent.futures
import json
import logging
import os
import random
import signal
import string

import attr
from google.cloud import pubsub

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s,%(msecs)d %(levelname)s: %(message)s',
    datefmt='%H:%M:%S',
)

TOPIC = 'projects/europython18/topics/ep18-topic'
SUBSCRIPTION = 'projects/europython18/subscriptions/ep18-sub'
PROJECT = 'europython18'
CHOICES = string.ascii_lowercase + string.digits
GLOBAL_QUEUE = asyncio.Queue()


@attr.s
class PubSubMessage:
    instance_name = attr.ib()
    message_id = attr.ib(repr=False)
    hostname = attr.ib(repr=False, init=False)

    def __attrs_post_init__(self):
        self.hostname = f'{self.instance_name}.example.net'


def get_publisher():
    client = pubsub.PublisherClient()
    try:
        client.create_topic(TOPIC)
    except Exception as e:
        # already created
        pass
    return client


def get_subscriber():
    client = pubsub.SubscriberClient()
    try:
        client.create_subscription(SUBSCRIPTION, TOPIC)
    except Exception:
        # already created
        pass
    return client


def publish_sync(publisher):
    choices = string.ascii_lowercase + string.digits
    host_id = ''.join(random.choices(choices, k=4))
    instance_name = f'cattle-{host_id}'
    msg_data = {'instance_name': instance_name}
    bytes_message = bytes(json.dumps(msg_data), encoding='utf-8')
    publisher.publish(TOPIC, bytes_message)


async def publish(executor, loop):
    publisher = get_publisher()
    while True:
        await loop.run_in_executor(executor, publish_sync, publisher)
        await asyncio.sleep(.1)


async def restart_host(msg):
    # faked error
    rand_int = random.randrange(1, 3)
    if rand_int == 2:
        raise Exception(f'Could not restart {msg.hostname}')
    # unhelpful simulation of i/o work
    await asyncio.sleep(random.randrange(1, 3))
    logging.info(f'Restarted {msg.hostname}')


async def save(msg):
    # unhelpful simulation of i/o work
    await asyncio.sleep(random.random())
    logging.info(f'Saved {msg} into database')


async def cleanup(pubsub_msg, event):
    # this will block the rest of the coro until `event.set` is called
    await event.wait()
    # unhelpful simulation of i/o work
    await asyncio.sleep(random.random())
    pubsub_msg.ack()
    logging.info(f'Done. Acked {pubsub_msg.message_id}')


def handle_results(results):
    for result in results:
        if isinstance(result, Exception):
            logging.warning(f'Caught exception: {result}')


async def handle_message(pubsub_msg):
    msg_data = json.loads(pubsub_msg.data.decode('utf-8'))
    msg = PubSubMessage(
        message_id=pubsub_msg.message_id,
        instance_name=msg_data['instance_name']
    )
    logging.info(f'Handling {msg}')

    event = asyncio.Event()
    asyncio.create_task(cleanup(pubsub_msg, event))

    results = await asyncio.gather(
        save(msg), restart_host(msg), return_exceptions=True
    )
    handle_results(results)
    event.set()


def consume_sync(loop):
    client = get_subscriber()

    def callback(pubsub_msg):
        logging.info(f'Consumed {pubsub_msg.message_id}')
        asyncio.run_coroutine_threadsafe(handle_message(pubsub_msg), loop)

    client.subscribe(SUBSCRIPTION, callback)


async def run_pubsub():
    loop = asyncio.get_running_loop()
    executor = concurrent.futures.ThreadPoolExecutor(
        max_workers=5, thread_name_prefix='Mandrill')
    consume_coro = loop.run_in_executor(executor, consume_sync, loop)
    asyncio.ensure_future(consume_coro)
    loop.create_task(publish(executor, loop))


async def shutdown(signal, loop):
    logging.info(f'Received exit signal {signal.name}...')
    loop.stop()
    logging.info('Shutdown complete.')


if __name__ == '__main__':
    assert os.environ.get('PUBSUB_EMULATOR_HOST'), 'You should be running the emulator'

    loop = asyncio.get_event_loop()

    # for simplicity
    loop.add_signal_handler(
        signal.SIGINT,
        lambda: asyncio.create_task(shutdown(signal.SIGINT, loop))
    )
    try:
        loop.create_task(run_pubsub())
        loop.run_forever()
    finally:
        logging.info('Cleaning up')
        loop.stop()
</code></pre>
<hr>
<p>Coming soon: the last part of this series on testing <code>asyncio</code> code!</p>
Tracing, Fast and Slowhttp://www.roguelynn.com/words/tracing-fast-and-slow/2017-07-11T14:10:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>This post is an accompaniment to my <a href="https://us.pycon.org/2017/schedule/presentation/565/">Tracing, Fast and Slow talk</a> at <a href="https://us.pycon.org/2017/">PyCon</a> in Portland, OR in May 2017, and for <a href="https://ep2017.europython.eu/conference/talks/tracing-fast-and-slow-digging-into-improving-your-web-services-performance">EuroPython</a> in Rimini, Italy in July 2017. Slides <a href="https://speakerdeck.com/roguelynn/tracing-fast-and-slow-digging-into-and-improving-your-web-services-performance">here</a> and video from PyCon <a href="https://www.youtube.com/watch?v=lu0F-psmBzc">here</a>.</p>
<hr>
<p>If you’ve read the <a href="https://landing.google.com/sre/book.html">Site Reliability Engineering</a> book from O'Reilly (a.k.a. the “Google SRE book”), the TL;DR of many chapters seems to be “use distributed tracing.” With the not-that-new trend of microservices, where you may or may not own all the services that a request flow might touch, it’s imperative to understand where your code fits into the grand scheme of things, and how it operates.</p>
<p>There are three main needs for tracing a system: performance debugging, capacity planning, and problem diagnosis, although it can help address many others. While this post will have a slight focus on performance, these techniques can certainly be applicable for other needs.</p>
<p><em>NB:</em> Before diving in, I want to make apparent that a lot of this is collated, paraphrased, and otherwise digested research from academic and white papers, including “<a href="http://www.pdl.cmu.edu/PDL-FTP/SelfStar/CMU-PDL-14-102.pdf">So, you want to trace your distributed system</a>” from Carnegie Mellon, “<a href="http://web.eecs.umich.edu/%7Etwenisch/papers/osdi14.pdf">The Mystery Machine: End-to-end performance analysis of large-scale Internet services</a>” by Facebook + University of Michigan, and Google’s “<a href="https://research.google.com/pubs/pub36356.html">Dapper, a Large-Scale Distributed Systems Tracing Infrastructure</a>”, among others.</p>
<h2 id="tracing-overview">Tracing Overview</h2>
<p>In the simplest terms, a trace follows the complete workflow from the start of a transaction (or request) to its end, including the components it flows through. For a very simple web application, it’s pretty easy to understand the workflow of a request. But add some databases, separate the frontend from the backend, throw in some caching, have an external API call, all behind a load balancer, then it gets difficult to put together workflows of requests.</p>
<h3 id="machine-centric">Machine-centric</h3>
<p>Historically, we’ve been focused on machine-centric metrics, including system-level metrics like CPU, disk space, memory, as well as app-level metrics like requests per second, response latency, database writes, etc.<sup><a href="#footnotes" style="border-bottom: none;">1</a></sup> Following and understanding these metrics are quite important, but there is no view into a service’s dependencies or its dependents. It’s also not possible to get a view of a complete flow of a request, nor develop an understanding about how one’s service performs at scale.</p>
<h3 id="workflow-centric">Workflow-centric</h3>
<p>A workflow-centric approach allows us to understand relationships of components within an entire system. We can follow a request from beginning to end to understand bottlenecks, hone in on anomalistic paths, and figure out where we need to add more resources.<sup><a href="#footnotes" style="border-bottom: none;">1</a></sup></p>
<p><img class="img-displayed" src="http://www.roguelynn.com/assets/images/tracing/intro_tracing_flow.png" title="Simplified Distributed System Example" alt="Simplified Distributed System Example"/>
<figcaption>Over-simplified Distributed System Example, Lynn Root, <a href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></figcaption></p>
<p>Looking at this super simplified system, where we have a load balancer, a frontend, backend, a database, and maybe an external dependency to a third-party API, then add redundancy, it can get particularly confusing to follow a request. How do we debug a problem of a rare workflow? How do we know which component of this system is the bottleneck? Which function call is taking the longest? Is another app on my host causing distortion to machine-centric performance metrics (<a href="https://en.wikipedia.org/wiki/Cloud_computing_issues#Performance_interference_and_noisy_neighbors">noisy neighbors</a> – a growing concern as many move to the cloud)?</p>
<p>With so many potential paths that a request can take, with the potential for issues at every node and every edge, this can be mind-numbingly difficult if we continue to be machine-centric. End-to-end tracing will allow us to get a better picture to address these concerns.</p>
<h3 id="why-trace">Why Trace?</h3>
<p>Real briefly – there are many reasons to trace a system. The one that inspired this post is performance analysis; this is trying to understand what happens at the 50th or 75th percentile, the “steady state” problems. This will help identify latencies, resource and capacity usages, and other performance issues. We are also able to answer questions like: “did this particular deploy of this service have an effect on latency of the whole system?”</p>
<p>Tracing can also clue us in on anomalistic request flows – the 99.9th percentile. These issues can still be performance-related, but tracing here also helps identify problems with “correctness,” like component failures or timeouts. </p>
<p>There is also profiling – very similar to the first – but here we’re just interested in particular components or aspects of the system. We don’t necessarily care about the full workflow.</p>
<p>Tracing can also answer the question of what a particular component depends on, and what depends on it; particularly useful for complex systems. With dependents identified, we can then attribute particularly expensive work (e.g. component A adds significant workload with disk writes to component B), which can be helpful when attributing cost to teams/service owners/component owners (e.g. component A forces component B to spend more $$ in AWS/GCP/etc).<sup><a href="#footnotes" style="border-bottom: none;">2</a></sup></p>
<p>And finally, we’re able to create models of our systems that allow us to ask what-if questions, like “what would happen to component A if we did a disaster recovery test on component B?”</p>
<h2 id="approaches-to-tracing">Approaches to Tracing</h2>
<p><span></span></p>
<h3 id="manual">Manual</h3>
<p>There are simple things that can be added to a web service, especially one that does not have dependent/depending components that you don’t own/have access to. You won’t get any pretty visualizations or help with centralized collection beyond how you typically handle your logs, but it still can provide a lot of insight.</p>
<p>So this is an example flask route and a decorator:</p>
<pre><code data-lang="python"># app.py - create a unique request ID for every request
import uuid
from functools import wraps

from flask import Flask

app = Flask(__name__)


def request_id(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        req_id = uuid.uuid4()
        return f(req_id, *args, **kwargs)
    return decorated


@app.route("/")
@request_id
def index(req_id):
    # log w/ ID for wherever you want to trace
    # app logic
    return "ok"
</code></pre>
<p>Here, you can simply add a UUID to each request received as a header, then log at particular points of interest, like the beginning and end of handling a request, and any other in-between component or function calls, and propagate those headers if you can. </p>
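<p>Propagating simply means forwarding that same ID on any outbound calls your handler makes. A tiny helper for building the outgoing headers (the fallback behavior is my own convention, not from the app above):</p>
<pre><code data-lang="python">import uuid


def outgoing_headers(incoming_request_id=None):
    """Reuse the incoming X-Request-ID if present; otherwise mint one."""
    req_id = incoming_request_id or str(uuid.uuid4())
    return {'X-Request-ID': req_id}


# e.g. requests.get(url, headers=outgoing_headers(req_id))
print(outgoing_headers('abc-123'))  # {'X-Request-ID': 'abc-123'}
</code></pre>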
<p>If your app is behind an nginx installation that you’re able to manipulate, you can turn on its ability to stamp each request with an <code>X-Request-ID</code> header which can be used in your app:</p>
<pre><code data-lang="nginx"># /etc/nginx/sites-available/app
upstream appserver {
    server 10.0.0.0:80;
}

server {
    listen 80;

    # Returns header to client - useful for depending services
    add_header X-Request-ID $request_id;

    location / {
        proxy_pass http://appserver;
        # Passes header to the app server
        proxy_set_header X-Request-ID $request_id;
    }
}
</code></pre><pre><code data-lang="python"># app.py - get X-Request-ID header from nginx
from functools import wraps

from flask import Flask, request

app = Flask(__name__)


def request_id(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        # get the request ID header passed through from nginx
        req_id = request.headers.get("X-Request-ID")
        return f(req_id, *args, **kwargs)
    return decorated


@app.route("/")
@request_id
def index(req_id):
    # log w/ ID for wherever you want to trace
    # app logic
    return "ok"
</code></pre>
<p>You can also very simply add that request ID to nginx’s logs:</p>
<pre><code data-lang="nginx"># /etc/nginx/sites-available/app - logging to access_trace.log file
log_format trace '$remote_addr - $remote_user [$time_local] "$request" '
                 '$status $body_bytes_sent "$http_referer" "$http_user_agent" '
                 '"$http_x_forwarded_for" $request_id';

upstream appserver {
    server 10.0.0.0:80;
}

server {
    listen 80;

    add_header X-Request-ID $request_id;

    location / {
        proxy_pass http://appserver;
        proxy_set_header X-Request-ID $request_id;
        # Log $request_id
        access_log /var/log/nginx/access_trace.log trace;
    }
}
</code></pre><h3 id="blackbox">Blackbox</h3>
<p>Blackbox tracing is tracing with no instrumentation across the components. It tries to infer workflows and relationships by correlating variables and timing within already-defined log messages. From there, relationship inference is done via statistical or regression analysis.</p>
<p>It’s easiest with centralized logging, and if there is a somewhat standardized schema to log messages that contain some sort of ID and timestamp.<sup><a href="#footnotes" style="border-bottom: none;">3</a></sup> It’s particularly useful if instrumenting an entire system is too cumbersome (e.g. too much coordination with engineers), or if you can’t otherwise instrument components you don’t own. As such, it’s quite portable and adds little to no overhead, but it does require a lot of data points in order to infer relationships. Without instrumentation in the components themselves, it also lacks accuracy, and it struggles to attribute causality in the presence of asynchronous behavior and concurrency.</p>
<p>Facebook and the University of Michigan wrote a very readable <a href="http://web.eecs.umich.edu/%7Etwenisch/papers/osdi14.pdf">academic paper</a> on assessing end-to-end performance employing this method.</p>
<p>Another approach to blackbox tracing is through network tapping, with the use of <a href="https://en.wikipedia.org/wiki/SFlow">sFlow</a>, <a href="http://nfdump.sourceforge.net/">nfdump</a>, and <a href="https://www.honeynet.org/node/691">iptables packet data</a> – which I’m sure the NSA is very familiar with <em>&lt; cough &gt;</em>.</p>
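<p>To make the correlation idea concrete, here’s a toy sketch (the log format and fields are invented) that reconstructs per-request latency purely from timestamped log lines sharing an ID:</p>
<pre><code data-lang="python">from collections import defaultdict

# Invented log lines: "timestamp_ms component request_id"
LOG_LINES = [
    '1000 frontend req-1',
    '1005 backend req-1',
    '1020 frontend req-2',
    '1042 backend req-1',
    '1055 backend req-2',
]


def infer_workflows(lines):
    """Group events by request ID and compute each request's total latency."""
    events = defaultdict(list)
    for line in lines:
        ts, component, req_id = line.split()
        events[req_id].append((int(ts), component))
    return {
        req_id: max(ts for ts, _ in evs) - min(ts for ts, _ in evs)
        for req_id, evs in events.items()
    }


print(infer_workflows(LOG_LINES))  # {'req-1': 42, 'req-2': 35}
</code></pre>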
<h3 id="metadata-propagation">Metadata Propagation</h3>
<p>The final common type of tracing is through metadata propagation. It’s the approach that was made popular by Google’s research paper on <a href="https://research.google.com/pubs/pub36356.html">Dapper</a>.</p>
<p>Essentially, components are instrumented at particular trace points to follow causality between functions, components, and systems; or even with common RPC libraries that will automatically add metadata to each call. </p>
<p><img class="img-displayed" src="http://www.roguelynn.com/assets/images/tracing/metadata_propagation.png" title="Metadata Propagation" alt="Metadata Propagation"/></p>
<p><figcaption>Metadata Propagation, adapted from <a href="https://research.google.com/pubs/pub36356.html">Dapper, a Large-Scale Distributed Systems Tracing Infrastructure</a></figcaption></p>
<p>Metadata that is tracked includes a trace ID – representing one single trace or workflow – and a span ID for every point in that trace (e.g. request sent from client, request received by server, server responds, etc.), plus each span’s start and end time.</p>
<p>This approach is best when the system itself is designed with tracing in mind (but who actually does that!?) and avoids the guesswork of inferring causal relationships. However, it can add a bit of overhead to response time and throughput, so sampling traces limits the burden on the system and on data point storage. Sampling as low as 0.01% of requests is plenty to get an understanding of a system’s performance.<sup><a href="#footnotes" style="border-bottom: none;">4</a></sup></p>
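<p>As a rough illustration of the propagated metadata – not Dapper’s actual implementation, just the shape of it – a span record might carry little more than this:</p>

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    """Illustrative Dapper-style metadata: one trace ID shared by the whole
    workflow, a span ID per trace point, plus the span's start/end times."""
    name: str
    trace_id: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None

    def finish(self):
        self.end = time.monotonic()

    def child(self, name):
        # A child span inherits the trace ID; its parent is this span's ID.
        return Span(name=name, trace_id=self.trace_id, parent_id=self.span_id)

root = Span(name="client-request", trace_id=uuid.uuid4().hex)
server = root.child("server-handle")
server.finish()
root.finish()
```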
<h2 id="tracing-at-scale">Tracing at Scale</h2>
<p>When you start to have many microservices and scale out with more resources, there are a few points to keep in mind when instrumenting your system, particularly with the metadata propagation approach:</p>
<ul>
<li><span class="underline">What relationships to track:</span> essentially how to follow a trace and what is considered part of the workflow.</li>
<li><span class="underline">How to track them:</span> constructing metadata to track causal relationships is particularly difficult; there are a few approaches, each with their own fortés and drawbacks.</li>
<li><span class="underline">How to sample and reduce overhead of tracking:</span> the approach one chooses in sampling is largely defined by what questions you’re trying to answer with your tracing; there may be a clear answer, but not without penalties.</li>
<li><span class="underline">What to visualize:</span> The visualizations needed will also be informed by what we’re trying to answer with tracing.</li>
</ul>
<h3 id="what-relationships-to-track">What relationships to track</h3>
<p>When looking within a request, we can take one of two points of view: the submitter’s, or the trigger’s.<sup><a href="#footnotes" style="border-bottom: none;">5</a></sup></p>
<p><img class="img-displayed" src="http://www.roguelynn.com/assets/images/tracing/submitter_flow_pov.png" title="Submitter Flow Point of View" alt="Submitter Flow Point of View"/>
<figcaption>Submitter PoV, adapted from “<a href="http://www.pdl.cmu.edu/PDL-FTP/SelfStar/CMU-PDL-14-102.pdf">So, you want to trace your distributed system</a>”, p8</figcaption></p>
<p>The submitter PoV follows or focuses on one complete request, without taking into account whether part of that request was caused by another request/action.</p>
<p>For instance, evicting cache that was actually triggered by request #2 is still attributed to request #1 since its data comes from #1.</p>
<p><img class="img-displayed" src="http://www.roguelynn.com/assets/images/tracing/trigger_flow_pov.png" title="Trigger Flow Point of View" alt="Trigger Flow Point of View"/>
<figcaption>Trigger PoV, adapted from “<a href="http://www.pdl.cmu.edu/PDL-FTP/SelfStar/CMU-PDL-14-102.pdf">So, you want to trace your distributed system</a>”, p8</figcaption></p>
<p>The trigger PoV focuses on the trigger that initiates an action. In the same example, request #2 evicts the cache from request #1, and the eviction is therefore included in request #2’s trace.</p>
<p>Choosing which flow to follow depends on the answers you’re trying to find. For instance, it doesn’t matter which approach is chosen for performance profiling, but following trigger causality will help detect anomalies by showing critical paths.</p>
<h3 id="how-to-track-relationships">How to track relationships</h3>
<p><em>…or what essentially is needed in your metadata.</em></p>
<p>What this boils down to is that it can be difficult to reliably track causal relationships within a distributed system. The sheer nature of a distributed system implies issues with ordering events and traces that happen across many hosts. There may not be a global synchronous clock available, so care must be taken in deciding what goes into crafting the metadata that is threaded through an end-to-end trace.</p>
<h4 id="request-id">Request ID</h4>
<p>Using a random ID like a UUID or the <code>X-Request-ID</code> header will identify causally-related activity, but tracing implementations must then use an external clock to collate traces.</p>
<p>In the absence of a globally synced clock, or to avoid issues such as clock skew, network send and receive messages can be used to construct causal relationships (you can’t exactly receive a message before it’s sent).<sup><a href="#footnotes" style="border-bottom: none;">6</a></sup></p>
<p>However, this approach lacks resiliency: there is potential for data loss from external systems, or an inability to add trace points in components owned by others.</p>
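<p>A minimal sketch of the request ID approach (the helper names are mine): reuse an incoming <code>X-Request-ID</code> if one is present, mint one at the edge of the system otherwise, and thread the same ID into every downstream call so logs can later be collated:</p>

```python
import uuid

def ensure_request_id(headers):
    """Reuse an incoming X-Request-ID so the whole workflow shares one ID;
    mint a fresh one at the edge of the system otherwise."""
    rid = headers.get("X-Request-ID")
    if rid is None:
        rid = uuid.uuid4().hex
    return rid

def outgoing_headers(incoming_headers):
    # Thread the same ID into every downstream call.
    return {"X-Request-ID": ensure_request_id(incoming_headers)}

edge = outgoing_headers({})     # new ID minted at the system's edge
hop2 = outgoing_headers(edge)   # same ID propagated to the next hop
```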
<h4 id="request-id-logical-clock">Request ID + Logical Clock</h4>
<p>Tracing systems can also add a timestamp derived from a local, <a href="https://en.wikipedia.org/wiki/Logical_clock">logical clock</a> to the workflow ID – not the local system’s actual timestamp, but either a counter or a sort of randomized timestamp that is paired with the trace message as it flows through components.<sup><a href="#footnotes" style="border-bottom: none;">6</a></sup> With this approach, the tracing system doesn’t need to spend time ordering the traces it collects, since the ordering is explicit in the clock data – but parallelization and concurrency can complicate the understanding of relationships.</p>
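<p>The classic counter-based version of such a logical clock is a Lamport clock; a minimal sketch (my own illustration, not taken from any tracing system):</p>

```python
class LamportClock:
    """Counter-based logical clock: increment on every local event, and on
    receipt take the max of the local and received times plus one, so a
    receive is always ordered after its corresponding send."""
    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event, e.g. hitting a trace point.
        self.time += 1
        return self.time

    def receive(self, sent_at):
        # Merge the timestamp carried in the incoming message's metadata.
        self.time = max(self.time, sent_at) + 1
        return self.time

a, b = LamportClock(), LamportClock()
sent = a.tick()            # component A records a trace point and sends
received = b.receive(sent) # component B stamps the receipt
```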
<h4 id="request-id-logical-clock-previous-trace-points">Request ID + Logical Clock + Previous Trace Points</h4>
<p>One can also include the previous trace points that have been executed within the metadata, in order to understand all the forks and joins. This also makes tracing data immediately available as soon as the workflow ends, as there is no need to spend time establishing the ordering of causal relationships.<sup><a href="#footnotes" style="border-bottom: none;">7</a></sup> But, as you can imagine, the metadata will only grow in size as it follows a workflow, adding to the payload.</p>
<h4 id="trade-offs">Trade-offs</h4>
<p>The above three approaches have trade-offs: payload size vs. explicit relationships vs. resiliency to lost data vs. immediate availability.</p>
<p>If you really care about the payload of the request, then a simple unique ID is your go-to, but at the expense of needing to infer relationships. Adding a timestamp of sorts can help establish explicit causal relationships, but you’re still susceptible to potential ordering issues of traces if data is lost. You may add the previously-executed tracepoints to avoid data loss and understand the forks and joins of a trace, while gaining immediate availability since the causal relationships are already established. But then you suffer in payload size. And – to my knowledge – there’s also the fact that no open source tracing systems implement this.</p>
<h3 id="how-to-sample">How to sample</h3>
<p>End-to-end tracing will have an effect on runtime and storage overhead no matter what you choose. For instance, if Google were to trace all web searches, despite its intelligent tracing implementation – Dapper – it would impose a 1.5% throughput penalty and add 16% to response time.<sup><a href="#footnotes" style="border-bottom: none;">8</a></sup></p>
<p>I won’t go into detail, but there are essentially three basic approaches to sampling:</p>
<ul>
<li>Head-based: a random sampling decision is made at the start of a workflow and then followed all the way through to completion.<sup><a href="#footnotes" style="border-bottom: none;">9</a></sup></li>
<li>Tail-based: the sampling decision is made at the end of a workflow, implying some caching is going on. Tail-based sampling needs to be a bit more intelligent, but is particularly useful for tracing anomalous behavior.<sup><a href="#footnotes" style="border-bottom: none;">10</a></sup></li>
<li>Unitary: the sampling decision is made at the trace point itself (and therefore prevents the construction of a full workflow).<sup><a href="#footnotes" style="border-bottom: none;">10</a></sup></li>
</ul>
<p>Head-based sampling is the simplest and is ideal for performance profiling; both head-based and unitary sampling are most often seen in current tracing system implementations. I’m not aware of an open source tracing system that implements tail-based sampling.</p>
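<p>A toy sketch of head-based sampling (the names are illustrative): the decision is made once at the start of the workflow and carried in the propagated metadata, so downstream trace points honor it rather than re-deciding:</p>

```python
import random

def start_trace(sample_rate, rng=random):
    """Head-based sampling: decide once, at the start of the workflow."""
    return {"sampled": rng.random() < sample_rate}

def record_span(trace_ctx, spans, name):
    # Every downstream component honors the decision carried in the
    # metadata, so a workflow is either traced end-to-end or not at all.
    if trace_ctx["sampled"]:
        spans.append(name)

spans = []
ctx = start_trace(sample_rate=1.0)   # rate 1.0 => always sampled
for point in ("client-send", "server-recv", "server-reply"):
    record_span(ctx, spans, point)
```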
<h3 id="what-to-visualize">What to visualize</h3>
<p>What visualization you choose to look at depends on what you’re trying to figure out.</p>
<h4 id="gantt-charts">Gantt Charts</h4>
<p><img class="img-responsive img-rounded" src="http://www.roguelynn.com/assets/images/tracing/gantt_chart.jpeg" title="Example of a Gantt Chart" alt="Example of a Gantt Chart"/>
<figcaption>Example of Gantt Chart, Lynn Root, <a href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></figcaption></p>
<p>Gantt charts are popular and definitely appealing, but only show requests from a single trace. You’ve definitely seen this type if you’ve looked at the network tab of your browser’s dev tools. Nearly all open source tracing tools provide this type of chart.</p>
<h4 id="request-flow-graph">Request Flow Graph</h4>
<p><img class="img-responsive img-rounded" src="http://www.roguelynn.com/assets/images/tracing/request_flow_chart.jpeg" title="Example of a Request Flow Chart" alt="Example of a Request Flow Chart"/>
<figcaption>Example of a Request Flow Chart, adapted from “<a href="http://www.pdl.cmu.edu/PDL-FTP/SelfStar/CMU-PDL-14-102.pdf">So, you want to trace your distributed system</a>”, p15</figcaption></p>
<p>When trying to get a sense of where a system’s bottlenecks are, a request flow graph (a.k.a. directed-acyclic graph) will show workflows as they are executed, and – unlike Gantt charts – can aggregate information of multiple requests of the same workflow.</p>
<h4 id="context-calling-tree">Context Calling Tree</h4>
<p><img class="img-responsive img-rounded" src="http://www.roguelynn.com/assets/images/tracing/context_calling_tree.jpeg" title="Example of a Context Calling Tree" alt="Example of a Context Calling Tree"/>
<figcaption>Example of a Context Calling Tree, adapted from “<a href="http://www.pdl.cmu.edu/PDL-FTP/SelfStar/CMU-PDL-14-102.pdf">So, you want to trace your distributed system</a>”, p15</figcaption></p>
<p>Another useful representation is a calling context tree in order to visualize multiple requests of different workflows. This reveals valid (and invalid) paths requests can take, best for creating a general understanding of system behavior.</p>
<h3 id="keep-in-mind">Keep in mind</h3>
<p>The takeaway here is that there are a few things we need to consider when we trace a system:</p>
<h5 id="what-do-i-want-to-know">What do I want to know?</h5>
<p>You should have an understanding of what you want to do. What questions are you trying to answer with tracing? </p>
<p>Certainly, there may be other realizations and questions that come from tracing – for example, with Dapper, Google is able to audit systems for security, asserting that only authorized components are talking to sensitive services<sup><a href="#footnotes" style="border-bottom: none;">11</a></sup> – but without understanding what you’re trying to figure out, you may end up approaching your instrumentation incorrectly.</p>
<p>The answer to this question will help identify how to approach the causality – whether from the Trigger point of view, or submitter. </p>
<h5 id="how-much-can-i-instrument">How much can I instrument?</h5>
<p>Another important question: how much time can you put into instrumenting your system? And can you even instrument all parts? This will inform the approach you take to tracing, be it blackbox or not. If you can instrument all the components, it then becomes a question of what data to propagate through an entire flow.</p>
<h5 id="how-much-do-i-want-to-know">How much do I want to know?</h5>
<p>And finally, how much of the flows do you want to understand? Do you want to understand <em>all</em> requests? Then be prepared to take a performance penalty on the service itself. And have fun storing all that data. </p>
<p>Is a percentage of flows okay? If so, how to approach sampling lies in your answer to the “what do I want to know” question. For understanding performance, head-based sampling is just fine.</p>
<p>You’ll also need to think about whether you want to capture the full flow of the request, or if you want to focus on a subset of the system. This will also affect your sampling approach.</p>
<h3 id="approach-for-performance-analysis">Approach for performance analysis</h3>
<p>With performance or steady-state problems, you’ll want to try to preserve trigger causality rather than submitter causality, as it shows the critical path to the bottleneck. Head-based sampling is fine since we don’t need intelligent sampling; even with low sampling rates, we’re able to get a good idea of where our problem lies. And finally, a request flow graph is ideal here, since we don’t care about anomalous behavior – we want big-picture information rather than a look into particular, individual workflows.</p>
<h2 id="improving-performance">Improving performance:</h2><h3 id="questions-to-ask-yourself">Questions to ask yourself</h3>
<p>Most often, once you’re tracing a system, the problem will reveal itself, as will the solution. But not always, so I have a few questions to ask yourself with figuring out how to improve a service’s performance. Of course, this isn’t an exhaustive list; it’s just to get you thinking.</p>
<h6 id="batching-requests">Batching Requests</h6>
<p>Are you making multiple requests to the same service? Round-trip network calls are expensive; perhaps there’s a way to batch requests. Some helpful libraries: <a href="https://github.com/tanwanirahul/django-batch-requests">django-batch-requests</a> (looks like it’s limited to explicit 2.7 support, but may work with 3.x), this <a href="http://flask.pocoo.org/snippets/131/">flask snippet</a>, and a <a href="https://stackoverflow.com/questions/23687653/pyramid-invoking-a-sub-request">few pyramid approaches</a>.</p>
<h6 id="server-choice">Server Choice</h6>
<p>Can you make the switch from an Apache HTTP server to nginx? A <a href="https://www.nginx.com/blog/maximizing-python-performance-with-nginx-parti-web-serving-and-caching/">simple switch</a> may provide a boost, especially under heavy load.</p>
<h6 id="parallelization">Parallelization</h6>
<p>Are there any <a href="http://aiohttp.readthedocs.io/en/stable/">parallelization</a> opportunities? Perhaps your service doesn’t need to be synchronous, or it blocks unnecessarily. For example, if you’re some big social network site, can you grab a user’s profile photo at the same time as you pull up their timeline and fetch their messages?</p>
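<p>As a toy sketch of that example – the fetch functions are stand-ins for real network calls – <code>asyncio.gather</code> runs the three lookups concurrently instead of back-to-back:</p>

```python
import asyncio

# Stand-ins for real backend calls; each would normally be network I/O.
async def fetch_photo(user):
    await asyncio.sleep(0.01)
    return f"{user}:photo"

async def fetch_timeline(user):
    await asyncio.sleep(0.01)
    return f"{user}:timeline"

async def fetch_messages(user):
    await asyncio.sleep(0.01)
    return f"{user}:messages"

async def load_profile(user):
    # The three calls run concurrently; total latency is roughly the
    # slowest call, not the sum of all three.
    return await asyncio.gather(
        fetch_photo(user), fetch_timeline(user), fetch_messages(user))

result = asyncio.run(load_profile("lynn"))
```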
<h6 id="caching">Caching</h6>
<p>Is it useful to add (or fix) caching? Is the same data being repeatedly requested but not cached? Or are you caching too much, or not the right data? Is the expiration too high or too low?</p>
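<p>To make the expiration question concrete, here’s a tiny TTL cache sketch (illustrative only – in practice you’d likely reach for Redis or memcached): too high a TTL serves stale data, too low and you’re back to repeated fetches.</p>

```python
import time

class TTLCache:
    """Dict-backed cache whose entries expire after ttl seconds."""
    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock        # injectable clock, handy for testing
        self._data = {}

    def set(self, key, value):
        self._data[key] = (value, self.clock())

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, stored_at = entry
        if self.clock() - stored_at > self.ttl:
            del self._data[key]   # expired; force a fresh fetch
            return default
        return value

now = [0.0]
cache = TTLCache(ttl=60, clock=lambda: now[0])
cache.set("profile:42", {"name": "lynn"})
fresh = cache.get("profile:42")
now[0] = 120.0                    # one minute past expiry
stale = cache.get("profile:42")
```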
<h6 id="asset-handling">Asset Handling</h6>
<p>What about your site’s frontend assets: could they be ordered better to improve loading time? Can you minimize the amount of inline scripts, or make your scripts async? Are there a lot of distinct domain lookups that add time from DNS responses? How about decreasing the number of files referenced, or minifying and compressing them?</p>
<p>Take a look at <a href="https://github.com/miracle2k/webassets">webassets</a>, or a particular package for your framework: <a href="https://flask-assets.readthedocs.io/en/latest/">Flask-Assets</a>, <a href="https://github.com/django-compressor/django-compressor">django-compressor</a>, <a href="https://github.com/cobrateam/django-htmlmin">django-htmlmin</a>, and <a href="https://github.com/Gandi/pyramid_htmlmin">pyramid-htmlmin</a>. Mozilla also has <a href="https://developer.mozilla.org/en-US/docs/Learn/HTML/Howto/Author_fast-loading_HTML_pages">more tips</a> for fast-loading HTML pages.</p>
<h6 id="chunked-responses">Chunked Responses</h6>
<p>Can you use chunked encoding when returning large amounts of data? Or can you otherwise have your services produce elements of the response as they are needed, rather than trying to produce all elements as fast as possible? Have a look at Flask’s docs on <a href="http://flask.pocoo.org/docs/0.12/patterns/streaming/">streaming responses</a>, peek at how Pyramid supports <a href="https://docs.pylonsproject.org/projects/pyramid/en/latest/_modules/pyramid/response.html">streaming responses with <code>app_iter</code></a> or Django’s <a href="https://docs.djangoproject.com/en/dev/ref/request-response/#streaminghttpresponse-objects"><code>StreamingHttpResponse</code></a>, or get inspiration from <a href="http://aiohttp.readthedocs.io/en/stable/web_reference.html#aiohttp.web.StreamResponse">aiohttp’s implementation</a>.</p>
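<p>At its core, Flask-style streaming is just a generator that produces the response body chunk by chunk; a minimal sketch (the data and function name are made up):</p>

```python
def generate_rows(rows):
    """Yield one chunk at a time, so the server can start sending the first
    rows while later ones are still being produced."""
    yield "["
    for i, row in enumerate(rows):
        prefix = "" if i == 0 else ","
        yield f'{prefix}"{row}"'
    yield "]"

# With Flask you would wrap this generator in a Response object inside a
# view, and the body would be sent with chunked transfer encoding.
chunks = list(generate_rows(["a", "b", "c"]))
body = "".join(chunks)
```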
<h2 id="tracing-systems-amp-services">Tracing Systems & Services</h2><h3 id="opentracing">OpenTracing</h3>
<p>Impressively, there is an open standard for distributed tracing – <a href="http://opentracing.io/">OpenTracing</a> – allowing developers of applications, open source packages, and services, from nginx to ORMs, to instrument their code without vendor lock-in. They do this by standardizing trace span APIs.</p>
<p>One criticism of OpenTracing is that there is no prescribed way to implement more intelligent sampling beyond a simple percentage and setting priority. There’s also a lack of standardization for how to track relationships, whether submitter- or trigger-based – it’s mainly a standardization for managing a span of a trace itself. But mind you, it’s still a very young specification that’s evolving and developing.</p>
<p>Some OpenTracing Python libraries to instrument your applications and are tracing-implementation agnostic: <a href="https://github.com/opentracing/opentracing-python">opentracing-python</a>, <a href="http://pythonhosted.org/Flask-OpenTracing/">flask-opentracing</a>, and <a href="https://github.com/opentracing-contrib/python-django">django-opentracing</a>.</p>
<h3 id="self-hosted-systems">Self-Hosted Systems</h3>
<p>And there are a few self hosted, popular solutions out there that support OpenTracing’s specification:</p>
<h4 id="zipkin-twitter">Zipkin (Twitter)</h4>
<p>Probably the most widely used is <a href="http://zipkin.io/">Zipkin</a>, from Twitter, which has implementations in Java, Go, JavaScript, Ruby, and Scala. The architecture is basically that the instrumented app sends data out of band to a remote collector, which accepts a few different transport mechanisms, including HTTP, Kafka, and Scribe. For propagating data around, all of the current Python client libraries (<a href="https://github.com/Yelp/py_zipkin">py_zipkin</a>, <a href="https://github.com/Yelp/pyramid_zipkin">pyramid_zipkin</a>, <a href="https://github.com/Yelp/swagger_zipkin">swagger_zipkin</a>, and <a href="https://github.com/qiajigou/flask-zipkin">flask-zipkin</a>) only support HTTP – no RPC support.</p>
<p><a href="http://zipkin.io/public/img/json_zipkin_screenshot.png"><img class="displayed" src="http://www.roguelynn.com/assets/images/tracing/zipkin_gantt.png" title="Zipkin screenshot" alt="Zipkin screenshot of Gantt chart"/></a>
<figcaption>Gantt chart from Zipkin’s <a href="http://zipkin.io/pages/data_model.html">documentation</a></figcaption></p>
<p>And finally, Zipkin does provide a nice Gantt chart of individual traces, and you can view a tree of dependencies – but the latter is essentially only a context calling tree, with no information on latencies, status codes, or anything else.</p>
<p>An example to show how much instrumentation is needed when using <a href="https://github.com/Yelp/py_zipkin"><code>py_zipkin</code></a>:</p>
<pre><code data-lang="python"># app.py
import requests
from flask import Flask
from py_zipkin.zipkin import zipkin_span

app = Flask(__name__)
app.config.update({
    "ZIPKIN_HOST": "10.0.0.1",
    "ZIPKIN_PORT": "9411",
    "APP_PORT": 5000,
    # any other app config-y things
})


def http_transport(encoded_span):
    # encoding prefix explained in https://github.com/Yelp/py_zipkin#transport
    body = b"\x0c\x00\x00\x00\x01" + encoded_span
    zipkin_url = "http://{host}:{port}/api/v1/spans".format(
        host=app.config["ZIPKIN_HOST"], port=app.config["ZIPKIN_PORT"])
    headers = {"Content-Type": "application/x-thrift"}
    # You'd probably want to wrap this in a try/except in case POSTing fails
    requests.post(zipkin_url, data=body, headers=headers)


@app.route("/")
def index():
    kwargs = {
        # name of the service, app, or otherwise overall component
        "service_name": "myawesomeapp",
        # name of the individual trace point, e.g. function name itself
        "span_name": "index",
        # must define a transport handler like above (or one for Kafka or Scribe)
        "transport_handler": http_transport,
        # the port (int) on which your service/app/component runs
        "port": app.config["APP_PORT"],
        # sample rate (int) from 0 to 100; use 100 to always trace
        "sample_rate": 100,
    }
    with zipkin_span(**kwargs):
        some_other_func()
        # app logic


# add a span to the trace that was started above
@zipkin_span(service_name="myawesomeapp", span_name="some_other_func")
def some_other_func():
    # other app logic
    pass
</code></pre>
<p>Using <a href="https://github.com/Yelp/py_zipkin"><code>py_zipkin</code></a>, on which the other libraries are based, you need to define a transport mechanism, which can be as simple as a POST request with the content; you can otherwise define a Kafka or Scribe transport. From there, it’s just a simple context manager placed wherever you want to start a trace, and a simple decorator to add spans to that trace. There’s also <a href="https://github.com/Yelp/py_zipkin#usage-2-trace-a-service-call">sample code</a> to add a tween to Pyramid.</p>
<h4 id="jaeger-uber">Jaeger (Uber)</h4>
<p><a href="https://github.com/uber/jaeger">Jaeger</a> is another self-hosted tracing system that supports the OpenTracing specification; it comes from Uber. Rather than the application/client library reporting to a remote collector, it reports to a local agent via UDP, which then sends traces to a collector. Also unlike Zipkin, which supports Cassandra, ElasticSearch, and MySQL, Jaeger only supports Cassandra for its trace storage.</p>
<p><a href="http://jaeger.readthedocs.io/en/latest/images/traces-ss.png"><img class="displayed" src="http://www.roguelynn.com/assets/images/tracing/jaeger-trace-list.png" title="Jaeger: Traces list view example" alt="Jaeger: Traces list view example"/></a>
<figcaption>Trace list view from Jaeger’s <a href="http://jaeger.readthedocs.io/en/latest/">documentation</a></figcaption></p>
<p><a href="http://jaeger.readthedocs.io/en/latest/images/trace-detail-ss.png"><img class="displayed" src="http://www.roguelynn.com/assets/images/tracing/jaeger-trace-detail.png" title="Jaeger: Trace detail view example" alt="Jaeger: Trace detail view example"/></a>
<figcaption>Trace detail view from Jaeger’s <a href="http://jaeger.readthedocs.io/en/latest/">documentation</a></figcaption></p>
<p>The UI is very similar to Zipkin’s, with pretty waterfall graphs and a dependency tree – but again, nothing that easily helps aggregate performance information. Their <a href="http://jaeger.readthedocs.io/en/latest/">documentation</a> is also lacking, though they do have a pretty decent <a href="https://medium.com/opentracing/take-opentracing-for-a-hotrod-ride-f6e3141f7941">tutorial</a> to walk through.</p>
<p>Their <a href="https://github.com/uber/jaeger-client-python">Python client</a> library is a bit cringe-worthy, taken from their <a href="https://github.com/uber/jaeger-client-python#getting-started">README</a> example:</p>
<pre><code data-lang="python"># app.py
import logging
import time

import opentracing as ot
from flask import Flask
from jaeger_client import Config

# Adapted from jaeger-client-python README at
# https://github.com/uber/jaeger-client-python#getting-started

app = Flask(__name__)
app.config.update({
    "JAEGER_SAMPLE_TYPE": "const",
    "JAEGER_SAMPLE_PARAM": 1,
    "JAEGER_LOGGING": True,
    "LOG_LEVEL": logging.DEBUG,
    # any other app config-y things
})

logging.getLogger("").handlers = []
logging.basicConfig(format="%(asctime)s %(message)s", level=app.config["LOG_LEVEL"])

config = Config(
    config={
        "sampler": {
            "type": app.config["JAEGER_SAMPLE_TYPE"],
            "param": app.config["JAEGER_SAMPLE_PARAM"],
        },
        "logging": True,
    },
    service_name="myawesomeapp",
)
tracer = config.initialize_tracer()


@app.route("/")
def index():
    with ot.tracer.start_span("Index") as span:
        span.log_event("test message", payload={"life": 42})
        with ot.tracer.start_span("IndexChild", child_of=span) as child_span:
            # NB: the README linked above says `span.log_event`, but
            # they might have meant `child_span.log_event`
            span.log_event("another test message")
    # wat
    time.sleep(2)   # yield to IOLoop to flush the spans
    tracer.close()  # flush any buffered spans
</code></pre>
<p>This is an adapted example from their <a href="https://github.com/uber/jaeger-client-python#getting-started">docs</a> that’s made to use with a Flask app. Basically you initialize a tracer that the OpenTracing Python library will use, and create spans and child spans with context managers.</p>
<p>But their usage of <code>time.sleep</code> to “yield to IOLoop” is a bit of a head scratcher. Its docs also mention support for monkeypatching libraries like requests, redis, and urllib2. So, all I can say is: use at your own risk.</p>
<p><strong>Update</strong>: Days after this talk was given at PyCon 2017, they updated their README.md to <a href="https://github.com/uber/jaeger-client-python/issues/50">document</a> the reason for the <code>time.sleep</code>.</p>
<h4 id="honorable-mentions">Honorable Mentions</h4>
<p>There are a couple of others that support the OpenTracing spec, including <a href="https://text.sourcegraph.com/appdash-an-open-source-perf-tracing-suite-4e1fc41c2031">AppDash</a> and <a href="http://lightstep.com/">LightStep</a> (private beta). And <a href="http://opentracing.io/documentation/pages/supported-tracers.html">a few more</a> with no python client libraries (yet).</p>
<h3 id="tracing-services">Tracing Services</h3>
<p>In case you don’t want to host your own system, there are a few services out there to help.</p>
<h4 id="stackdriver-trace-google">Stackdriver Trace (Google)</h4>
<p>There is Stackdriver Trace from Google (not to be confused with Stackdriver Logging), which is pretty promising. Unfortunately, Google has no Python or gRPC client libraries to instrument your app with – though they do have a REST API and an RPC interface, if you feel so inclined.</p>
<p>But they also <a href="https://cloud.google.com/trace/docs/zipkin">support Zipkin traces</a>: you can set up a Google-flavored Zipkin server, either on their infrastructure or on yours, and have it forward traces to Stackdriver. They actually make it pretty easy – I was able to spin up <a href="https://cloud.google.com/trace/docs/zipkin#option_1_using_a_container_image_to_set_up_your_server">their Docker image</a> on Compute Engine and start viewing traces of my sample app within a couple of minutes.</p>
<p><a href="https://cloud.google.com/trace/images/trace-overview.png"><img class="img-displayed" src="http://www.roguelynn.com/assets/images/tracing/gcp-trace-overview.png" title="Google's Stackdriver Trace Demo Overview" alt="Google's Stackdriver Trace Demo Overview"/></a>
<figcaption>Trace overview page, from Google’s <a href="https://cloud.google.com/trace/docs/trace-overview">Viewing Traces</a> Documentation</figcaption></p>
<p>A couple of annoyances: Stackdriver’s UI automatically provides simple plots of response time over the past few hours and a list of all traces, but you have to manually create “analysis reports” for each time period you’re interested in to get the fancy distribution graphs; they’re not automatically generated. It may also be annoying that trace storage is limited to 30 days – same with their Stackdriver Logging.</p>
<h4 id="x-ray-aws">X-Ray (AWS)</h4>
<p>Amazon also has a tracing service available, called X-Ray. I only set up their demo Node app, but it looks like they don’t have any explicit Python support either – only Node, Java, and .NET apps. But the Python SDK – <a href="https://aws.amazon.com/sdk-for-python/">Boto</a> – has support for <a href="http://boto3.readthedocs.io/en/latest/reference/services/xray.html">sending traces to a local daemon</a>, which then forwards them to the X-Ray service.</p>
<p>What’s nice about X-Ray – despite it being proprietary and not OpenTracing compliant – is that you’re able to configure sampling rates for different URL routes of your application, based on a fixed number of requests per second as well as a percentage of requests. However, it isn’t possible to configure these rules with Boto.</p>
<p>Almost redeemable are their visualizations. While there is the typical waterfall chart, they also have a request flow graph where you can see average latencies, captured traces per minute, and requests broken down by response status.</p>
<p><a href="http://docs.aws.amazon.com/xray/latest/devguide/images/scorekeep-gettingstarted-servicemap-after-github.png"><img class="img-displayed" src="http://www.roguelynn.com/assets/images/tracing/aws-xray-service-map.png" title="AWS X-Ray: Service Map" alt="AWS X-Ray: Service Map"/></a>
<figcaption>Request Flow Chart (“Service Map”) from AWS’s <a href="http://docs.aws.amazon.com/xray/latest/devguide/aws-xray.html">What is AWS X-Ray</a> documentation</figcaption></p>
<p>So, basically, AWS X-Ray seems pretty cool – probably the most useful – but it will take some work to instrument a Python app, and it induces vendor lock-in.</p>
<h4 id="honorable-mentions-2">Honorable Mentions</h4>
<p>A couple of honorable mentions that do app performance measurement: <a href="https://www.datadoghq.com/apm/">Datadog</a> and <a href="https://newrelic.com/application-monitoring">New Relic</a>. I don’t have experience with these services for this problem space, but they certainly provide helpful tools for tracing across applications.</p>
<h2 id="tldr">TL;DR</h2>
<p>You need this. If you run microservices, you should be tracing them. It’s otherwise very difficult to understand an entire system’s performance, anomalous behavior, resource usage, and many other aspects.</p>
<p>However, good luck with that. Whether you choose a self-hosted solution or a provided service, documentation is all-around lacking. Granted, it’s still a very young space, very much growing as the OpenTracing standard develops.</p>
<p>As I mentioned, Python support isn’t 100%; and even where it exists, there’s a lack of configuration for relationship tracking, intelligent sampling, and available visualizations. But there is an open spec that can be influenced – or used to implement your own, if you’re so inclined.</p>
<h2 id="further-reading">Further Reading</h2>
<p>With respect to global synchronization, I think it’d be pretty interesting to use Merkle trees in place of logical clocks. A relevant white paper I found: “<a href="http://pauillac.inria.fr/%7Efpottier/X/INF441/projets/merkle/merkle.pdf">Merkle Hash Trees for Distributed Audit Logs</a>” by Karthikeyan Bhargavan.</p>
<p>Posts from various companies:</p>
<ul>
<li> “<a href="https://medium.com/@Pinterest_Engineering/distributed-tracing-at-pinterest-with-new-open-source-tools-a4f8a5562f6b">Distributed tracing at Pinterest with new open source tools</a>” by Suman Karumuri</li>
<li> “<a href="https://engineeringblog.yelp.com/2016/04/distributed-tracing-at-yelp.html">Distributed tracing at Yelp</a>” by Prateek A.</li>
<li> “<a href="https://engineering.linkedin.com/distributed-service-call-graph/real-time-distributed-tracing-website-performance-and-efficiency">Real-time distributed tracing for website performance and efficiency optimizations</a>” by Chris Coleman & Toon Sripatanaskul (LinkedIn)</li>
<li> “<a href="https://blog.twitter.com/engineering/en_us/a/2012/distributed-systems-tracing-with-zipkin.html">Distributed Systems Tracing with Zipkin</a>” by @cra (Twitter)</li>
</ul>
<h2 id="footnotes">Footnotes</h2>
<p><strong>nb:</strong> I had personal difficulty properly citing everything within this blog post as it’s a digestion and an amalgamation of a few different papers, all smushed together. This is not <em>exactly</em> following the Chicago Manual of Style for citations (with the linking and absence of a proper bibliography) as I prefer ease for the reader of this non-scientific blog post.</p>
<p><sup>1</sup> <a href="http://www.pdl.cmu.edu/PDL-FTP/SelfStar/CMU-PDL-14-102.pdf">Sambasivan et al.</a> (pdf), 2014, p1 <br/>
<sup>2</sup> <a href="http://www.pdl.cmu.edu/PDL-FTP/SelfStar/CMU-PDL-14-102.pdf">Sambasivan et al.</a> (pdf), 2014, p3-4 <br/>
<sup>3</sup> <a href="http://www.pdl.cmu.edu/PDL-FTP/SelfStar/CMU-PDL-14-102.pdf">Sambasivan et al.</a> (pdf), 2014, p11 <br/>
<sup>4</sup> <a href="http://web.eecs.umich.edu/%7Etwenisch/papers/osdi14.pdf">Cho et al.</a> (pdf), p3 <br/>
<sup>5</sup> <a href="http://www.pdl.cmu.edu/PDL-FTP/SelfStar/CMU-PDL-14-102.pdf">Sambasivan et al.</a> (pdf), 2014, p8 <br/>
<sup>6</sup> <a href="http://www.pdl.cmu.edu/PDL-FTP/SelfStar/CMU-PDL-14-102.pdf">Sambasivan et al.</a> (pdf), 2014, p10 <br/>
<sup>7</sup> <a href="http://www.pdl.cmu.edu/PDL-FTP/SelfStar/CMU-PDL-14-102.pdf">Sambasivan et al.</a> (pdf), 2014, p11 <br/>
<sup>8</sup> <a href="https://research.google.com/pubs/pub36356.html">Sigelman et al.</a>, 2010, p7 <br/>
<sup>9</sup> <a href="http://www.pdl.cmu.edu/PDL-FTP/SelfStar/CMU-PDL-14-102.pdf">Sambasivan et al.</a> (pdf), 2014, p11-13 <br/>
<sup>10</sup> <a href="http://www.pdl.cmu.edu/PDL-FTP/SelfStar/CMU-PDL-14-102.pdf">Sambasivan et al.</a> (pdf), 2014, p13 <br/>
<sup>11</sup> <a href="https://research.google.com/pubs/pub36356.html">Sigelman et al.</a>, 2010, p5 <br/></p>
Spotify’s Love/Hate Relationship with DNShttp://www.roguelynn.com/words/spotifys-love-hate-relationship-with-dns/2017-03-31T09:45:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p><em><strong>Nota Bene:</strong> This post was written and first published on Spotify’s <a href="https://labs.spotify.com/2017/03/31/spotifys-lovehate-relationship-with-dns/">developer blog</a>.</em></p>
<hr>
<p><em><strong>Foreword:</strong> This blog post accompanies my presentation given at <a href="https://www.usenix.org/conference/srecon17americas">SRECon 2017</a> in San Francisco. The recording of the talk can be viewed <a href="https://www.usenix.org/conference/srecon17americas/program/presentation/root">here</a>, with the accompanying slides <a href="https://speakerdeck.com/roguelynn/hate-relationship-with-dns">here</a>. Cross-posted from Spotify’s engineering blog, <a href="https://labs.spotify.com">labs.spotify.com</a>.</em></p>
<hr>
<p>Spotify has a <a href="https://labs.spotify.com/2013/02/25/in-praise-of-boring-technology/">history</a> of loving “boring” technologies. It’s not that often people talk about DNS; when they do, it’s usually to <a href="https://www.google.com/#q=site:reddit.com+it%27s+always+dns&amp;*">complain</a>, or when a <a href="http://dyn.com/blog/dyn-statement-on-10212016-ddos-attack/">major outage</a> happens. Otherwise, DNS is initially set up – probably with a 3rd party provider – and then mostly forgotten about. But it’s because DNS is boring that we love it so much. It provides us with a stable and widely-known query interface, free caching, and service discovery.</p>
<p>This post will walk through how we have designed and currently manage our own DNS infrastructure, the curious ways in which we use DNS, and the future of boring technology at Spotify.</p>
<h1 id="our-infrastructure">Our infrastructure</h1>
<p>We run our own DNS infrastructure on-premise, which might seem a bit unusual these days. We have a typical setup with what we call a single “stealth primary” (and a hot standby) running BIND (DNS server software), whose only job is essentially to compile zone files. We then have a bunch of authoritative nameservers (or “secondaries”), also running <a href="https://en.wikipedia.org/wiki/BIND">BIND</a>, with at least two per geographical location, four of which are exposed to the public. When the stealth primary has finished re-compiling zones, a transfer happens to the nameservers.</p>
<p><img class="img-displayed" src="http://www.roguelynn.com/assets/images/spotify-dns/dns_architecture_no_srv.png" title="Spotify DNS Architecture Overview" alt="Spotify DNS Architecture Overview"/>
<figcaption>Spotify DNS Architecture Overview</figcaption></p>
<p>We then have a bunch more resolvers running <a href="https://en.wikipedia.org/wiki/Unbound_(DNS_server)">Unbound</a> (caching and recursive DNS server software), with at least 2 resolvers per datacenter suite. Our resolvers are configured to talk to every one of our authoritative nameservers for redundancy.</p>
<p>We also have Unbound running on each deployed service host, configured to use the resolvers in the service’s site. Using Unbound everywhere gives us the caching that we’d rather not implement ourselves. We have also always relied on DNS for service discovery, so Unbound helps avoid many requests from individual services that can effectively DDoS ourselves.</p>
<p>The Unbound service <a href="http://www.unbound.net/documentation/info_timeout.html">randomly selects its DNS server</a> to send queries to based on an RTT (round-trip time) band of less than 400 milliseconds. This unfortunately does not mean that Unbound will always select the fastest-responding server. However, as you can see from the output below, from one of our resolvers located in Google Compute’s Asia East region (to which <code>gae</code> in the prefix of the hostname refers), the fastest-responding authoritative nameserver is logically located in the same region. The others under the 400 millisecond threshold are located in the physically closest region, western US; every other nameserver has an RTT higher than the 400 millisecond band.</p>
<pre><code data-lang="console">root@gae2-dnsresolver-a-0319:~# unbound-control lookup spotify.net
The following name servers are used for lookup of spotify.net.
The noprime stub servers are used:
Delegation with 0 names, of which 0 can be examined to query further addresses.
It provides 18 IP addresses.
10.173.0.4 rto 706 msec, ttl 309, ping 62 var 161 rtt 706, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
10.173.0.5 rto 706 msec, ttl 236, ping 62 var 161 rtt 706, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
10.174.0.6 rto 548 msec, ttl 251, ping 68 var 120 rtt 548, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
10.174.0.25 rto 500 msec, ttl 246, ping 36 var 116 rtt 500, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
23.92.97.20 rto 603 msec, ttl 249, ping 63 var 135 rtt 603, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
23.92.102.148 rto 565 msec, ttl 238, ping 45 var 130 rtt 565, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
78.31.11.120 rto 738 msec, ttl 248, ping 82 var 164 rtt 738, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
194.132.176.115 rto 738 msec, ttl 299, ping 82 var 164 rtt 738, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
10.254.35.16 rto 743 msec, ttl 238, ping 83 var 165 rtt 743, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
10.243.97.245 rto 742 msec, ttl 243, ping 82 var 165 rtt 742, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
194.71.232.219 rto 729 msec, ttl 246, ping 65 var 166 rtt 729, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
193.181.6.79 rto 734 msec, ttl 246, ping 66 var 167 rtt 734, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
193.181.182.199 rto 803 msec, ttl 247, ping 91 var 178 rtt 803, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
194.68.177.51 rto 154 msec, ttl 241, ping 134 var 5 rtt 154, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
194.68.177.243 rto 150 msec, ttl 302, ping 130 var 5 rtt 150, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
10.163.6.191 rto 649 msec, ttl 350, ping 41 var 152 rtt 649, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
10.175.0.4 rto 12 msec, ttl 306, ping 0 var 3 rtt 50, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
10.163.216.81 rto 811 msec, ttl 244, ping 75 var 184 rtt 811, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
</code></pre>
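<p>The selection behavior described above can be sketched roughly in Python: pick at random among the servers whose estimated RTT falls under the 400 ms band, rather than always taking the fastest. This is a simplification of Unbound’s actual algorithm, not its implementation:</p>

```python
import random

# Simplified sketch of Unbound-style server selection: choose randomly
# among servers whose estimated RTT is under the 400 ms band, falling
# back to the single fastest server if none qualify.
RTT_BAND_MS = 400

def select_server(rtts, rng=random):
    """rtts: dict mapping server IP -> estimated RTT in milliseconds."""
    eligible = [ip for ip, rtt in rtts.items() if rtt < RTT_BAND_MS]
    if not eligible:
        # No server under the band: fall back to the fastest one.
        return min(rtts, key=rtts.get)
    return rng.choice(eligible)

# Using a few RTTs from the resolver output above:
rtts = {"10.175.0.4": 50, "194.68.177.243": 150, "194.68.177.51": 154,
        "10.174.0.25": 500, "10.173.0.4": 706}
# Only the three servers under 400 ms are ever candidates.
assert select_server(rtts) in {"10.175.0.4", "194.68.177.243", "194.68.177.51"}
```

<p>With the numbers above, any one of the three sub-400 ms servers may be chosen – which is exactly why the fastest one isn’t always selected.</p>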
<p>In addition to running Unbound on each service host, our services, built using <a href="https://github.com/spotify/apollo">Apollo</a>, our open-sourced microservice framework in Java, also make use of <a href="https://github.com/spotify/dns-java">dns-java</a>, which provides us with another layer of resiliency: the dns-java library will hold onto cached records for an hour if the local Unbound service fails or our DNS resolvers aren’t responding.</p>
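<p>That stale-cache resiliency pattern can be sketched in Python – a hypothetical illustration of the behavior, not dns-java’s actual API:</p>

```python
import time

STALE_TTL_SECONDS = 3600  # hold onto stale records for up to an hour

class ResilientCache:
    """Hypothetical sketch of dns-java-style behavior: serve a stale
    cached record when the underlying resolver fails."""

    def __init__(self, resolve, clock=time.monotonic):
        self._resolve = resolve   # callable: name -> list of addresses
        self._clock = clock
        self._cache = {}          # name -> (addresses, stored_at)

    def lookup(self, name):
        try:
            addresses = self._resolve(name)
        except Exception:
            cached = self._cache.get(name)
            if cached and self._clock() - cached[1] < STALE_TTL_SECONDS:
                return cached[0]  # resolver down: serve the stale copy
            raise                 # too stale (or never seen): give up
        self._cache[name] = (addresses, self._clock())
        return addresses
```

<p>A service wrapping its lookups this way keeps resolving names for up to an hour even if both the local Unbound and the site resolvers stop responding.</p>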
<h2 id="dns-record-generation-amp-deployments">DNS record generation & deployments</h2>
<p>Like many tech companies, we’ve grown into better practices. We did not always have automatic DNS record generation and deployments; for a long time, every deploy required babysitting and forewarning.</p>
<p>Before the push to automating DNS deploys and zone data generation, we hand-edited our zone files, committed them into version control (<em>fun fact: we started with Subversion then moved to Git around 2012</em>), and then manually deployed the new zone data with a script run on our primary.</p>
<!-- HTML generated using hilite.me -->
<div style="background: #ffffff; width: auto; padding: .2em .6em;">
<pre style="margin: 0; line-height: 125%; overflow: auto;"><span style="color: #557799;">08:35 </span><span style="color: #007700;">< j***k> </span>and also VERY FUNNY PEOPLE
<span style="color: #557799;">08:35 </span><span style="color: #007700;">< m***k> </span>j***k likes us \o/
<span style="color: #557799;">08:35 </span><span style="color: #007700;">< s***k> </span>we like j***k
<span style="color: #557799;">08:36 </span><span style="color: #007700;">< d***n> </span>DNS DEPLOY
<span style="color: #557799;">08:36 </span><span style="color: #007700;">< j***v> </span>What is this "DNS DEPLOY" thingy you guys keep screaming?
<span style="color: #557799;">08:36 </span><span style="color: #007700;">< d***n> </span>j***v, when we deploy new dns content
<span style="color: #557799;">08:36 </span><span style="color: #007700;">< j***k> </span>http://i.qkme.me/364h55.jpg
<span style="color: #557799;">08:36 </span><span style="color: #007700;">< j***v> </span>Alearting eachother?
<span style="color: #557799;">08:36 </span><span style="color: #007700;">< d***n> </span>yup
<span style="color: #557799;">08:37 </span><span style="color: #007700;">< j***v> </span><span style="color: #000077;">d***n:</span> Why?
<span style="color: #557799;">08:37 </span><span style="color: #007700;">< d***n> </span>in case there's problems and I guess also as a locking mechanism :)
</pre>
</div>
<p>After feeling too much pain with manual edits and deployments, in 2013 we made the push for automation. We started with incremental changes to script our record generation. Then we added required peer reviews and integration testing for records that still needed manual edits (e.g. marketing-friendly CNAMEs). Finally, in 2014, we were comfortable enough to “forget” about DNS after setting up cron jobs for regular record generation and DNS deploys.</p>
<h4 id="our-automation-in-detail">Our automation in detail</h4>
<p>We have a pair of cron’ed scripts written in Python that are scheduled to run every 10 minutes (staggered from each other): one script generates records from our <a href="https://labs.spotify.com/2016/03/25/managing-machines-at-spotify/">physical fleet</a> via a service of ours called “ServerDB”; the other script talks to the <a href="https://news.spotify.com/us/2016/02/23/announcing-spotify-infrastructures-googley-future/">Google Compute</a> API for our cloud instances with deployed services.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/spotify-dns/automated_rec_gen_deploy.gif" title="Spotify DNS Automated Record Generation & Deployment" alt="Spotify DNS Automated Record Generation & Deployment"/>
<figcaption>Spotify DNS Automated Record Generation & Deployment</figcaption></p>
<p>Each script takes about 4 minutes to gather the lists of instances for every service and finally commit them to our DNS data repository, which we consider our source of truth.</p>
<p>We then have another cron job that runs about 3 minutes after the git push. The stealth primary hosts this cron job, which runs every 5 minutes. It simply pulls from our DNS data repository, then compiles all the zone data via named. With every compile – which takes about 4 minutes – we update a particular TXT record that reflects the git commit hash that’s currently being deployed to production.</p>
<p>Once done, the primary notifies our authoritative nameservers of potential changes to the newly compiled zone files. The authoritative nameservers then query the primary to see if there are any zone changes, and if so, initiate an authoritative transfer (<a href="https://en.wikipedia.org/wiki/DNS_zone_transfer">AXFR</a>). These transfers to the nameservers also take about 4 minutes.</p>
<p>Looking at the timeline in the animated diagram above, it takes – at best – 15 minutes for the record of a new service’s host or instance to propagate.</p>
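<p>The best-case arithmetic works out like so – a rough model of the pipeline stages described above, not measured data:</p>

```python
# Best-case propagation timeline for a new record, per the stages above
# (all durations in minutes; a rough model of the cron pipeline).
STAGES = {
    "record generation & git commit": 4,
    "wait for the primary's cron": 3,
    "zone compilation on the primary": 4,
    "AXFR to authoritative nameservers": 4,
}

best_case = sum(STAGES.values())
assert best_case == 15  # matches the "at best, 15 minutes" above

# Without the staggered cron timing, the 3-minute wait disappears:
without_stagger = best_case - STAGES["wait for the primary's cron"]
assert without_stagger == 12
```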
<h2 id="what-about-service-discovery">What about Service Discovery?</h2>
<p>SRV records had also been hand-crafted. This, however, became problematic. Typically, changes to DNS are relatively slow, but service discovery needs to move fast. Couple that with manual edits being prone to human error, and we needed a better solution.</p>
<p>As shown earlier in the infrastructure overview, our services locally run Unbound, talking to our resolvers:</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/spotify-dns/service_discovery_intro.png" title="Spotify Service Discovery Intro" alt="Spotify Service Discovery Intro" width="491" height="500"/>
<figcaption>Spotify Service Discovery Intro</figcaption></p>
<p>Since the then-current open source solutions did not address all of our needs, we built our own service discovery system called Nameless, supporting both SRV and A record lookups. It allows engineers to dynamically register and discover services, which made it easier to increase and decrease the number of instances of a particular service.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/spotify-dns/service_discovery.png" title="Nameless: Spotify Service Discovery" alt="Nameless: Spotify Service Discovery" width="509" height="500"/>
<figcaption>Spotify Service Discovery</figcaption></p>
<p>To separate service discovery from regular internal DNS requests, Nameless owns the <code>services</code> subdomain, e.g. <code>services.lon6.spotify.net</code>.</p>
<pre><code data-lang="console">$ spnameless -b lon6 query --service metadata | head -n 4
query: service:metadata, protocol:hm (75 endpoints)
lon6-metadataproxy-a133t.lon6.spotify.net.:1234 (UP since 2017-03-11 00:58:07, expires TIMED)
lon6-metadataproxy-af00l.lon6.spotify.net.:1234 (UP since 2017-03-11 00:58:18, expires TIMED)
</code></pre><h2 id="monitoring">Monitoring</h2>
<p>Historically, DNS has been quite stable, but nevertheless, sh!t happens. There are a few ways we monitor our DNS infrastructure. We collect metrics emitted by Unbound on our resolvers, including number of queries by record type, SERVFAILs, and net packets:</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/spotify-dns/monitor_servfails.png" title="Monitoring: Spotify DNS Resolver Queries" alt="Monitoring: Spotify DNS Resolver Queries"/>
<figcaption>Queries per 5 minutes per resolver</figcaption></p>
<p>We also monitor our record generation jobs for gaps or spikes:</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/spotify-dns/monitor_rec_gen.png" title="Monitoring: Spotify DNS Record Generation" alt="Monitoring: Spotify DNS Record Generation"/>
<figcaption># of A, CNAME, and PTR records generated for physical hosts and GCP instances</figcaption></p>
<p>Most recently, we built a tool that allows us to track response latency, availability, and correctness for particularly important records, as well as deployment latency. For internal latency, availability, and correctness, we use <a href="http://www.dnspython.org/">dnspython</a> to query our resolvers and authoritative nameservers from their respective datacenter suites.</p>
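<p>A minimal sketch of such a check with dnspython (using the 2.x <code>resolve()</code> API) – this is an illustration of the idea, not our actual tool; the correctness criterion here simply means the exact expected record set came back:</p>

```python
import time

def probe(nameserver_ip, name, expected, rdtype="A", timeout=2.0):
    """Query one nameserver directly; return (latency_seconds, correct).

    A sketch of a latency/correctness probe, run against a resolver or
    authoritative nameserver from its own datacenter suite.
    """
    import dns.resolver  # dnspython; imported lazily so is_correct works without it
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = [nameserver_ip]
    start = time.monotonic()
    answer = resolver.resolve(name, rdtype, lifetime=timeout)
    latency = time.monotonic() - start
    return latency, is_correct({rr.to_text() for rr in answer}, expected)

def is_correct(got, expected):
    """'Correct' here means exactly the expected record set came back."""
    return set(got) == set(expected)
```

<p>Each suite would then run something like <code>probe("10.254.35.16", "spotify.net", expected_ips)</code> on a schedule and emit the latency and correctness results as metrics.</p>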
<p>For external latency, through Pingdom’s API, we grab the response latency from our public nameservers. While Pingdom’s monitoring is very valuable to us, we’ve found it difficult to tease out from where they send their queries to measure latency, as you can see in this pretty volatile graph:</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/spotify-dns/monitor_response_latency.png" title="Monitoring: Spotify DNS Response Latency" alt="Monitoring: Spotify DNS Response Latency"/>
<figcaption>Response time per public nameserver, as reported by Pingdom</figcaption></p>
<h1 id="our-other-dns-curiosities">Our other DNS curiosities</h1>
<p>Beyond its typical uses, we leverage DNS in interesting ways.</p>
<h3 id="client-error-reporting">Client Error Reporting</h3>
<p>There will always be clients that cannot connect to Spotify at all. In order to track the errors and the number of users affected, and to circumvent any potentially restrictive firewalls, we introduced error reporting via DNS. The client would make a single DNS query to a specific subdomain, with all the relevant information needed in the query itself, and the queried DNS server then logs it. The query is then parsed and tracked in this lovely Kibana graph:</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/spotify-dns/dnsparser_client_errors.png" title="Spotify Client Error Reporting Graph" alt="Spotify Client Error Reporting Graph"/>
<figcaption>Client errors by code</figcaption></p>
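<p>A sketch of how such a report might be packed into a query name – the subdomain and field layout here are hypothetical, not Spotify’s actual scheme:</p>

```python
# Hypothetical encoding of a client error report as a DNS query name.
# Each field becomes one label; a label may be at most 63 bytes and the
# full name at most 253 bytes, so fields are kept short.
REPORT_DOMAIN = "errors.example.net"  # placeholder, not the real subdomain

def encode_error_report(error_code, platform, client_version):
    labels = [str(error_code), platform, client_version.replace(".", "-")]
    qname = ".".join(labels + [REPORT_DOMAIN])
    for label in qname.split("."):
        if len(label) > 63:
            raise ValueError("DNS label too long: %r" % label)
    if len(qname) > 253:
        raise ValueError("DNS name too long")
    return qname

# The client issues a single (ignorable) lookup of this name; the server
# side just logs and parses the query.
assert encode_error_report(407, "android", "8.4.10") == \
    "407.android.8-4-10.errors.example.net"
```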
<h3 id="dht-ring">DHT ring</h3>
<p>We also use DNS as a <a href="https://en.wikipedia.org/wiki/Distributed_hash_table">DHT ring</a> with TXT records as storage for some service configuration data. One implementation is for song lookup when a song isn’t already locally cached. When the Spotify client looks up a requested song, the song ID itself is hashed, producing a key along the ring.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/spotify-dns/dht_ring_lookup.png" title="Spotify DHT Ring: Host Lookup" alt="Spotify DHT Ring: Host Lookup" width="572" height="450"/>
<figcaption>Spotify DHT Ring: Host Lookup</figcaption></p>
<p>The value associated with the key is essentially the host location where that song can be found. In this very simplified case, Instance E owns keys from <code>9e</code> to <code>c1</code>. Instance E actually points to a record, something like <code>tracks.1234.lon6-storage-a5678.lon6.spotify.net</code>:</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/spotify-dns/dht_ring_pointer.png" title="Spotify DHT Ring: Host Pointer" alt="Spotify DHT Ring: Host Pointer" width="565" height="450"/>
<figcaption>Spotify DHT Ring: Host Pointer</figcaption></p>
<p>This isn’t a real host, however. It just directs the client to query <code>lon6-storage-a5678.lon6.spotify.net</code> via port 1234 in order to find the song that has the ID “d34db33f”.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/spotify-dns/dht_ring_host_lookup.png" title="Spotify DHT Ring: Pointer to Host" alt="Spotify DHT Ring: Pointer to Host"/></p>
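<p>The ring lookup itself is just “find the first instance whose key is at or past the hash, wrapping around at the end.” A minimal sketch in Python – the hash, keyspace, and instance names are illustrative, not Spotify’s actual scheme:</p>

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring: each instance owns the key range
    ending at its own position (illustrative, not Spotify's scheme)."""

    def __init__(self, instances):
        # Place each instance on the ring at the hash of its name.
        self._ring = sorted((self._hash(name), name) for name in instances)
        self._keys = [key for key, _ in self._ring]

    @staticmethod
    def _hash(value):
        # Two hex chars gives the tiny keyspace used in the diagram above.
        return hashlib.md5(value.encode()).hexdigest()[:2]

    def owner(self, song_id):
        key = self._hash(song_id)
        # First instance clockwise from the key; wrap past the end to 0.
        i = bisect.bisect_left(self._keys, key) % len(self._keys)
        return self._ring[i][1]

ring = HashRing(["instance-a", "instance-b", "instance-c",
                 "instance-d", "instance-e"])
# Every song ID deterministically maps to exactly one instance.
assert ring.owner("d34db33f") == ring.owner("d34db33f")
```

<p>In the real system, the value stored at the owning position is the pointer record described above, which the client then resolves to a storage host and port.</p>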
<h3 id="host-naming-conventions">Host naming conventions</h3>
<p>Not really a DNS-specific use, but relevant nonetheless: the current convention of our hostnames contains the datacenter suite location, the role of the machine, and the pool that it’s grouped with. As mentioned in <a href="https://labs.spotify.com/2016/03/25/managing-machines-at-spotify/">Managing Machines at Spotify</a>, every machine typically has a single role assigned to it, where a “role” is Spotify-parlance for a microservice.</p>
<p>For example, we might have a machine with the hostname <code>ash2-dnsresolver-a1337.ash2.spotify.net</code>. The first four characters tell us that it’s in our Ashburn, Virginia location, in our second “pod” there. The next section, <code>dnsresolver</code>, tells us that the host has the role “dnsresolver” assigned to it. This allows Puppet to grab the required classes and modules for that role, installing the needed software and setting up its configuration. The first character of the last segment, <code>a</code>, is a reference to what we call a pool, and is designated by the team who owns that service. It can mean whatever they choose; for example, it can be used for separating testing from production, canary deploys, or directing Puppet with further configuration. The number – <code>1337</code> – is a randomized bit that is unique to the individual host itself.</p>
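<p>The convention is regular enough to parse mechanically; a quick sketch (the regex is my own, not a Spotify tool):</p>

```python
import re

# Parse hostnames of the form:
#   <site+pod>-<role>-<pool><id>.<site+pod>.spotify.net
HOSTNAME_RE = re.compile(
    r"^(?P<site>[a-z]+\d+)-(?P<role>[a-z]+)-(?P<pool>[a-z])(?P<host_id>[\w-]+)"
    r"\.(?P=site)\.spotify\.net$"
)

def parse_hostname(hostname):
    m = HOSTNAME_RE.match(hostname)
    if m is None:
        raise ValueError("unrecognized hostname: %r" % hostname)
    return m.groupdict()

parts = parse_hostname("ash2-dnsresolver-a1337.ash2.spotify.net")
assert parts == {"site": "ash2", "role": "dnsresolver",
                 "pool": "a", "host_id": "1337"}
```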
<p>You might have noticed the use of <code>.net</code> rather than <code>.com</code> in the example. Spotify makes use of both <code>spotify.com</code> and <code>spotify.net</code> TLDs. Originally, <code>.com</code> and <code>.net</code> was intended for commercial and network infrastructure use, respectively (despite <code>.net</code> not being mentioned in the related <a href="https://tools.ietf.org/html/rfc920">RFC</a>). However, that distinction has not been enforced in any meaningful way. Nevertheless, Spotify, for the most part, makes use of the original designation with <code>.net</code> as many internal and infrastructure-related domains, while <code>.com</code> for public-facing, commercial use. </p>
<h3 id="microservice-lookup">Microservice Lookup</h3>
<p>To help engineers quickly see all the machines associated with a particular role, we’ve added the automatic creation of “roles” zone files. By digging the “roles” zone with the role you’re interested in, you can get a list of host IPs:</p>
<pre><code data-lang="console">$ dig +short dnsresolver.roles.ash2.spotify.net
10.1.2.3
10.4.5.6
10.7.8.9
</code></pre>
<p>Or with pointer records, you can get a list of their fully qualified domain names:</p>
<pre><code data-lang="console">$ dig +short -t PTR dnsresolver.roles.ash2.spotify.net
ash2-dnsresolver-a1337.ash2.spotify.net.
ash2-dnsresolver-a0325.ash2.spotify.net.
ash2-dnsresolver-a0828.ash2.spotify.net.
</code></pre><h1 id="what-we-learned-along-the-way">What we learned along the way</h1>
<p>Only when you entangle yourself in DNS do you realize new ways to break it, some weird intricacies, and esoteric problems. And there is certainly no better way to bring down your entire service than to mess with DNS.</p>
<h3 id="differences-in-linux-distributions">Differences in Linux distributions</h3>
<p>A few years ago, we migrated our entire fleet from Debian Squeeze to Ubuntu Trusty, including a gradual migration of our authoritative nameservers. We rolled out one at a time, testing each one before moving on to the next to ensure that it could resolve records and, for the public nameservers, receive requests on eth1. </p>
<p>Upon the migration of the final nameserver – you guessed it – DNS died everywhere. The culprit turned out to be a difference in firewall configuration between the two OSes: the default generated ruleset on Trusty did not allow port 53 on the public interface. We missed this because we had only used <code>dnstop</code> to verify that connections were being received; we never queried the public interface directly, and therefore missed the rejected requests.</p>
<h3 id="issues-at-scale">Issues at Scale</h3><h4 id="truncated-tcp-responses">Truncated TCP Responses</h4>
<p>In support of part of our data infrastructure, we have nearly 2500 hosts dedicated to Hadoop worker roles. Each host has an A record and a PTR record. When querying for all machines, because the full set of records is too large for a UDP response, DNS falls back to TCP.</p>
<pre><code data-lang="console">$ dig +short @lon4-dnsauthslave-a1.lon4.spotify.net hadoopslave.roles.lon4.spotify.net | head -n 4
;; Truncated, retrying in TCP mode.
10.254.52.15
10.254.63.7
10.254.74.8
$ dig +tcp +short @lon4-dnsauthslave-a1.lon4.spotify.net hadoopslave.roles.lon4.spotify.net | wc -l
2450
</code></pre>
<p>However, with PTR records, we do not get the expected number back:</p>
<pre><code data-lang="console">$ dig +tcp +short -t PTR @lon4-dnsauthslave-a1.lon4.spotify.net hadoopslave.roles.lon4.spotify.net | wc -l
1811
</code></pre>
<p>Looking at the message size:</p>
<pre><code data-lang="console">$ dig +tcp @lon4-dnsauthslave-a1.lon4.spotify.net -t PTR hadoopslave.roles.lon4.spotify.net | tail -n 5
;; Query time: 552 msec
;; SERVER: 10.254.35.16#53(10.254.35.16)
;; WHEN: Fri Mar 10 18:52:04 2017
;; MSG SIZE rcvd: 65512
$ dig +tcp @lon4-dnsauthslave-a1.lon4.spotify.net -t PTR hadoopslave.roles.lon4.spotify.net | tail -n 5
;; Query time: 481 msec
;; SERVER: 10.254.35.16#53(10.254.35.16)
;; WHEN: Fri Mar 10 18:52:19 2017
;; MSG SIZE rcvd: 65507
$ dig +tcp @lon4-dnsauthslave-a1.lon4.spotify.net -t PTR hadoopslave.roles.lon4.spotify.net | tail -n 5
;; Query time: 496 msec
;; SERVER: 10.254.35.16#53(10.254.35.16)
;; WHEN: Fri Mar 10 18:53:13 2017
;; MSG SIZE rcvd: 65529
</code></pre>
<p>Response sizes hover right below 65,535 bytes, the maximum size of a DNS message (DNS over TCP prefixes each message with a two-byte length field, capping it there). This certainly makes sense, but what is a bit of a head-scratcher is that if we send the query through one of our DNS resolvers, we get nothing back, with no sign of any errors:</p>
<pre><code data-lang="console">$ dig +tcp @lon4-dnsresolver-a1.lon4.spotify.net -t PTR hadoopslave.roles.lon4.spotify.net
; <<>> DiG 9.8.3-P1 <<>> +tcp @lon4-dnsresolver-a1.lon4.spotify.net -t PTR hadoopslave.roles.lon4.spotify.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34148
;; flags: qr tc rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;hadoopslave.roles.lon4.spotify.net. IN PTR
;; Query time: 191 msec
;; SERVER: 10.254.35.17#53(10.254.35.17)
;; WHEN: Sat Mar 11 13:38:00 2017
;; MSG SIZE rcvd: 52
</code></pre>
<p>We partially put ourselves in this pickle with the long host naming conventions discussed earlier. But it seems as if Unbound entirely drops responses that are larger than its configured maximum message size, regardless of TCP or UDP (see “msg-buffer-size” in the <a href="https://www.unbound.net/documentation/unbound.conf.html">docs</a>).</p>
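<p>You can roughly estimate how many PTR answers fit under that 65,535-byte ceiling. This is back-of-the-envelope arithmetic with assumed overheads, and it ignores DNS name compression of the RDATA (which is why the real responses above squeeze in more records than this estimate):</p>

```python
# Back-of-the-envelope: how many PTR records fit in a 65,535-byte DNS
# message? Each answer carries a compressed owner-name pointer (2 bytes),
# type/class/TTL/RDLENGTH (10 bytes), and the target name in RDATA.
MAX_DNS_MESSAGE = 65535

def wire_name_length(fqdn):
    # Uncompressed wire format: one length byte per label, plus the root byte.
    labels = [l for l in fqdn.rstrip(".").split(".") if l]
    return sum(1 + len(l) for l in labels) + 1

def max_ptr_records(target_fqdn, header_overhead=100):
    # header_overhead is an assumed allowance for the header and question.
    per_record = 2 + 10 + wire_name_length(target_fqdn)
    return (MAX_DNS_MESSAGE - header_overhead) // per_record

# With ~40-byte hostnames like the Hadoop workers', far fewer than the
# full 2,450 hosts fit -- which is why the PTR query above comes up short.
estimate = max_ptr_records("lon4-hadoopslave-a1234.lon4.spotify.net.")
assert estimate < 2450
```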
<h4 id="docker-upgrades">Docker Upgrades</h4>
<p>At Spotify, we’ve been using Docker with our own orchestration tool, <a href="https://github.com/spotify/helios">Helios</a>, for a few years now. Docker gives us the ability to ship code faster and with greater confidence, but it hasn’t always been smooth sailing.</p>
<p>Some of our services use host-mode networking, wherein the container has the exact same network stack as the host. When upgrading Docker from 1.6 to 1.12, we noticed that requests to the host’s Unbound service were rejected. This was because with 1.6, the source IP appeared to be localhost; but with 1.12, the source IP appeared as the eth0 IP of the host, and was therefore rejected by our Unbound setup. Despite this upgrade being gradually rolled out to the services using containers, it affected over half of the then-active users for over an hour.</p>
<h1 id="future-of-dns-at-spotify">Future of DNS at Spotify</h1>
<p>For the most part, our DNS infrastructure has been very stable and doesn’t require babysitting. But it’s certainly a lot of pressure: if we mess something up, it has the potential to impact a lot of users. </p>
<p>At Spotify, we’ve been focusing a lot of effort recently in the concept of ephemerality. We want our engineers to <a href="https://web.archive.org/web/20140122195940/http://www.pistoncloud.com/2013/04/announcing-enterprise-openstack-version-2/">not treat hosts as pets, but as cattle</a>. This means we want to provide our service owners the ease and ability to scale up and down as needed. But our current infrastructure prevents this.</p>
<h3 id="moving-to-google-dns">Moving to Google DNS</h3>
<p>You may have calculated that our propagation time for new DNS records is agonizingly slow. When perfectly timed, a newly spun-up host will have its relevant A and PTR records propagated to our authoritative nameservers in 15 minutes. If the staggered timing of our cron jobs were not an issue, it would be cut down to 12 minutes. But on average, new records are propagated and resolvable – and therefore able to take traffic – in 20-30 minutes. This is far from ideal when we want to scale quickly, much less when responding to incidents requiring changes to a service’s capacity.</p>
<p>Before our move to Google Cloud, this issue had gone unnoticed. It takes nearly as long for our physical machines to initially boot up and install required services and packages. Yet with Google Compute, instances are up and running within minutes, exposing the latency in DNS record propagation.</p>
<p>As a recent hack project, we played around with Google’s <a href="https://cloud.google.com/dns/docs/">Cloud DNS</a> offering. Google has not yet publicly released its feature for <a href="https://partnerissuetracker.corp.google.com/issues/35904549#">automatic DNS registration</a> upon instance launch, so we plugged the Cloud DNS API into our <a href="https://labs.spotify.com/2016/03/25/managing-machines-at-spotify/">capacity management workflow</a>. The results were pretty remarkable: new records were propagated and resolvable in <em>less than a minute</em> after the initial instance request, essentially improving propagation time by an order of magnitude.</p>
<p>So eventually, after we platformize our hack and are comfortable from having run parallel infrastructures for some time, we’ll be handing off our DNS infra to the folks that probably know how to do it better than us.</p>
<h1 id="summary">Summary</h1>
<p>Similar to our <a href="https://labs.spotify.com/2015/10/09/designing-the-spotify-perimeter/">approaches</a> <a href="https://labs.spotify.com/2016/02/25/spotifys-event-delivery-the-road-to-the-cloud-part-i/">with our</a> <a href="https://labs.spotify.com/2016/03/25/managing-machines-at-spotify/">infrastructure</a>, we try to tackle our problems in an iterative and engineering-based way. We first approached DNS in more of an immediate, one-off fashion: manual record management and deploys were adequate when there wasn’t much movement in our backend services or pressure to scale <em>quickly</em>. Then, as Spotify’s infrastructure grew, new services were developed, and more users needed support, we required an automated and hands-off strategy, which has served us well for 3 years. Now, with the focus on ephemerality, we have the opportunity to challenge and re-think our approach to Spotify’s DNS infrastructure.</p>
Diversity: We're Not Done Yethttp://www.roguelynn.com/words/were-not-done-yet/2015-07-23T09:46:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p><i>Update on September 6th: Finally included the IPython notebooks!</i></p>
<hr>
<p>This post is an accompaniment to my talk, <a href="https://ep2015.europython.eu/conference/talks/diversity-we-are-not-done-yet">Diversity: We’re not done yet</a>, given first at <a href="https://ep2015.europython.eu">EuroPython</a> in Bilbao, Spain in July 2015, then as a <a href="https://2015.djangocon.us/blog/2015/08/24/announcing-our-keynote-speakers/">keynote for DjangoCon US</a> in Austin, TX in September 2015. Slides <a href="https://speakerdeck.com/roguelynn/diversity-were-not-done-yet">here</a>, video from EuroPython <a href="https://www.youtube.com/watch?v=xWLFiKfoOTA">here</a>, and the IPython notebooks for the data visualization <a href="https://github.com/econchick/pyladies-data">here</a>.</p>
<hr>
<p>I just recently <strike>nerded out</strike> saw the San Francisco Symphony <a href="http://www.sfsymphony.org/Buy-Tickets/2014-2015/J-J-Abrams%E2%80%99s-Star-Trek-Feature-film-with-live-orch.aspx">play the whole score</a> to the full screening of the 2009 Star Trek movie. It was ridiculously awesome. <em>Side note: I love <a href="https://open.spotify.com/track/0aCxxrboiRsVhv0jSiWD6l">this</a> rendition of The Next Generation theme by <a href="http://www.vitaminstringquartet.com/">Vitamin String Quartet</a>.</em></p>
<p>It reminded me that <a href="https://en.wikipedia.org/wiki/Gene_Roddenberry">Gene Roddenberry</a> had some great comments on the TV series that are very on-point with regards to diversity:</p>
<blockquote>
<p>One obstacle to adulthood needs to be solved immediately:</p>
<p>We must learn <strong>not just to accept differences</strong> between ourselves and our ideas, but to <strong>enthusiastically</strong> welcome and enjoy them. Diversity contains as many treasures as those waiting for us in other worlds.</p>
<p>We will find it <strong>impossible to fear diversity</strong> and to enter into the future at the same time.
<small>Gene Roddenberry</small></p>
</blockquote>
<p>It was so very comforting to hear this the first time from someone so prominent in “geek” culture. I grew up with my father watching The Next Generation, Deep Space 9, and Voyager (my favorite is The Next Generation and Captain Jean-Luc Picard). My <a href="https://github.com/albadraco">father</a> is the doppelgänger of <a href="http://www.roguelynn.com/assets/images/diversity-not-done-yet/data.gif">Data</a> - whom he dressed up as for one Halloween.</p>
<p>But I have a suspicion that having Star Trek in the background of my adolescence subconsciously influenced me. Looking at the TV series as an adult, its discourse is very diplomatic – something still very novel today. The characters had very altruistic personal views, and applied them in difficult situations. The whole premise of Starfleet was purely humanitarian and peacekeeping. The series itself was used to reflect cultural issues of the day, including racism, sexism, and class warfare, among many others. <a href="http://www.ibiblio.org/jwsnyder/wisdom/trek.html">Roddenberry himself even said</a> “[By creating] a new world with new rules, I could make statements about sex, religion, Vietnam, politics, and intercontinental missiles.”</p>
<h4 id="some-background">Some Background</h4>
<p>Perhaps you’re new to Python or not very involved with the community - but to get everyone up to speed: over the past few years there has been a huge movement to improve this community’s diversity. Four EuroPythons ago - 2012, in Florence - I gave a <a href="http://www.roguelynn.com/words/2012-07-12-a-memorable-europython-for-the-better/">keynote</a> as a complete Python newbie about how I was working to help increase the number of women within the Python community. I found it only appropriate to give an “update” at this year’s EuroPython.</p>
<p>But if anything, this turns into a bit of an ass-kicking.</p>
<h1 id="why-it39s-a-problem">Why it’s a problem</h1>
<p>To give some context, I could try my best to explain why having a lack of diversity is a problem that we should care about. But in reading some research and a few scientific papers, I found a few highlights that did a better job than I could.</p>
<p>The first one - from <a href="https://hbr.org/2011/06/defend-your-research-what-makes-a-team-smarter-more-women/ar/1">Harvard Business Review</a>:</p>
<blockquote>
<p>There’s little correlation between a group’s collective intelligence and the IQs of its individual members.
But if a group includes <em>more women</em>, its collective intelligence rises.</p>
</blockquote>
<p>The next one from the <a href="https://www.nae.edu/File.aspx?id=10231">National Academy of Engineering</a>:</p>
<blockquote>
<p>…creativity depends on our life experiences. Without diversity, the life experiences we bring to an engineering problem are limited.</p>
<p>As a consequence, we may not find the best engineering solution.</p>
</blockquote>
<p>And from <a href="http://blogs.scientificamerican.com/voices/2014/09/10/diversity-in-stem-what-it-is-and-why-it-matters/">Scientific American</a>:</p>
<blockquote>
<p>…when groups of intelligent individuals are working to solve hard problems, the diversity of the problem solvers matters more than their individual ability.</p>
<p>Thus, diversity is not distinct from enhancing overall quality—it is integral to achieving it.</p>
</blockquote>
<p>Another one from the same article in <a href="http://blogs.scientificamerican.com/voices/2014/09/10/diversity-in-stem-what-it-is-and-why-it-matters/">Scientific American</a>:</p>
<blockquote>
<p>…chronic and woeful underrepresentation in the workforce leads to “the inescapable conclusion that we are missing critical contributors to our talent pool.”</p>
<p>It is hard to grow a workforce —let alone get the “best” workforce— when there’s broad underrepresentation of up to 75 percent of the potential talent pool.</p>
</blockquote>
<p>One last one from a different <a href="https://hbr.org/product/the-athena-factor-reversing-the-brain-drain-in-sci/an/10094-PDF-ENG">Harvard Business Review</a> case:</p>
<blockquote>
<p>After 10 years of work experience, 41% of women in tech leave the industry, compared with 17% of men.</p>
<p>But they are not more likely than women in other industries to leave because of having families.</p>
</blockquote>
<p>When you look at <a href="https://gigaom.com/2014/08/21/eight-charts-that-put-tech-companies-diversity-stats-into-perspective/">actual data</a> from current, popular technology companies, you can see a lack of gender diversity across the board within technical positions:</p>
<p><img class="img-responsive img-rounded img-displayed" src="http://www.roguelynn.com/assets/images/diversity-not-done-yet/gender-breakdown.jpg" title="Gender breakdown within tech fields among major tech companies" alt="Gender breakdown within tech fields among major tech companies"/></p>
<p>No surprise - the leadership of tech companies is lacking in diversity. Taking the current <a href="http://fortune.com/2015/06/13/fortune-500-tech/">top 20 highest-earning technology companies</a> on the Fortune 500 list, of the 66 CEOs in these companies’ history (I counted - though it was difficult to find the history for some companies), only 7 have been women.</p>
<h1 id="what-we39re-doing-right">What we’re doing right</h1>
<p>I mentioned earlier that there has been a large initiative within the Python community to increase diversity. So what are we doing?</p>
<h2 id="python-software-foundation">Python Software Foundation</h2>
<p>I’ve been on the board of the PSF for two terms, and have just started my third. In those years, we’ve seen and approved a large influx of <a href="https://www.python.org/psf/records/board/resolutions/">grant requests</a> that specifically target diversity initiatives.</p>
<p>Just two board meetings ago - June 23rd - we <a href="https://www.python.org/psf/records/board/resolutions/">approved</a> 3 Django Girls funding requests, 2 grants to PyCon UK specifically targeted to getting kids and teachers to their conference, and 2 grants for workshops in low-economic areas, or areas that would have difficulty accessing a computer. From some napkin calculations, we’ve given 19 grants to Django Girls, totaling over $22,000.</p>
<p>We are trying to be introspective, too. In this past election, we made a <a href="https://mail.python.org/mailman/private/psf-members/2015-April/013405.html">call to the members list</a> pleading with members to take diversity into consideration when nominating. Because of those efforts, of the 11 directors on the <a href="https://www.python.org/psf/records/board/history/#id2">current board</a>, 7 are women. That is up from 3 women on the <a href="https://www.python.org/psf/records/board/history/#id3">2014-2015</a> board and 2 women on the <a href="https://www.python.org/psf/records/board/history/#id4">2013-2014</a> board.</p>
<h2 id="conferences">Conferences</h2>
<p>Within the Python-centric conference network, you may have noticed the influx of <a href="https://adainitiative.org/what-we-do/conference-policies/">Code of Conduct</a> <a href="http://geekfeminism.wikia.com/wiki/Conference_anti-harassment/Adoption#Python_conferences">adoption</a>.</p>
<p>There’s a loud opinion that CoCs are not needed for conferences. Yet a CoC isn’t for those folks. Those most likely to be affected by harassment or assault are often in the minority at an event, and less likely to be visible. There’s even an actively maintained <a href="http://geekfeminism.wikia.com/wiki/Timeline_of_incidents">timeline of incidents</a> (which absolutely cannot be exhaustive in and of itself) - and it shows these folks that incidents do happen. And when they do, we - as a community - need to show those affected by harassment or inappropriate behavior that we care about and support them.</p>
<p>In November 2012, the Python Software Foundation passed <a href="https://www.python.org/psf/records/board/resolutions/">a resolution</a> stating that it will <strong>only</strong> sponsor conferences that have, or agree to create and publish, a Code of Conduct/anti-harassment guide for their conference.</p>
<p>In the past few years, conferences have also been organizing or supporting women-only events, including PyLadies lunches, Django Girls tutorials, women-attendee cocktail hours, and the like. I’ve led many of these events myself. And every single time, I regularly hear praise for being in a room full of awesome women. At the annual PyLadies lunch at PyCon US/North America, I get women to promote their own talks, tutorials, poster sessions, and lightning talks for the conference. Women may be a bit embarrassed to talk about themselves. Having a forum that is explicitly deemed okay - and encouraged - to do so has had a tremendous effect on confidence.</p>
<h2 id="bdfl">BDFL</h2>
<p>You may have noticed Guido’s t-shirt yesterday at his keynote - “Python is for girls”. He wears it proudly so often that I’m surprised there are not more holes and coffee stains on it. Maybe he has a closet full of them, I don’t know (If you do - can I have one?!)</p>
<p>Anyways - having the creator of Python publicly talk about the need for diversity within the community - a community that wouldn’t exist without him - has had a significant impact. I am lucky that he lives in the Bay Area, and have certainly abused his close proximity. I first met him after I cold-emailed him, inviting Guido to help me kick off a weekly Python study group. Of course - he wore that Python is for Girls shirt. Since then, he’s been a regular speaker at our meetups, including the yet-to-be-announced PyLadies CPython sprint coming up.</p>
<p>If you take a look at other tech communities, you can see the lack of support from leadership really affecting the respective community. Take the Linux kernel community: I don’t have to say much, as it’s been pretty public - <a href="http://arstechnica.com/business/2015/01/linus-torvalds-on-why-he-isnt-nice-i-dont-care-about-you/">Linus himself has said</a> “all that [diversity] stuff is just details and not really important.” It’s <a href="http://derstandard.at/2827627/Nat-Friedman-Flamewars-are-part-of-the-community-culture">well-known</a> that flamewars are a part of the community. And it <em>has</em> to affect the diversity <a href="http://geekfeminism.wikia.com/wiki/FLOSS">makeup</a> within Linux development: 4.4% of the Ubuntu community are women; 1.8% of the Debian developer community are women.</p>
<p>You can even see lack of leadership support when looking at the Ruby community. In a conversation in response to RubyConf announcing their talk line up and the lack of diversity of speakers, Matz - creator of Ruby - <a href="https://twitter.com/yukihiro_matz/status/380399117047832576">said</a>, “Giving bias to minority does not solve the problem. Just create reverse discrimination.” Ruby-centric conferences have been <a href="https://twitter.com/rubyconf/status/380391363553935360">public</a> about their lack of speaker diversity, one even <a href="http://www.theregister.co.uk/2012/11/20/british_ruby_conference_cancelled_after_diversity_row/">cancelled their conference</a> over it. <a href="http://www.ashedryden.com/">Ashe Dryden</a> - a ruby developer and speaker about diversity in tech - <a href="https://twitter.com/ashedryden/status/455345869207138304">said</a> “I’m continually impressed by the Python community and I’m not even a community member.”</p>
<p>The Ruby community has been doing a lot of work to improve gender diversity within the community, including <a href="http://railsgirls.com/">RailsGirls</a>, <a href="http://www.railsbridge.org/">RailsBridge</a>, and the like. And the fact that these conversations over speaker diversity exist says that the efforts are taking effect. But I can’t help but imagine that if Matz were more vocally supportive of the movement, or if there were an organization behind the Ruby language like the PSF supporting diversity movements, it might be a bit better.</p>
<h2 id="pyladies">PyLadies</h2>
<p>Now something that I’m deeply involved with myself: PyLadies. PyLadies started mid-2012 in <a href="http://www.meetup.com/Pyladies-LA/">Los Angeles</a>, with a few women python developers getting together. They essentially said - wouldn’t this be awesome if we did this regularly?</p>
<p>From there, PyLadies expanded to over 70 locations, on every continent except Antarctica. Each year, we <a href="http://www.pyladies.com/blog/Donate-to-PyLadies-at-PyCon/">raise</a> tens of thousands of dollars to send women to PyCon US/North America. Of the 70 locations, 44 are on Meetup; together they have over 10,000 members and hold events like beginner workshops, talk proposal brainstorming, conference speaking preparation, sprints, hack nights, coffee & code nights - lots of events.</p>
<h4 id="i39m-not-a-statistician-but-i-like-to-play-one-on-tv">I’m not a statistician, but I like to play one on TV</h4>
<p>I actually did some data mining of those 44 meetup groups - thanks to the <a href="http://www.meetup.com/meetup_api/">Meetup API</a>.</p>
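<p>For the curious, the monthly aggregation step can be sketched roughly like this. It’s a minimal, hypothetical sketch: the function name and the sample records are mine, assuming member data shaped like what the (now-retired) Meetup API v2 returned, where each member carried a <code>joined</code> field in epoch milliseconds.</p>

```python
from collections import Counter
from datetime import datetime, timezone

def new_members_per_month(members):
    """Bucket member records into (year, month) counts by join date.

    `members` mimics the shape the Meetup API v2 returned: dicts
    with a `joined` field expressed in epoch milliseconds.
    """
    counts = Counter()
    for member in members:
        # Meetup timestamps are milliseconds; convert to seconds for datetime.
        joined = datetime.fromtimestamp(member["joined"] / 1000, tz=timezone.utc)
        counts[(joined.year, joined.month)] += 1
    return counts

# Hypothetical sample: two joins in July 2015, one in August 2015.
sample = [
    {"joined": 1437550000000},
    {"joined": 1437720000000},
    {"joined": 1438970000000},
]
print(new_members_per_month(sample))
```

<p>The real notebooks (linked at the end of this post) pull each chapter’s member list first, then do essentially this bucketing before plotting.</p>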
<p>I was able to get the number of new PyLadies joining every month:</p>
<p><img class="img-responsive img-rounded img-displayed" src="http://www.roguelynn.com/assets/images/diversity-not-done-yet/pyladies-new-members-01.jpg" title="PyLadies New Members" alt="PyLadies New Members"/></p>
<p>And my super scientific regression - I just slapped on an arrow pointing up - but you can see the trend in the number of new members joining.</p>
<p><img class="img-responsive img-rounded img-displayed" src="http://www.roguelynn.com/assets/images/diversity-not-done-yet/pyladies-new-members-02.jpg" title="PyLadies New Members with super scientific regression line" alt="PyLadies New Members with super scientific regression line"/></p>
<p>Put into context, you can see that the annual PyCon in North America may have some effect on inspiring folks to join PyLadies, given those immediate spikes right after:</p>
<p><img class="img-responsive img-rounded img-displayed" src="http://www.roguelynn.com/assets/images/diversity-not-done-yet/pyladies-new-members-03.jpg" title="PyLadies New Members with PyCons" alt="PyLadies New Members with PyCons"/></p>
<p>And you can see some effect when looking at when the largest 17 chapters (I don’t know why I chose 17) started.</p>
<p><img class="img-responsive img-rounded img-displayed" src="http://www.roguelynn.com/assets/images/diversity-not-done-yet/pyladies-new-members-04.jpg" title="PyLadies New Members and Chapter openings" alt="PyLadies New Members and Chapter openings"/></p>
<h4 id="the-effect-of-pyladies">The effect of PyLadies</h4>
<p>So while I’m not a statistician, I do know that correlation does not mean causation, and data is fun nonetheless.</p>
<p>One signal is the growth of women speakers at PyCon North America:</p>
<p><img class="img-responsive img-rounded img-displayed" src="http://www.roguelynn.com/assets/images/diversity-not-done-yet/women-speakers-pycon.jpg" title="Growth of Women Speakers at PyCon" alt="Growth of Women Speakers at PyCon"/></p>
<p>Starting with PyCon 2013, PyLadies has led workshops for women to help brainstorm talk proposals with the help of past program committee members, as well as giving them the opportunity to practice their talks before the conference. And we’ve been doing that ever since. I’m sure you can relate to this - perhaps you’ve wanted to propose a talk and thought “ah, I have a month, I’ll write it later”, and suddenly the deadline has passed. So certainly having dedicated time to get shit done helps.</p>
<p>But I found that a lot of PyLadies wanted to speak, but didn’t have an idea of what topic to speak on. Or rather, they did have an idea - they just didn’t think it was a good one. Having a sounding board of other women, accompanied by people who’ve selected talks for PyCon, has really given women the confidence to submit a proposal.</p>
<p>So then the talks get accepted, and we all know once a talk is accepted - everyone goes “oh shit…” - but it’s confidence boosting! To have a group of peers select you and your idea — to allot time to you to hear you speak. It’s brilliant!</p>
<p>So I’d like to think that having that resource for PyLadies has had an effect on the percentage of women speakers at PyCon.</p>
<p>You can see some effect at a more regional level, too.</p>
<h5 id="spotlight-new-york">Spotlight: New York</h5>
<p>So I took a look at three cities with a large Python presence - New York, Boston, and San Francisco - to see if their communities reflected any difference with the addition of a local PyLadies.</p>
<p><img class="img-responsive img-rounded img-displayed" src="http://www.roguelynn.com/assets/images/diversity-not-done-yet/nyc-pyladies-01.jpg" title="NYC Python Meetup groups" alt="NYC Python Meetup groups"/></p>
<p>This graph shows the number of new members every month for two meetups - NYC Python, which started in mid-2006, and Django NYC, which debuted in late 2009. Certainly, the communities’ growth is attributable to the popularity of Python as a language overall, as well as the growing tech scene in the area.</p>
<p>But mark when PyLadies started in mid-2012:</p>
<p><img class="img-responsive img-rounded img-displayed" src="http://www.roguelynn.com/assets/images/diversity-not-done-yet/nyc-pyladies-03.jpg" title="NYC Python Meetup groups where PyLadies started" alt="NYC Python Meetup groups where PyLadies started"/></p>
<p>NYC Python and Django NYC meetups saw sharper growth of new members.</p>
<h5 id="spotlight-boston">Spotlight: Boston</h5>
<p>Boston has been a great hub for Python as well. It has a very active Django Meetup and Python User Group:</p>
<p><img class="img-responsive img-rounded img-displayed" src="http://www.roguelynn.com/assets/images/diversity-not-done-yet/boston-pyladies-01.jpg" title="Boston Python Meetup groups" alt="Boston Python Meetup groups"/></p>
<p>However, we don’t see much difference - at least for the Python User Group - in the growth of new members after PyLadies Boston started:</p>
<p><img class="img-responsive img-rounded img-displayed" src="http://www.roguelynn.com/assets/images/diversity-not-done-yet/boston-pyladies-02.jpg" title="Boston Python Meetup groups and PyLadies" alt="Boston Python Meetup groups and PyLadies"/></p>
<p>But perhaps you’re familiar with this bit of history: some of PyLadies’ inspiration actually came from the women-only workshops that the Boston Python User Group started back in early 2011. You can see the large growth rate once a region introduces women-focused events:</p>
<p><img class="img-responsive img-rounded img-displayed" src="http://www.roguelynn.com/assets/images/diversity-not-done-yet/boston-pyladies-03.jpg" title="Boston Python Meetup groups and women's workshops" alt="Boston Python Meetup groups and women's workshops"/></p>
<h5 id="spotlight-san-francisco">Spotlight: San Francisco</h5>
<p>On to San Francisco - home of the largest PyLadies chapter, the one that I lead as well. We have a bunch of Python-centric meetup groups, and I chose the largest and most active ones:</p>
<p><img class="img-responsive img-rounded img-displayed" src="http://www.roguelynn.com/assets/images/diversity-not-done-yet/sf-pyladies-01.jpg" title="SF Python Meetup groups" alt="SF Python Meetup groups"/></p>
<p>The growth rate of membership in these meetup groups changes noticeably when SF PyLadies started in April 2012. Again, with my super scientific arrows just slapped on:</p>
<p><img class="img-responsive img-rounded img-displayed" src="http://www.roguelynn.com/assets/images/diversity-not-done-yet/sf-pyladies-03.jpg" title="SF Python Meetup groups" alt="SF Python Meetup groups"/></p>
<p>What’s interesting: when this is switched to a line graph, you can see that when PyLadies SF started, the rate of new membership for the SF Python Pub Night meetup was not affected at all:</p>
<p><img class="img-responsive img-rounded img-displayed" src="http://www.roguelynn.com/assets/images/diversity-not-done-yet/sf-pyladies-06.jpg" title="SF Python Meetup groups" alt="SF Python Meetup groups"/></p>
<p>I suspect it has something to do with the presence of alcohol and/or an environment that may not attract many PyLadies.</p>
<p>Anyways - this was just me trying to quantify the regional effect of PyLadies on local meetups. <a href="https://github.com/econchick/pyladies-data">Here are the IPython Notebooks</a> to show the process of how I got the data.</p>
<h1 id="what-we39re-missing">What we’re missing</h1>
<p>Ok so we’re doing pretty good, don’t you think? We’re not done, though. There’s so much more to do, and just throwing money at PyLadies to host events is not enough. (but please - <a href="http://www.pyladies.com/sponsor/">continue giving us money</a>!)</p>
<h2 id="what-is-said-vs-what-is-meant">What is said vs what is meant</h2>
<p>A lot of people - recruiters and developers alike - come to me to complain about not being able to hire women, or about their lack of corporate diversity in general.</p>
<p>I hear the same excuses all the time from people trying to hire more diversely: “I couldn’t find them”, or “we’re a meritocracy, gender doesn’t matter!”</p>
<p>I’m going to introduce a scientific term, maybe you’ve heard of it - “bullshit”. These excuses are bullshit. Let me show you why.</p>
<h3 id="bullshit-excuse-1">Bullshit excuse #1</h3>
<p><span class="bullshit">what i hear:</span></p>
<blockquote>
<p>Gender equality – that’s not a problem here! Those things don’t matter!</p>
</blockquote>
<p><span class="bullshit">what this translates to:</span></p>
<blockquote>
<p>That is not a problem <em>to me</em>.</p>
</blockquote>
<p>This shouldn’t be said anymore. It essentially dismisses the other person’s beliefs. If increasing the gender ratio is someone’s concern, it should be treated as a legitimate problem. If you don’t think it’s a problem, then ask questions: “Oh - really? How so? What do you think should be done?” Try to understand why it’s a problem - after all, why would it have been brought up to begin with?</p>
<h3 id="bullshit-excuse-2">Bullshit excuse #2</h3>
<p><span class="bullshit">what i hear:</span></p>
<blockquote>
<p>We focus on quality, not gender!</p>
</blockquote>
<p>Also heard as “we’re a meritocracy!”</p>
<p><span class="bullshit">what this translates to:</span></p>
<blockquote>
<p>Quality apparently means software written by men.</p>
</blockquote>
<p>That’s bullshit! What you’re saying to me is that you take quality to mean software written by men.</p>
<p>But quality is not an objective word. Do we encourage uniqueness? Or rawness? Or authenticity? Sensitivity? What other values do we add to “quality”? This excuse essentially says “we don’t want to change what we’re doing here”.</p>
<h3 id="bullshit-excuse-3">Bullshit excuse #3</h3>
<p>This one applies to girls, too:</p>
<p><span class="bullshit">what i hear:</span></p>
<blockquote>
<p>Women aren’t interested in this.</p>
</blockquote>
<p><span class="bullshit">what this translates to:</span></p>
<blockquote>
<p>It’s their own fault.</p>
</blockquote>
<p>This implies that it’s women’s own fault. Are women <strong>really not</strong> interested? That’s complete bullshit.</p>
<p>At Spotify, we participated in an event called <a href="http://www.techtimes.com/articles/41536/20150324/swedish-musician-robyn-launches-tekla-a-festival-for-girls-interested-in-tech.htm">Tekla</a>, meant for secondary-school girls, where sponsoring companies held workshops to give a sampling of what the future has to offer in terms of technology. It had robots, computers, gaming, 3D printing - all awesomely geeky things. It proved that this is something girls are interested in, as long as they are invited. PyLadies itself is proof of that, too - it provides an invitation for women to join the Python community.</p>
<h3 id="bullshit-excuse-4">Bullshit excuse #4</h3>
<p><span class="bullshit">what i hear:</span></p>
<blockquote>
<p>We couldn’t find any women.</p>
</blockquote>
<p><span class="bullshit">what this translates to:</span></p>
<blockquote>
<p>I didn’t want to put much effort in.</p>
</blockquote>
<p>I put out a single job ad for Spotify once to my local PyLadies mailing list. I got 40 responses from women. 40! I’m not sure there are more than 200 on that mailing list. That’s a super awesome response rate.</p>
<p>So I challenge folks to take a look at your own professional network. For instance, on LinkedIn, how diverse are your connections? How many look like you?</p>
<p>If you reach out to the same network of yours, you’re going to get the same people applying. Yes, it will take work. But “the womenz” are there.</p>
<h3 id="bullshit-excuse-5">Bullshit excuse #5</h3>
<p>I see this kind of thought in the Twitter-sphere and on Reddit a lot:</p>
<p><span class="bullshit">what i hear:</span></p>
<blockquote>
<p>Quotas are bad! That’s reverse discrimination.</p>
</blockquote>
<p><span class="bullshit">what this translates to:</span></p>
<blockquote>
<p>I want to recruit my (male) friends so I can get my referral bonus.</p>
</blockquote>
<p>What this sounds like to me is that you just want to recruit your friends - who I’m sure look like you.</p>
<p>It doesn’t help that we have a referral bonus culture that encourages us to hire our friends. I get the reason - hire good people like you. But it has consequences.</p>
<p>So what if we turned it around: “let’s be sure to hire 90% white men!”</p>
<p>There’s an incorrect notion that some sacrifice will be made - that you are lowering the standards. It’s bullshit. I just quoted a bunch of research essentially saying that diversity increases quality. There have been <a href="https://scholar.google.es/scholar?q=resume+gender+bias&hl=en&as_sdt=0&as_vis=1&oi=scholart&sa=X&ved=0CB4QgQMwAGoVChMIp-G1neTwxgIVRXEUCh0DWQSI">many</a> studies showing that if you strip away gender identification from resumes, more women get further along in the recruitment process.</p>
<p>You simply make sure that <em>everyone</em> who is qualified is being considered.</p>
<p>Lowering standards - ugh - bullshit.</p>
<h3 id="bullshit-excuse-6">Bullshit excuse #6</h3>
<p><span class="bullshit">what i hear:</span></p>
<blockquote>
<p>We were in a hurry.</p>
</blockquote>
<p>Oh goodness - yes - the tech industry is hiring like crazy!</p>
<p><span class="bullshit">what this translates to:</span></p>
<blockquote>
<p>We don’t care enough to put thought into our process.</p>
</blockquote>
<p>What this means is that you don’t actually care enough to put thought about diversity into your hiring process. It’s as simple as that. Do the work once to ensure that you have appropriate practices in place and diverse networks, and maybe remove gender-identifying words from your application process; it certainly doesn’t take much effort to google for local communities to reach out to.</p>
<h2 id="the-bad-guy-fallacy">The “Bad Guy” Fallacy</h2>
<p>This is all thinly-veiled bullshit. I’m sure I’m not the only one to pick up on it either.</p>
<p>And these excuses hint at a larger issue at what I’ve deemed the “bad guy” fallacy.</p>
<p>Certainly, <strong>you</strong> get why diversity is important. You’re trying, for goodness sakes! (poorly, by the way – see above).</p>
<p>But there’s a notion that there is a single “bad guy” behind the lack of diversity in tech. A lot of attention is given to someone who says something bad, and everyone cries “ah! There’s the bad guy! Get him!”</p>
<p>We didn’t get here because of one “bad guy”. We’re all the “bad guy” – we’re all complicit. We hire our friends, we have a very uniform-looking network. This is something that everyone has to consciously and actively work towards.</p>
<h1 id="well-shit.-what-should-we-do">Well, shit. What should we do?</h1>
<p>Ok so there’s a lot of bullshit. Perhaps you agree on the thinly-veiled bullshit, and that we need to do more. We need to stop being complicit. So what can we do?</p>
<h2 id="let-me-google-that-for-you">let me google that for you</h2>
<p>ok folks - have you heard of Google? great? okay. use it.</p>
<p>Numerous times, people come to me asking me to educate them. I am not your teacher. If you want to learn more about feminism, unconscious bias, or privilege, that’s super fantastic! Thumbs up! But I’m not going to do your research for you. I’m not going to give you the TL;DR. I ain’t got time for that - nor the patience for the debates that usually come with it.</p>
<p>I am however - <em>this one time</em> - going to provide you with a set of readings to start you off. Bookmark it: <a href="http://www.roguelynn.com/lmgtfy/">rogue.ly/lmgtfy</a>.</p>
<h2 id="micro-actions">micro-actions</h2>
<p>Alright - after some self-education and some googling, there are micro actions that you can take:</p>
<h3 id="sfemalewomeng">s/female/women/g</h3>
<p>First is super simple - programmers should like it. Switch any use of the word “female” to the word “women”. For example “female attendees” → “women attendees”.</p>
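<p>As a literal - and admittedly naive - sketch of that substitution (the function name and sample strings are mine; real prose needs human review, since “female” as an adjective sometimes maps better to “woman” than “women”):</p>

```python
import re

def degender(text):
    # Naively swap whole-word "female"/"females" for "women",
    # case-insensitively. A human should still review each match.
    return re.sub(r"\bfemales?\b", "women", text, flags=re.IGNORECASE)

print(degender("female attendees"))  # women attendees
```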
<p>Why, you may ask?</p>
<p>The primary meaning of “female” can be seen more as a term for classification, like studying female lizards:</p>
<blockquote>
<p>Female: Of or denoting the sex that can bear offspring or produce eggs, distinguished biologically by the production of gametes (ova) which can be fertilized by male gametes
<a href="http://www.oxforddictionaries.com/definition/english/female">Oxford English Dictionary</a></p>
</blockquote>
<p>The term “woman” <a href="http://www.oxforddictionaries.com/definition/english/woman">refers specifically</a> to a human, while “female” could refer to any species.</p>
<p>The second reason: it’s dehumanizing. By reducing me to my reproductive abilities, you ignore that I am a human.</p>
<p>Last, when used as a noun, it can imply inferiority; it’s often used in a negative tone. For example, with a simple search on Twitter, I found:</p>
<blockquote class="twitter-tweet" lang="en" style="display:block;margin-left:auto;margin-right:auto;"><p lang="en" dir="ltr">A lot of females should stop wearing makeup so we all know the truth</p>— MJ The Astronaut™ (@MJChillin) <a href="https://twitter.com/MJChillin/status/623893740944498688">July 22, 2015</a></blockquote>
<blockquote class="twitter-tweet" lang="en" style="display:block;margin-left:auto;margin-right:auto;"><p lang="en" dir="ltr">females these days are to full of themselves confusing confidence with arrogance. where’s the humble ones at? 👀</p>— Cogito Érgo Sum (@KXavierr) <a href="https://twitter.com/KXavierr/status/624493556548268032">July 24, 2015</a></blockquote>
<blockquote class="twitter-tweet" lang="en" style="display:block;margin-left:auto;margin-right:auto;"><p lang="en" dir="ltr">I don’t f with these new generation of females.</p>— IG: @emanhudson (@_kosher) <a href="https://twitter.com/_kosher/status/624072746004131840">July 23, 2015</a></blockquote>
<p>With the use of the word “females”, we are reduced to a species; we’re separate, we’re other-ed.</p>
<p>Search for the word “females” on <a href="https://twitter.com/search?q=females">Twitter</a> and you’ll see what I mean. Maybe get a drink first.</p>
<h3 id="sfemalewomeng-2"><strike>s/female/women/g</strike></h3>
<p>But along those lines: really <em>think</em> about using a gendered identifier. More often than not, you probably don’t need to.</p>
<p>For instance, at work, I get introduced a lot as “this is Lynn, our female developer”. There’s no need. There is already a word for what I am:</p>
<p><strong>I’m a fucking engineer.</strong></p>
<p>You can call me crappy, lazy, stubborn - whatever. But please – not a “female” or a “woman” developer.</p>
<p>Only when gender is really relevant should you specify it. Like an event specifically for women.</p>
<h3 id="assume-knowledge">Assume Knowledge</h3>
<p>Another micro action is to assume knowledge. Assume everyone has a reason to be at this conference, at this meetup, at this workshop, whatever. Assume that they are not a +1 of someone else, not a beginner, not a recruiter. Allow them to reveal whatever it is they want to, but assume the reason why they are there is the <strong>same as yours</strong>.</p>
<h3 id="women-first-design">Women-first Design</h3>
<p>This would probably sound better as “female-first” design, for the alliteration – but the alliteration is the only reason it would.</p>
<p><img class="img-responsive img-rounded img-displayed" style="width:300px" src="http://www.roguelynn.com/assets/images/diversity-not-done-yet/women-restroom.png" title="Women restroom sign" alt="Women restroom sign"/></p>
<p>Women-first design: Make the default pronouns and imagery reflect a woman. Documentation, products, user profiles, form values, whatever.</p>
<p>The reason that this is so impactful is that it’s <strong>signaling</strong>. It signals to women that - as a developer - you’ve thought about them.</p>
<p>It does not mean patronizing women by painting flowers across your product or adding pink everywhere.</p>
<p>But I doubt that Apple considered how women would react when they first introduced the “iPad”.</p>
<h3 id="reach-out">Reach Out</h3>
<p>The last micro action – this one’s novel, I know – is to reach out.</p>
<p>A couple of months ago, <a href="https://twitter.com/dstufft">Donald Stufft</a>, one of the maintainers of PyPI, pinged me on IRC. He needed help with PyPI – both maintenance & bug fixing, as well as more greenfield projects.</p>
<p>And he actually had a thought: he recognized there were no women helping PyPI behind the scenes.</p>
<p>So - get this - he asked us! He asked PyLadies to help alleviate his workload. To be honest, Donald didn’t think he’d get many takers. But - to his disbelief (and I knew it would happen) - he got 4 volunteers within the first hour of the email.</p>
<p>This shit actually works! Mind-blowing, I know.</p>
<h2 id="the-difficult-shit">The Difficult Shit</h2>
<p>Micro actions are the low-hanging fruit – small actions we can all take to really help welcome women into Python.</p>
<p>But now onto the more difficult stuff.</p>
<h3 id="comfortable-in-the-uncomfortable">Comfortable in the Uncomfortable</h3>
<p>It’s very comfortable to hire, work with, hang out with, or co-found a business with people like you. But we must be prepared to get uncomfortable. One way is to take on a sense of “intentional curiosity”.</p>
<p>Ok, so remember the <a href="#let-me-google-that-for-you">let-me-google-that-for-you</a> bit? While I find being asked that annoying, wanting to know more about something is indeed commendable – that’s fantastic. But it actually takes work. Again, I’m not going to do it for you; self-education is the key.</p>
<p>One thing that is admirable is a sense of curiosity: curiosity when meeting people unlike yourself in social spaces and the workplace, curiosity about what makes a team work well together and how to make it better. It essentially means going beyond your comfort zone.</p>
<p>Something technology companies have been super good at is being introspective. It’s embarrassing and uncomfortable to admit mistakes and fuckups, but if a service is down, a company is often very transparent and apologetic about it. We tend to share what we learn.</p>
<p>But there isn’t the same level of transparency around the lack of diversity. We should have post-mortems on the subject – like, “here are our numbers from 2010, here’s what we’re doing, here’s where we are now.” We need to treat diversity as a problem similar to a service going down – our “fail whale” – and then document it for the world. Open source our diversity.</p>
<p>It’s scary! It’s admitting fault, it’s uncomfortable! But wouldn’t it be awesome to see that?</p>
<h3 id="say-what-you-think">Say What You Think</h3>
<p>Women: if you are the only woman in the room, say what you really think. Seriously.</p>
<p>I spent a few meetings noticing how often I would be interrupted, unacknowledged, or talked over. God damn, it really made me mad!</p>
<p>One meeting was particularly aggravating – I was supposed to be leading the fucking meeting. And I just got fed up. I literally said, “for fuck’s sake, let me fucking speak!” It felt good.</p>
<p>Seriously – say what you think. Say what you <em>really fucking</em> think. Rage quit the meeting if you need to. Because otherwise, it will continue; <em>it will get worse</em>.</p>
<p>Your company may be super into diversity with hiring, but if you don’t have a fucking voice at the table, <em><strong>what is the point</strong></em>.</p>
<h1 id="back-to-star-trek">Back to Star Trek</h1>
<p>I’ll finish with yet another Star Trek quote – because there are so many awesome ones and it’s a bit more cheery than my above rage:</p>
<blockquote>
<p>…humanity will reach maturity and wisdom on the day that it begins not just to tolerate, but take a <strong>special delight</strong> in differences in ideas and differences in life forms.</p>
<p>…the <strong>worst possible thing</strong> that can happen to all of us is for the future to somehow <strong>press us into a common mould</strong>, where we begin to act and talk and look and think <strong>alike</strong>.</p>
<p>If we cannot <strong>learn</strong> to actually enjoy those small differences, to take a <strong>positive delight</strong> in those small differences between our own kind here on this planet, then we <strong>do not</strong> deserve to go out into space and meet the diversity that is almost certainly out there.</p>
<p><small>Gene Roddenberry, <a href="http://web.archive.org/web/20140421220157/http://www.niatu.net/transfictiontrek/download/gene-roddenberry-st-philosophy.pdf">The Star Trek Philosophy</a></small></p>
</blockquote>
Metrics-Driven Development: See the forest for the treeshttp://www.roguelynn.com/words/metrics-driven-development/2015-05-29T09:46:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>This post is an accompaniment to my <a href="http://opendatascicon.com/schedule/metric-driven-development-see-the-forest-for-the-trees/">Metrics-driven development talk</a> at <a href="http://opendatascicon.com/">Open Data Science Conference</a> in Boston in May 2015. Slides <a href="https://speakerdeck.com/roguelynn/metrics-driven-development-see-the-forest-for-the-trees">here</a> and video to be posted soon.</p>
<hr>
<p>At Spotify, data is quite important. We track user-generated data, like sign ups, logins, activity within the application, even tweets (good and bad), etc. We also track server-generated data, including requests to various services, response times, and response status codes, among a million other things.</p>
<p>Each squad owns what they want to collect, how and when, and how they will consume such data. We have analysts that run thousands of Hadoop jobs a day to glean insight from user activity, answering questions like “how many paying subscribers do we have at this moment?” or “was this partnership financially beneficial for us?”.</p>
<p>We have data scientists and machine learning engineers analyzing listening behavior and trends that power the Discovery, Browse, and Radio behind the platform.</p>
<p>Engineers behind the platform watch usage rates of our Web APIs, login failure rates, and feature usage. This only scratches the surface of what data we collect.</p>
<p>We use various technologies related to data, including Hadoop, as well as Cassandra, Postgres, and <a href="https://www.elastic.co/">Elasticsearch</a>. All of the user-generated data sits in Hadoop, against which we run jobs using Java, Python, or direct queries with Hive (side note: we’ve open-sourced our <a href="https://github.com/spotify/luigi">Python job-scheduler framework</a>!). I’ve even discovered we have an IPython notebook server set up.</p>
<p>Some devops events, like DNS changes, Puppet configuration changes, and deploy pipelines, get parked in Elasticsearch, where we have <a href="https://www.elastic.co/products/kibana">Kibana</a> set up. But the majority of service activity is handled by a home-grown system, which includes our open-sourced <a href="https://github.com/spotify/ffwd">ffwd</a> (pronounced “fast forward”), written in Ruby.</p>
<p>Yet with all this setup, all this technology, I am embarrassed to say my team did a lot of development in the dark. We were not tracking anything; we didn’t know how our feature integrations were doing; we hadn’t a clue how the backend services we “maintained” were holding up.</p>
<p>This is a story of “self-discovery” to become a better, more effective team. And we did it by capitalizing on understanding our own data. Not everyone can be a data scientist, statistician, or econometrician; but everyone can grasp why it matters when more than half of users can’t log in. This is a story of a practical application of data science.</p>
<h2 id="the-agile-approach">The Agile Approach</h2>
<p>Spotify has been very public with <a href="https://www.youtube.com/watch?v=Mpsn3WaI_4k">how</a> <a href="https://www.youtube.com/watch?v=X3rGdmoTjDc">it</a> <a href="http://techcrunch.com/2012/11/17/heres-how-spotify-scales-up-and-stays-agile-it-runs-squads-like-lean-startups/">uses</a> <a href="http://agilemanifesto.org/">Agile</a> in its software development process. One key aspect of agile is iteration, and we certainly iterate over our product. But we also iterate over ourselves, trying to find what works best for us as a company, as a squad, and everything in between.</p>
<p>Late last year, my <a href="http://cdn.tshirtonomy.com/wp-content/uploads/Unishark-T-Shirt.jpg">squad</a> began participating in an internal program deemed the “Effective Squad Circle.” Its purpose was to hone in on the squad itself. There were monthly challenges set up to figure out the team’s current condition and compare it to the desired condition in terms of delivering the product/feature/service we were meant to.</p>
<h3 id="finding-our-target-condition">Finding Our Target Condition</h3>
<p>The first challenge was to find our target condition. Where do we want to be? It’s certainly difficult to establish a goal without context, without an understanding of where we are now. To figure out our baseline, we sat down to answer a few questions as a group.</p>
<p class="lead">Question 1: What do we deliver?</p>
<p>A seemingly easy question, right? Yet the squad and I initially struggled to answer it right away. It certainly wasn’t immediately on the tip of our tongues.</p>
<p>So we looked at our past and listed out the integration projects we delivered and the services we currently maintain. The list includes <a href="https://get.uber.com/spotify/">Uber</a>, <a href="http://www.engadget.com/2014/01/29/spotify-last-fm-partnership/">Last.fm</a>, <a href="https://investor.yahoo.net/releasedetail.cfm?releaseid=686833">Yahoo!</a>, <a href="https://news.spotify.com/us/2014/02/21/add-to-spotify-with-soundhound/">SoundHound</a>, and <a href="http://www.digitaltrends.com/social-media/twitter-revisits-music-with-an-integrated-spotify-app/">Twitter #music</a>, among others. The most critical is certainly our <a href="https://news.spotify.com/us/2011/09/21/spotify-and-facebook/">Facebook</a> login and new user registration, as more than 50% of our user base has a Facebook-connected account.</p>
<p><small>Side note: there seems to be a misconception that one must sign up/log in via Facebook to use Spotify. <a href="https://www.spotify.com/us/signup/">Not true</a>!</small></p>
<p class="lead">Question 2: Who are our customers?</p>
<p>Who actually defines our work? At Spotify, we believe leadership is meant to convey a vision, and the squad is meant to implement that vision in the manner they choose. There isn’t micromanagement – there’s a lot of trust, actually. But our lead team defines the direction our squad takes.</p>
<p>With the many integrations we’ve done, we have a lot of external partners. Thankfully, the squad is a bit shielded from direct communication. But that makes our business development team another customer.</p>
<p>But then who depends on us internally? And who actually uses our work/product/service? As I alluded to earlier, many users log in to Spotify via Facebook. It’s a pretty integral system to the Spotify platform. So we certainly have to not f*ck it up when Facebook makes breaking changes – announced or not – to their login protocol. There are also other teams within Spotify that plug into the system for social aspects, e.g. sharing from within the platform.</p>
<p class="lead">Question 3: What are their expectations?</p>
<p>When trying to answer this question, it occurred to us that we had never really asked our customers what their expectations are. So we did! We wanted to know what exactly was important to them in what we deliver. Was it on-time delivery? Predictable versus productive? Did they expect solutions to problems they didn’t know existed? What were their expectations on quality, usability, and other non-measurables? Were there expectations about how we worked as a squad – did they want updates on progress, problems, etc.?</p>
<p>We couldn’t ask all our customers; asking 60 million users would be a bit much. And expectations would be different for different customers. Internal teams expected our Facebook service to be reliable and scalable. Business development wanted us to be clear on what we can feasibly implement. It’s safe to assume users will want to log in or sign up via Facebook if they choose to, and for it to just work.</p>
<p class="lead">Question 4: Do we actually meet them?</p>
<p>How do we know we’ve met our customers’ expectations? This is where we stopped dead in our tracks. No, we didn’t know if our systems could handle extra internal load. Or if/when users couldn’t log in. Or how many users have activated Spotify with Uber, and of those, does the experience actually work?</p>
<p>Being people that have an affinity for tech and automation, naturally we wanted to implement a technical solution.</p>
<h2 id="implementing-feedback-loops">Implementing Feedback Loops</h2>
<p>A “feedback loop” is a generic term that any team – not just tech – can use to understand how feedback is given. For our squad, one of the main feedback loops we chose was metrics. We wanted all them snazzy looking dashboards! With eye-candy graphs and visuals using the latest technology that will be obsolete tomorrow.</p>
<p>In all seriousness, we wanted an immediate visual of various metrics. But what did we want to see? What questions did we want to answer?</p>
<h3 id="measurements-we-wanted-to-see">Measurements We Wanted to See</h3>
<p>In line with the idiom of throwing spaghetti at the wall to “see what sticks”, the squad brainstormed for a while, trying to come up with any question we’d like to see answered. Some ideas included:</p>
<ul>
<li>Signup/auth flow abandonment</li>
<li>Facebook-connected users – percentage of total users, trend over time</li>
<li>Percent of users that sign up through Facebook per hour/day/week</li>
<li>Facebook-related errors</li>
<li>Daily Active Users by Partner/Feature</li>
<li>Registrations, subscription rate, and referrals by Partner</li>
<li>Web API usage by Partner</li>
<li>Squad-focused Twitter feed (“uber + spotify,” etc.) – what’s being complained about that neither the partner nor we may see?</li>
<li>Outstanding JIRA issues</li>
<li>Request count by internal requesting service/team</li>
</ul>
<p>We grouped similar metrics and questions into buckets: Usage, System Health, and Business Performance Indicators. Each bucket will eventually be its own dashboard, cycled through on one of our big office monitors.</p>
<p>We also created a few processes based on the questions above. One process reviews our progress as a squad. Every <a href="https://www.mountaingoatsoftware.com/agile/scrum/sprint-retrospective">retrospective</a>, we will look at a couple of metrics that deal with squad performance, e.g. how many bugs we closed in the past sprint period. We will also judge whether this is a metric we’d like to continue seeing, whether we can actively improve upon it (if we understand what needs to be improved), and what new – if any – measurable items we should look at for the next retro.</p>
<p>Another is to have goal targets at the start of every integration project we do (which may span multiple sprints). For example, “we will know we’re successful when this integration brings us x-amount of users.” It’s true this sort of goal line can only be judged based on historical user acquisition numbers, so we definitely have some work to do beforehand. It will also feed into our retrospectives.</p>
<p>We also have a few post-integration questions for business development folks to ask of external partners on behalf of the squad. These questions include understanding how responsive we were, how our developer tools are, and whether their company goals were met. We may think an integration was super successful, but they may have some insight that we do not.</p>
<h2 id="the-big-picture">The Big Picture</h2>
<p>We’ve only been “caring” about metrics for the past few months, so this is certainly only the beginning for us. But it’s allowing us to iterate and take a hard look at what we track and why. You can certainly track <a href="https://codeascraft.com/2011/02/15/measure-anything-measure-everything/">everything that moves</a>, but will you get inundated? Certainly so, if you’re counting each leaf of each branch of every tree in the forest. So how can we tell what’s important?</p>
<p>This goes back to understanding your customers’ expectations, and essentially boils down to business value. How can you maintain and improve upon the business value of your service/product? How does counting every Facebook-connected user help us better ourselves?</p>
<p>When thinking about implementing various metrics for our feedback loops, I came across various questions to help me see the forest for the trees:</p>
<p class="lead">Creating a new metric</p>
<ul>
<li>How do metrics map to business goals?</li>
<li>How do you prioritize the different goals you want to drive? Which is most important? Does that mean you’re going to neglect the others, or allot time by priority?</li>
<li>How can we create dashboards that are actually actionable? What is the goal, and more importantly, <em>how</em> can we drive the goal?</li>
</ul>
<blockquote style="display:block;margin-left:auto;margin-right:auto;" class="twitter-tweet" lang="en"><p lang="en" dir="ltr">Switching my Spotify to private so that my Facebook friends can’t see me listening to Clay Aiken.</p>— BriiMonster (@BriiMonster) <a href="https://twitter.com/BriiMonster/status/604210734621282304">May 29, 2015</a></blockquote>
<p><figcaption>A Twitter feed of “Spotify + Facebook” certainly turns into noise.</figcaption></p>
<p class="lead">Representing metrics</p>
<ul>
<li>How do we correctly measure what we care about?</li>
<li>We have many tools to help us create <span class="rogue-hover" data-toggle="tooltip" data-placement="top" title="# of registered users right now">gauges</span>, <span class="rogue-hover" data-toggle="tooltip" data-placement="top" title="# of open connections">counters</span>, <span class="rogue-hover" data-toggle="tooltip" data-placement="top" title="# of requests/second">meters</span>, <span class="rogue-hover" data-toggle="tooltip" data-placement="top" title="distribution of the # of registered users connected with Facebook">histograms</span>, and <span class="rogue-hover" data-toggle="tooltip" data-placement="top" title="latency: # of requests/sec over time">timers</span>. But what representation is best for each question?</li>
</ul>
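<p>To make those representations concrete, here’s a minimal pure-Python sketch of a counter, a gauge, and a timer. The class names and behavior are my own simplification for illustration – this is not the ffwd API or any particular metrics library:</p>

```python
import time


class Counter:
    """Monotonically increasing count, e.g. total requests served."""

    def __init__(self):
        self.value = 0

    def increment(self, n=1):
        self.value += n


class Gauge:
    """A point-in-time value, e.g. number of open connections right now."""

    def __init__(self):
        self.value = 0

    def set(self, value):
        self.value = value


class Timer:
    """A context manager that records durations, e.g. request latency."""

    def __init__(self):
        self.durations = []

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        self.durations.append(time.perf_counter() - self._start)


requests = Counter()
connections = Gauge()
latency = Timer()

requests.increment()   # a request came in
connections.set(42)    # current open connections
with latency:          # time a stand-in for handling a request
    time.sleep(0.01)
```

<p>A real metrics library would also handle thread safety, periodic flushing to a backend, and derived statistics like percentiles – but the core data shapes really are this simple.</p>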
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/metrics-driven-dev/metrics_facebook_service.png" title="Incoming requests by service" alt="Incoming requests by service"/>
<figcaption>Another metric displayed on our dashboard</figcaption></p>
<p class="lead">Consuming metrics</p>
<ul>
<li>How often do you check in on metrics?</li>
<li>Dashboards are never looked at; they become background noise. How do you make dashboards more visible?</li>
<li>If you make them more visible by slapping them up on a TV monitor, are the metrics too sensitive to broadcast (e.g. where visitors can see them)?</li>
</ul>
<p class="lead">Iterating over current metrics</p>
<ul>
<li>For the goals we don’t fully reach (the gap between baseline and goal line), we need to assess the difference – why does it exist? Is it even solvable?</li>
<li>If you look at the dashboard, what actions are you actually going to take? Should you even create a dashboard if a goal or an alert isn’t set up? (probably not)</li>
<li>What about the unknowns? What <em>is</em> unknown? E.g. we know <code>x</code>-amount of mobile users have connected their accounts to Uber; but how many don’t use it because the driver has an Android phone, versus the driver not being aware of the service?</li>
<li>How do we approach the known unknowns? Are there different ways or avenues to track? Or is it even actionable?</li>
<li>You’re then left with the unknown unknowns; how do you figure out the % of known knowns, known unknowns, and unknown unknowns? What level of known and unknown unknowns are you comfortable with?</li>
</ul>
<h2 id="tldr">TL;DR</h2>
<p>Ultimately, the goal in answering these questions is to both shorten our decision-making cycle and make more informed decisions about strategy and partnerships. It’s super easy to get lost in the forest, and it doesn’t help that all that instant feedback is kind of fun. We are placing current values in historical context in order to see patterns developing.</p>
<h2 id="moar-resources">MOAR Resources</h2>
<p>Once you’ve thoughtfully addressed what you want to measure, take a look at the following:</p>
<ul>
<li><a href="https://hynek.me/talks/beyond-grep/">Beyond Grep: Practical Logging and Metrics</a> by Hynek Schlawack – a practical and very thorough guide in setting up proper error notifications; metrics tracking, collecting/aggregating, and storing/viewing; and centralize logging.</li>
<li><a href="https://www.youtube.com/watch?v=czes-oa0yik">Metrics, Metrics Everywhere</a> presentation from Coda Hale – making decisions based off of metrics to avoid confusion and alleviate the unknowns.</li>
</ul>
<script>
$(function () {
$('[data-toggle="tooltip"]').tooltip()
})
</script>
RAMLfications – Python package to parse RAMLhttp://www.roguelynn.com/words/ramlfications-release/2015-04-21T09:38:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p class="lead"><i>Update</i>: I gave a presentation to the <a href="http://www.meetup.com/sfpython/events/222323217/" alt="SF Python">SF Python</a> Meetup group about the new library, including why you’d use a descriptive language for your API, why RAML, and why Spotify chose RAML.
If you’re into IPython Notebooks, mine can be found <a href="https://github.com/econchick/raml-ipynb/blob/master/ramlfications.ipynb" alt="IPython Notebook">here</a>, with <a href="http://ipython.org/ipython-doc/1/interactive/nbconvert.html" alt="IPython nbconvert docs">slides</a> generated from IPython <a href="http://ramlfications-sf.herokuapp.com" alt="talk slides">here</a>.
</p>
<hr>
<p>A few of us at Spotify are infatuated with <a href="http://raml.org/">RAML</a> – a RESTful API Modeling Language described as “a simple and succinct way of describing practically-RESTful APIs”, with an extremely similar goal to <a href="http://swagger.io/">Swagger</a>.</p>
<p>I’m pleased to announce the initial release of <a href="http://www.roguelynn.com/projects/ramlfications">RAMLfications</a>, a Python package that parses RAML and validates it based on the <a href="http://raml.org/spec.html">specification</a> into Python objects.</p>
<pre><code data-lang="python">>>> from ramlfications import parse
>>> RAML_FILE = "/path/to/some/file.raml"
>>> api = parse(RAML_FILE)
>>> api.base_uri
'https://{subdomain}.example.com/v1/{communityPath}'
>>> api.resources
[ResourceNode(method='get', path='/widgets'),
ResourceNode(method='get', path='/widgets/{id}'),
ResourceNode(method='get', path='/widgets/{id}/gizmos'),
ResourceNode(method='get', path='/thingys')]
</code></pre><pre><code data-lang="python">>>> widget = api.resources[1]
>>> widget.name
'/{id}'
>>> widget.description
[Get a Widget](https://developer.example.com/widgets/)
>>> widget.description.html
u'<p><a href="https://developer.example.com/widgets/">Get a Widget</a></p>\n'
>>> widget.uri_params
[URIParameter(name='id'), URIParameter(name='communityPath')]
</code></pre>
<p>It’s available on <a href="https://pypi.python.org/pypi/ramlfications">PyPI</a> with documentation on <a href="https://ramlfications.readthedocs.org">Read the Docs</a> and code released under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache 2.0</a> license available on Spotify’s <a href="https://github.com/spotify/ramlfications">GitHub</a>.</p>
<p>There are still some <a href="https://github.com/spotify/ramlfications/issues?q=is%3Aopen+is%3Aissue+label%3Afeature">features that need to be implemented</a>, and I am sure there are some bugs (please <a href="https://github.com/spotify/ramlfications/issues">report</a> them!), hence the initial beta release.</p>
<h3 id="why">Why?</h3>
<p>Last year, I built our developer <a href="https://developer.spotify.com/web-api/console">API console</a>, allowing folks a playground for understanding our <a href="https://developer.spotify.com/web-api/">Web APIs</a>. The console first parses a RAML file that defines the API, then creates a set of forms based off of RAML, allowing a user-friendly way to directly interact with our Web API service.</p>
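<p>To illustrate the idea – this is not the console’s actual code, and the <code>Resource</code> class below is a hypothetical stand-in for the <code>ResourceNode</code> objects that <code>ramlfications</code> produces – generating a form per parsed resource might look like:</p>

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical stand-in for a parsed RAML resource node."""
    method: str
    path: str
    uri_params: list = field(default_factory=list)


def render_form(resource):
    """Render one <form> per resource: an input per URI parameter,
    with the HTTP method as the submit button label."""
    inputs = "".join(
        f'<input name="{p}" placeholder="{p}">' for p in resource.uri_params
    )
    return (f'<form data-path="{resource.path}">{inputs}'
            f'<button>{resource.method.upper()}</button></form>')


widget = Resource(method="get", path="/widgets/{id}", uri_params=["id"])
html = render_form(widget)
```

<p>The point is that the forms are driven entirely by the parsed description of the API – change the RAML file, and the console changes with it.</p>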
<p>One of the highlights of this console is the fact that <em>none</em> of the application itself (except for the HTML/CSS) is Spotify-specific. This allows others to easily maintain the app, only editing our RAML file and then restarting the service.</p>
<h3 id="what39s-next">What’s next</h3>
<p>I have plans to open source our API console that uses RAMLfications so others can easily create their own interactive environment.</p>
<p>My next hack project is to also write a documentation generator based off of RAML (of course, using <code>ramlfications</code>). Currently, RAML supports Markdown (and plaintext) for <code>description</code> and <code>documentation</code> elements within a RAML file. So using <code>ramlfications</code>, I’ll probably end up hacking on top of a static site generator. Stay tuned!</p>
I’m a fraud.http://www.roguelynn.com/words/Im-faking-it/2014-10-14T09:46:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>In celebration of <a href="http://findingada.com/">Ada Lovelace Day</a>, <a href="http://www.thoughtworks.com/">Thoughtworks</a> hosted a meetup featuring talks from many prominent local women. I had the pleasure of speaking myself, in which I confessed publicly:</p>
<p>I am totally faking it.</p>
<h2 id="im-not-a-real-coder">I’m not a “real” coder</h2>
<p>The tl;dr of my <a href="http://www.roguelynn.com/words/my-path-into-engineering/">path</a> in life is that I am not a “real” coder. I graduated college in 2008 with a degree in business focusing on econ and finance and went straight into banking as a <a href="https://www.bostonprivatebank.com/">treasury analyst</a>. I was a bit restless and wanted to get an advanced degree, eventually making my way through the industry to the Board of the <a href="http://www.federalreserve.gov/">Federal Reserve Bank</a>, which makes economic policy decisions for the US.</p>
<p>So I decided to set my sights on a <a href="http://www.haas.berkeley.edu/MFE/">Financial Engineering master’s degree</a> at UC Berkeley. But! Applicants had to prove they knew how to code in C, which I didn’t. The most suitable solution, I thought, was to take an intro Computer Science course for credit. In the fall of 2011, I enrolled in the <a href="https://cs50.harvard.edu/">CS50</a> course at <a href="http://www.extension.harvard.edu/">Harvard</a>, thinking that since it had only been a few years since I graduated college, I still knew how to study.</p>
<p>But no.</p>
<p>I failed. I failed both midterms. And not like an “a B isn’t an A” sort of fail, but a “here’s a D, you should be thankful for the grading curve” sort of fail.</p>
<p>I could not understand concepts that are core to programming, like memory allocation, pointers, and dereferencing. I knew that sorting algorithms existed, but how to implement a simple Insertion Sort in C? Instant deer-in-the-headlights.</p>
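<p>For what it’s worth, here’s the insertion sort that eluded me, sketched in Python rather than the C the course wanted – the logic is identical:</p>

```python
def insertion_sort(items):
    """Sort a list in place: grow a sorted prefix one element at a time,
    inserting each new element into its correct position in the prefix."""
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        # Shift larger elements of the sorted prefix one slot to the right.
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current
    return items
```
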
<p>So for the course’s final project, I decided to make a website built with Python. I chose Python because I saw my boyfriend writing it and thought “woah, I can actually read and understand it!” It was a little app – with really awful and repetitive code – that calculated a user’s personalized inflation rate over 10 years. For example, based on income and purchase habits, a retiree experiences inflation differently than someone in a marriage with 3 kids, or a college kid.</p>
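<p>As an illustration of the idea – my original code is long gone, and the category rates and spending profiles below are made up – a personalized inflation rate can be computed as category inflation rates weighted by a person’s spending shares:</p>

```python
# Hypothetical annual inflation rates per spending category (fractions).
CATEGORY_INFLATION = {
    "housing": 0.030,
    "food": 0.025,
    "tuition": 0.060,
    "healthcare": 0.050,
}


def personal_inflation(monthly_spending):
    """Weighted average of the category inflation rates,
    weighted by each category's share of total spending."""
    total = sum(monthly_spending.values())
    return sum(
        CATEGORY_INFLATION[category] * amount / total
        for category, amount in monthly_spending.items()
    )


# Two made-up spending profiles: a college kid vs. a retiree.
student = {"housing": 800, "food": 300, "tuition": 2000, "healthcare": 100}
retiree = {"housing": 1000, "food": 400, "tuition": 0, "healthcare": 900}
```

<p>Someone heavy on tuition spending feels a higher rate than someone who spends nothing on tuition – which is exactly why a single headline inflation number can be misleading.</p>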
<p>So because I made a <em>working</em> site, and learned Python & Django for it, I finished the course in December 2011 with an A-. Not sure how that computes with two failed midterms, but I’ll take it.</p>
<h2 id="unconventionally-educating-myself-at-the-expense-of-others">Unconventionally educating myself at the expense of others</h2>
<p>I was hooked though – I wanted to code more. Screw finance. Screw getting a master’s degree. 3am never looked better than when debugging my stupid little website in Python. Rather than paying another $2,000 for a Harvard course that didn’t even teach Python, I looked to the local community.</p>
<p>So in January of 2012, I started a women-only, Python-centric <a href="http://www.roguelynn.com/words/o-hai-there/">study group</a> through the <a href="https://www.meetup.com/women-who-code-sf">Women Who Code</a> (WWC) Meetup - graciously hosted by Dropbox. I did it through WWC because having attended previous events, it was a super supportive environment and I wanted to plug into that.</p>
<p>I created my own sort of “course”: an 8-week series I made a curriculum for, where each week would be a mini project for a few hours on Wednesday evenings. I would digest the project and teach it to myself the week before, then “present” and work through it during the study group.</p>
<p><em>Side note</em>: My boyfriend and I had recently moved down to the Bay Area for his job. It was wicked difficult to find employment in banking at that time (and I’m sure it still is), so suffice it to say I was unemployed with plenty of time on my hands to learn.</p>
<p>I will have you know that I completely fumbled through that whole series. The folks at Dropbox gave some solicited feedback that perhaps a more experienced Pythonista would have been better. But I found the study group to be very successful. Week over week, the group was at max attendance with about 40 women. Sure, they may have come for the free dinner provided by our hosts – who can blame them? But they stayed for some reason.</p>
<p>The feedback I received from attendees was great. Rather than the classic setup with an experienced lecturer - “here, I have the knowledge, and you are here to learn what I want you to learn” - it was more of a “let’s fumble through this together,” eliminating the ego factor along with the competition that can happen in classic classrooms.</p>
<p>Conveniently, <a href="https://us.pycon.org/2012/">PyCon 2012</a> was scheduled towards the end of the study group series, right in Santa Clara. I was able to get a few free tickets for the March conference and shared a ZipCar with a few study group’ers.</p>
<p>NB: PyCon in 2012 was my very first conference ever, and my experience there changed me for the better. More on that in a bit.</p>
<p>But at PyCon, the folks who started PyLadies (in Los Angeles in late 2011) approached me, as they were aware of the study group I was running. They suggested I start a San Francisco chapter of PyLadies.</p>
<p>Naturally, my first thought was “wtf why isn’t it in SF already?”, followed by “F*CK YEA LET’S DO IT”. So April of 2012 was the official launch of <a href="https://www.meetup.com/pyladiessf">PyLadies in San Francisco</a>. We are now at 1800+ members, making us the largest of the 50+ PyLadies chapters. Since the start of SF PyLadies, I have regularly hosted events, including workshops like before, speaker events for inspiration, and general drop-by study groups for women interested in getting their feet wet with Python.</p>
<h2 id="how-i-fake-it">How I fake it</h2>
<p>For some reason, women keep coming back to these events hosted through PyLadies. So I must be doing something right. So how exactly am I “faking” it? Essentially, my process is:</p>
<h4 id="step-1-break-everything.">Step 1: Break everything.</h4>
<p>First step - break everything. Mind you, breaking shit isn’t intentional, but a natural habit of mine. I try and follow some tutorial online or instructions given to me, and lo & behold, I accidentally delete my project directory instead.</p>
<h4 id="step-2-ask-for-help.">Step 2: Ask for help.</h4>
<p>So then I ask for help, naturally. I have my boyfriend, who is a very experienced software engineer. But I also had Twitter, which I preferred.</p>
<p>As you can imagine, though, the usual responses I got were “Read the fucking manual,” or “let me Google that for you.” Great, thanks for the helpful advice!</p>
<h4 id="step-3-break-more-shit.">Step 3: Break more shit.</h4>
<p>So then I would struggle some more and break everything again, but differently - trying to undo what I broke before, redoing it correctly, and breaking something further down the line.</p>
<h4 id="step-4-success">Step 4: Success!</h4>
<p>But then something would work! My website could run locally! And I wouldn’t be able to explain why! But I’ll take it!</p>
<h4 id="step-5-break-it-again.">Step 5: Break it again.</h4>
<p>And as soon as the boyfriend would come home, I’d be super excited to show him my success. “ZOMG DOOD LOOK AT WHAT I DID!!11!one!”</p>
<p>And as you can imagine: it would not work. I broke it again. (Learn version control the hard way, huh?)</p>
<h4 id="step-6-cry.">Step 6: Cry.</h4>
<p>As you can imagine, this whole process is aggravating and energy consuming. It’s so frustrating to think you’re following instructions exactly, only to notice you were in the wrong directory, had a typo, or needed to install some header files that weren’t in the instructions.</p>
<p>So I’d cry! I’d cry often! And I’ll be frank - I’m not embarrassed to admit I cry.</p>
<p>One of the reasons we cry - beyond tearing up from having a cold, being out in freezing temperatures, or cutting an onion - is as an emotional response. From this built-up stress, my eventual, visceral reaction is to cry.</p>
<p>A little <a href="http://www.youtube.com/watch?v=keMF8YzQoRM">science fact</a> actually: Tears that come from emotions have <a href="http://www.netdoctor.co.uk/healthy-living/wellbeing/the-health-benefits-of-crying.htm">higher levels of stress hormones</a> versus tears from cutting an onion, allergies, etc. They contain <a href="http://en.wikipedia.org/wiki/Adrenocorticotropic_hormone">ACTH</a> - a hormone that regulates our adrenaline, as well as <a href="http://en.wikipedia.org/wiki/Enkephalin">Enkephalin</a> - one of the body’s natural pain relievers. And crying is actually one of the body’s quickest ways to release that pent-up stress.</p>
<p>Therefore, I do suggest that frustrated new coders allow themselves to cry. Of course, folks may also mitigate stress via exercise, meditation, or perhaps a glass of wine or four. But don’t be embarrassed if you want to cry - it’s a biological mechanism we have at our disposal.</p>
<p>In my experience, after a good cry, and maybe some time away from my computer, I of course feel physically better. And then a big breakthrough happens! I don’t know if it’s scientifically provable, but I always seem to have a breakthrough post-cry. Whether I epically redeem myself at work after looking like an idiot, or finally deploy my dumbass little app, I have a win. So I can trust that my need to cry is just the lowest point of the current break-learn-break cycle I’m going through.</p>
<h4 id="lather.-rinse.-repeat.">Lather. Rinse. Repeat.</h4>
<p>I pretty much cycled through breaking stuff, getting some help or combing through documentation, and having various-sized successes along the way.</p>
<p>To help illustrate - I made this super awesome graph of my process in faking it:</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/im-faking-it/fake-it-graph-1.jpg" width="400" title="Super awesome sinusoidal graph of my process" alt="Super awesome sinusoidal graph of my process"/></p>
<p>I start off mediocre and fumbling, I break stuff, then have minor success. I break more stuff and maybe have a bigger fall being so frustrated, then comes a bigger win. That’s how it feels to me.</p>
<p>Now, how it actually looks to everyone else is more like this:</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/im-faking-it/fake-it-graph-2.jpg" width="400" title="Super awesome sinusoidal graph of my process" alt="Super awesome sinusoidal graph of my process"/></p>
<p>Where I make progress by breaking and learning, plateau from frustration, and continue on upwards. I may relearn something from the past week or month, but I’m definitely more knowledgeable than I was 3 months, 6 months, or a year ago.</p>
<p>A saying that illustrates this comparison very well: don’t compare your own “behind the scenes” takes to someone else’s “highlight reel.” People will only see your highlight reel. It may seem to me like I am constantly struggling, but what folks see is upward progression.</p>
<p>Similarly, I saw this image over two years ago and it has been my internal mantra ever since:</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/im-faking-it/sucking.jpg" width="400" title="Super awesome sinusoidal graph of my process" alt="Super awesome sinusoidal graph of my process"/></p>
<h2 id="how-ive-made-it">How I’ve “made” it</h2>
<p>I explained how I am not a “real” coder, and how I’ve been faking it. But how exactly have I made it?</p>
<p>Earlier I mentioned going to my first conference - PyCon 2012. That inspired me so much that I started proposing talks. It just looked like so much fun!</p>
<p>My process in proposing talks was pretty much throwing spaghetti at a wall to see what would stick. Yet of the Python/open-source conferences I first applied to in mid-2012, <em>all</em> of my talks were accepted. Diversity was just becoming very important within the Python community, and I was essentially speaking about how I was learning to code through the community - via study groups and workshops - and how that was having an impact on diversity.</p>
<h4 id="a-new-chapeau">A new chapeau</h4>
<p>Coincidentally, my <a href="https://ep2013.europython.eu/conference/talks/teaching-from-failure-creating-a-safe-space-for-learning">second talk ever</a> caught the eye of a Red Hat recruiter, giving me my first official software engineer position.</p>
<p>While working at Red Hat, I:</p>
<ul>
<li>Broke things;</li>
<li>Wrote up HOWTOs on how not to break things;</li>
<li>Moved on to break the next thing.</li>
</ul>
<h4 id="tidal-shift">Tidal Shift</h4>
<p>And so as I was continuing my break-learn-break-cry-break cycle, the tide started to shift. Rather than submitting talks, I would get invited to speak. I started getting invited to host PyLadies workshops abroad, helping new chapters start up. I was voted by the Python community onto the board of the Python Software Foundation, and then became a member of the Django Software Foundation. I started regularly being solicited by recruiters; first from brand-new startup companies that failed faster than the recruiter could hit send, and then eventually from more established ones.</p>
<p>And so after a year at Red Hat, I left to <a href="http://www.roguelynn.com/words/joining-spotify/">join Spotify</a> for a better opportunity.</p>
<h2 id="im-not-the-only-faker">I’m not the only faker</h2>
<p>As I looked at my progress in the context of my faking it and seemingly making it, I could tell that others had to be sharing my secret. I’m not the only faker. Everyone else is a fraud, too.</p>
<h4 id="hour-rule">10,000-hour rule</h4>
<p>You may or may not be familiar with the <a href="http://en.wikipedia.org/wiki/Outliers_(book)">10,000 hour rule</a>. It comes from Malcolm Gladwell’s book <a href="http://www.amazon.com/Outliers-Story-Success-Malcolm-Gladwell/dp/0316017930">Outliers</a>, which argues that the key to success in any field is practicing a specific task for a total of around 10,000 hours.</p>
<p>Now, Americans average about 2,000 hours a year at work with the standard 40-hour work week, so the 10,000-hour rule would take at minimum 5 years to satisfy. But surely not 100% of someone’s day is focused on a particular skill, like coding. Everyone has meetings, emails, water cooler chats, etc. So perhaps it’s more like 7-10 years to achieve that.</p>
<p>Going by that alone, my peers who are “classically trained” - i.e. have a degree in Computer Science, which amounts to at most a couple of years of actual coding - cannot be experts by the 10,000-hour rule either.</p>
<p>What’s more, this industry is always changing, with the new “in” technologies and frameworks coming and going as fast as we refresh our browser pages - how can we become an “expert” when being an expert is itself a moving target?</p>
<h4 id="fraudulent-interviews">Fraudulent interviews</h4>
<p>Another way that I know I’m not the only fraud: job applications & interviews.</p>
<p>One of the effects of impostor syndrome is not applying to jobs, promotions, or other opportunities. Sound familiar?</p>
<p>When I was at the early stages of learning, I wanted to get interview practice to see if I was on track with my progress. I applied for jobs I was <em>barely</em> qualified for. I felt less pressure since I wasn’t really trying to land a job.</p>
<p>But I’d actually get called for interviews! Apparently I was competent enough; perhaps these recruiters were scraping the bottom of the barrel, but at least my fake-coder experience surpassed whatever bar my impostor syndrome had artificially set.</p>
<p>When I did a second round of interview practice about a year later, I certainly got much further, but the impostor feeling was still there. Yet again: if my “faking it” was acceptable to these companies, how could others not be faking it as well?</p>
<h2 id="tldr-continue-faking.">tl;dr: Continue faking.</h2>
<p>Essentially, I’m saying that I don’t have a conventional education for my line of work, nor does my experience level match that of my peers. That’s what makes me feel like a fraud. However, the successes from my break-fix-break learning cycles build up. They’re evidence that I am - for some crazy reason - “making” it.</p>
<p>But this community can make us feel that we should have read the 2000-page “How to be a software engineer” manual over the weekend - and be ready to implement TLS in like, Assembly code. That’s crap. Ain’t nobody got time for that. The sheer progress I’ve made - <a href="http://www.roguelynn.com/words/my-path-into-engineering/">becoming an engineer in 1 year</a> - is evidence to me that I’m not alone.</p>
<p>So I’m not writing this to pass on some huge, 24k golden nugget of advice on how to rid yourself of feeling fake. No. Shit, I haven’t figured that out. But it’s along the same lines of “becoming comfortable with making yourself uncomfortable”: become comfortable with being a fraud. Embrace your fraudulent self.</p>
<p>My story here by far pales in comparison to Ms. Lovelace. But it’s my story that barrels through impostor syndrome not to exactly get rid of it, but to celebrate it. It is possible to fake it until you make it.</p>
<h1 id="my-path-into-engineering">My Path into Engineering</h1>
<p><em>2014-09-15 · Lynn Root · <a href="http://www.roguelynn.com/words/my-path-into-engineering/">roguelynn.com</a></em></p>
<p>More and more often, folks have been asking about how I became an engineer, what my “story” is. So, in the effort to save time and breath, here it is.</p>
<p><strong>TL;DR</strong>: I have a business degree, and started teaching myself how to code in the fall of 2011.</p>
<h4 id="started-in-business">Started in Business</h4>
<p>In high school, I really didn’t know what I wanted to study or do with my life. I loved playing the double bass and flute in the school’s orchestra. I even took <a href="http://www.ibo.org/diploma/curriculum/group6/music.cfm">IB music</a> and seriously considered going to <a href="http://www.berklee.edu/">Berklee College of Music</a> in Boston. But I had the foresight to know that I wouldn’t make a decent living as a musician.</p>
<p>I <em>loved</em> math and excelled in our <a href="http://www.ibo.org/diploma/curriculum/group6/music.cfm">highest available math class</a>, but as a teenager, my brain had trouble turning math into a career beyond being a HS teacher. So I thought studying business would be a good medium - related to math, but with much more opportunity to get a decent job after college.</p>
<p>So in 2008, I graduated from <a href="http://www.babson.edu">Babson College</a> with an undergraduate degree in business management, concentrated in Economics and Finance. And yes, I certainly did graduate into a horrible economy, but oddly enough I landed a job in Boston at a <a href="https://www.bostonprivatebank.com/">bank</a> as a Treasury Analyst managing the bank’s balance sheet (assets, liabilities, & equity).</p>
<p>It was a pretty awesome job, but I wanted to (or at least thought I did) get a PhD in Economics, as I eventually wanted to work for the Federal Reserve (managing the <em>US</em>’s balance sheet!). I got accepted to a <a href="http://gsefm.eu/">PhD program</a> in Frankfurt, Germany, but I couldn’t go because the US Department of Education did not recognize it as a legit university, and therefore I could not defer my student loans from undergrad.</p>
<p>So in 2010, I decided to return home to Seattle to hit up the free rent to save up money. There, I met my current <a href="https://plus.google.com/+SebastianPorst">beau</a>, and in the summer of 2011 he decided to accept a position at Google down in Mountain View (I sort of nudged him as I wanted to get out of Seattle), eventually moving us down to SF.</p>
<p>Being in the Bay Area, I found out about UC Berkeley’s <a href="http://www.haas.berkeley.edu/MFE/">Masters in Financial Engineering</a> program, which I thought would be a great detour to get into the Federal Reserve, as I figured I could turn into a “<a href="http://en.wikipedia.org/wiki/Quantitative_analyst">quant</a>” and work my way into the Fed through the workforce route, rather than the PhD to post doc to lecturer/professor route.</p>
<h4 id="dipping-my-toes-into-programming">Dipping my Toes into Programming</h4>
<p>Welp, that MFE program required applicants to prove knowledge of programming in C/C++, so for the fall of 2011, I took a <a href="http://www.extension.harvard.edu/courses/intensive-introduction-computer-science">course</a> through <a href="http://www.extension.harvard.edu/">Harvard’s Extension School</a>. Because how hard could it be? I was only a few years out of college.</p>
<p>Well, I sucked. I failed both midterm exams. I cried often. I could not understand pointers and dereferencing. Stacks, queues, the various sorting algorithms. All over my head.</p>
<p>The course held a hack day for students at the Harvard campus, but since I was in SF, I decided to find a local one instead - <a href="http://sciencehackday.org/">Science Hack Day</a>.</p>
<p>It was at that hack day that I can pinpoint the <em>exact</em> moment that sealed my future - when I decided to continue learning how to code because I liked it, not because I needed to. I was working with a team that hacked on parsing data from the <a href="http://home.web.cern.ch/topics/large-hadron-collider">LHC</a> to make some visualizations. There, I was introduced to Python. Looking over someone’s shoulder coding in Python, I literally thought,</p>
<blockquote>
<p>ZOMG I CAN ACTUALLY READ THIS. I CAN UNDERSTAND WHAT’S HAPPENING. WTF IS THIS MAGIC.</p>
</blockquote>
<p>Returning to my nearly-failed Harvard course, I decided to do my final project with this language that I was just exposed to. I made a <a href="http://inflatr.com">website</a> that calculated a user’s personalized inflation rate over 10 years. I built the site using <a href="https://www.djangoproject.com/">Django</a> because of their awesome <a href="https://docs.djangoproject.com/en/1.7/intro/tutorial01/">tutorial</a>.</p>
<p>Oddly enough, despite failing both midterms, I received an A- in the course because of my final project (weeee!!!). And I thought 3am never looked better while trying to debug my stupid little website. I wanted to continue learning to code, but did not want to pay another $2,000 for a course through Harvard.</p>
<h4 id="drinking-the-python-koolaid">Drinking the Python Koolaid</h4>
<p>I found out about <a href="http://www.meetup.com/women-who-code-sf">Women Who Code</a> through Meetup, and organized a study group to learn Python in the spring of 2012. I was a complete <em>n00b</em> but women showed up because we were all learning together.</p>
<p>I also heard about <a href="https://us.pycon.org">PyCon</a> that happened to be down in Santa Clara, and scored a few free tickets for me and a few WWC'ers to attend. Attending PyCon left me super inspired. PyLadies of Los Angeles encouraged me to start a PyLadies in San Francisco (which I <a href="http://www.roguelynn.com/words/debut-of-pyladiessf/">did</a> the following month!). I also knew I wanted to speak at conferences like PyCon - everyone was so awesome, the crowd, the speakers, the organizers.</p>
<p>Despite having been coding for less than a year, I threw caution to the wind and proposed talks to <a href="http://2012.djangocon.eu/">DjangoCon EU</a>, <a href="https://ep2013.europython.eu/ep2012">EuroPython</a>, and <a href="http://www.oscon.com/oscon2012">OSCON</a>, all of which were accepted. I spoke about how I was helping to improve the diversity of the Python community by somehow convincing women to learn how to code with me.</p>
<h4 id="one-year-to-engineer">One Year to Engineer</h4>
<p>At EuroPython that year, I met a few folks at Red Hat, which is how I landed my <a href="http://www.roguelynn.com/words/from-n00b-to-engineer-in-one-year/">first job</a> as an engineer in the fall of 2012, and contributed my first patch to my team’s OSS <a href="http://freeipa.org">product</a> within <a href="http://www.roguelynn.com/words/from-email-setup-to-patch-submission-in-8-days/">two weeks</a>. For the year I was there, I integrated the product into <a href="http://www.freeipa.org/page/HowTos#3rd_party_Applications_Integration">many third party applications</a>, and wrote up one of my most popular blog posts ever: an <a href="http://www.roguelynn.com/words/explain-like-im-5-kerberos/">explanation of Kerberos</a>.</p>
<p>After a year at Red Hat, I decided to move on as I felt I wasn’t getting enough support as a new coder. I left for a <a href="http://www.roguelynn.com/words/joining-spotify/">job at Spotify</a> in September 2013, originally working as a Partner Engineer (3rd party integrations with our APIs). Shortly after joining, I moved into a Backend Engineer position because I felt I wasn’t coding enough (so many meetings as a Partner Engineer!).</p>
<p>Just recently, my first project that was originally built during an internal hack week, was publicly <a href="https://developer.spotify.com/news-stories/2014/09/09/web-api-console/">released</a>: an <a href="https://developer.spotify.com/web-api/console/">API Console</a> for external developers to get familiar with our Web APIs. When I was a Partner Engineer, I got fed up answering the same questions to external partners on how to use our API and OAuth, so I felt something like this would help! I am also working on many internal diversity initiatives, including integrating the <a href="http://adainitiative.org/">Ada Initiative</a>’s <a href="http://adainitiative.org/what-we-do/workshops-and-training/">ally’s workshop</a> into our introductory days for new employees.</p>
<p>And that’s where I am today. Since starting to learn to code, I’ve joined the Board of Directors of the <a href="https://www.python.org/psf">Python Software Foundation</a>, I’ve done some <a href="http://www.roguelynn.com/projects/">side projects</a>, including <a href="http://newcoder.io">New Coder</a>, and I’ve led PyLadies in taking over the <a href="http://www.pyladies.com/locations">world</a>, including the <a href="http://www.pyladies.com/blog/pip-install-pyladies/">pyladies</a> package on <a href="https://pypi.python.org/pypi/pyladies/">PyPI</a>. I still continue to speak <a href="http://www.roguelynn.com/talks/">a lot</a>, as well as host workshops for those learning how to code.</p>
<p>Mind you, my journey in learning how to code isn’t over. I’m still very much a n00b, but not as much of a n00b as I once was. :)</p>
<h1 id="explain-like-im-5-dns">Explain like I’m 5: DNS</h1>
<p><em>2014-07-19 · Lynn Root · <a href="http://www.roguelynn.com/words/Explain-like-Im-5-DNS/">roguelynn.com</a></em></p>
<p>This post is an accompaniment to my <a href="https://speakerdeck.com/roguelynn/for-lack-of-a-better-name-server-dns-explained">PyCon 2014</a> and <a href="https://ep2014.europython.eu/en/">EuroPython 2014</a> talk, For Lack of a Better Name(server): DNS Explained, which is a deep dive into DNS. Slides can be found <a href="https://speakerdeck.com/roguelynn/europython-2014-for-lack-of-a-better-name-server-dns-explained">here</a>.</p>
<p>I previously wrote a post <a href="http://www.roguelynn.com/words/explain-like-im-5-kerberos/">explaining Kerberos “Like I’m 5”</a> that turns out to be one of my most visited pieces, so I figured an ELI5 version of my DNS talk would be beneficial to some.</p>
<p><strong>DISCLAIMER:</strong> Not literally for 5-year-olds! As noted with the <a href="http://www.roguelynn.com/words/explain-like-im-5-kerberos/">kerberos writeup</a>, this post is <em>not</em> an attempt to explain to a child. It’s meant to bring the reader from a hazy understanding to more comfort when <strike>fucking up</strike> working with DNS.</p>
<hr>
<h3 id="wtf-where39s-my-website">WTF where’s my website?</h3>
<p>As a nerdy person who has many side projects, I’ve had many experiences setting up personal projects for deployment. As I’m sure you have all been through, nearly every time one does their first <code>git push</code> to Heroku, it doesn’t work.</p>
<p><a href="http://devopsreactions.tumblr.com/post/39647674903/realizing-its-yet-another-dns-problem"><img class="displayed" title="It's always DNS" alt="It's always DNS" src="http://www.roguelynn.com/assets/images/eli5-dns/devops_dns_issue.gif" /></a></p>
<p>All else equal - e.g. Heroku is not down - I’m betting DNS is the issue. Who has actually set up DNS cleanly the first time? You follow the directions on your host’s website to properly set up DNS records, but something still doesn’t work. We’ve all been there. And without a solid understanding of DNS, oftentimes folks just fall into “oh, let’s try this” guess-editing of records, waiting for DNS to propagate to test whether the guess was correct - the <code>print</code> statements of Python debugging (I’m guilty of this too).</p>
<p>Naturally, curiosity got the best of me. It’s common knowledge that DNS is the internet’s phonebook. Sure - it’s the backbone of the internet; it’s a safe assumption that the cloud itself is built on DNS and duct tape, but that’s about all I knew.</p>
<p><img class="displayed" title="The Cloud: DNS and Duct tape" alt="The Cloud: DNS and Duct tape" src="http://www.roguelynn.com/assets/images/eli5-dns/tweet_dns_duct_tape.png"></p>
<h3 id="why-dns">Why DNS?</h3>
<p>So what exactly is the purpose of DNS?</p>
<p>DNS is necessary for you to:</p>
<ul>
<li>visit productive websites like reddit.com</li>
<li>receive critical emails from Groupon and Gilt</li>
<li>deploy your one-of-a-kind TODO list application</li>
<li>allow for your corporate meme generator to not be accessible by non-employees</li>
</ul>
<p>Truthful joking aside, DNS stands for Domain Name System, and is widely referred to as a phone book, translating human-readable names to computer-friendly addresses.</p>
<p>The formal description of DNS is:</p>
<blockquote>
<p>… a distributed storage system for Resource Records (RR). Each DNS resolver or authoritative server stores [these records] in its cache or local zone file. A … record includes a label, class, type, and data.</p>
</blockquote>
<p><small>– <a href="https://www.cs.utexas.edu/%7Eshmat/shmat_securecomm10.pdf">Sooel Son and Vitaly Shmatikov, University of Texas at Austin</a> (PDF)</small></p>
<p>With the textbook definition out of the way, let’s see it in action! I always understood something better when I’ve gotten my hands a bit dirty.</p>
<p>Naturally, to play around, I used my latest Python crush, <a href="http://www.secdev.org/projects/scapy/">Scapy</a>. Here, I am using Scapy to sniff my own DNS traffic as I am browsing the interwebs:</p>
<pre><code data-lang="python">>>> from scapy.all import * # cringe
>>>
>>> a=sniff(filter="udp and port 53", count=10)
>>> a
<Sniffed: TCP:0 UDP:10 ICMP:0 Other:0>
>>>
>>> a.show()
0000 Ether / IP / UDP / DNS Qry "www.google.com."
0001 Ether / IP / UDP / DNS Qry "reddit.com."
0002 Ether / IP / UDP / DNS Ans "74.125.239.144"
0003 Ether / IP / UDP / DNS Ans "96.17.109.11"
0004 Ether / IP / UDP / DNS Qry "roguelynn-spy.herokuapp.com."
0005 Ether / IP / UDP / DNS Ans "us-east-1-a.route.herokuapp.com."
0006 Ether / IP / UDP / DNS Qry "roguelynn.com."
0007 Ether / IP / UDP / DNS Ans "81.28.232.189"
0008 Ether / IP / UDP / DNS Qry "www.roguelynn.com."
0009 Ether / IP / UDP / DNS Ans "roguelynn.com."
</code></pre>
<p>I am using Scapy’s <a href="http://www.secdev.org/projects/scapy/doc/usage.html#sniffing">sniff</a> function to pick up my local traffic, filtering by the <a href="http://en.wikipedia.org/wiki/User_Datagram_Protocol">UDP</a> protocol on port 53 (the protocol and typical port for DNS traffic), and limiting to capturing only 10 packets (or, since it’s UDP, <a href="https://twitter.com/glyph/status/414988975036571648">datagrams</a>).</p>
<p>So as I let this sniff function run, I went to my browser to type in <code>roguelynn.com</code>.</p>
<p>What was pretty cool: as I was typing this into Chrome’s address bar, you can see a DNS query take place for every autocomplete guess that Chrome made. It first pings <code>www.google.com</code> because the address bar is also Google search. Then, as I typed <code>r</code>, it autocompleted to <code>reddit.com</code> (one of my most visited sites, and therefore a very natural guess), and we can see the DNS query on the second line. Then as I typed <code>ro</code>, Chrome guessed <code>roguelynn-spy.herokuapp.com</code> (which is my awesome How to Spy with Python presentation - coincidentally, I am giving that talk at PyData Berlin 2014), and we can see its related query. Then it found <code>roguelynn.com</code> once I typed <code>rog</code> and pressed enter with Chrome’s autocompletion. These autocompleted DNS queries seem to be a thing Chrome (and perhaps other browsers) does to speed up navigation to frequented sites.</p>
<p>But notice one thing here: all of these DNS queries have a dot at the end, e.g.: <code>0009 Ether / IP / UDP / DNS Ans "roguelynn.com."</code>. Perhaps many of you know that’s “how DNS does things,” but why is it really there?</p>
<h3 id="example.com-vs-example.com.">example.com vs example.com.</h3>
<p>The difference between the trailing dot and the absence of such is the same difference between absolute file paths and relative file paths, e.g. <code>../static</code> versus <code>/Users/lynnroot/Dev/site/static</code>.</p>
<p>Like relative filenames and directories, names can be mangled or mapped incorrectly depending on how your local DNS is set up. If your <code>resolv.conf</code> file has a line <code>search example.net</code> and you navigated to <code>example.com</code>, the resolver could treat the name as not fully qualified and look up <code>example.com.example.net</code>. If you navigated to <code>example.com.</code>, DNS would not apply the search path defined in <code>resolv.conf</code>.</p>
<p>Basically, if there is a dot at the end, it is the unambiguous, fully qualified domain name (FQDN), and not prone to search path spoofing. When playing with Scapy’s sniff function above, I didn’t put a trailing dot while navigating to <code>roguelynn.com</code> in my browser. Chrome’s implementation just assumes the dot, as it’s not really user friendly.</p>
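<p>The search-path behavior above can be sketched in a few lines of Python. This is a simplified model of <code>resolv.conf</code> search/<code>ndots</code> semantics (the <code>ndots=1</code> default mirrors common resolver behavior), not a reimplementation of any actual resolver:</p>
<pre><code data-lang="python">def candidate_names(name, search_domains, ndots=1):
    """List the query names a stub resolver would try, in order,
    per (simplified) resolv.conf search/ndots semantics."""
    if name.endswith("."):
        # Trailing dot: fully qualified, the search list never applies.
        return [name]
    searched = [f"{name}.{domain}." for domain in search_domains]
    absolute = [name + "."]
    if name.count(".") < ndots:
        # "Relative-looking" names try the search domains first.
        return searched + absolute
    return absolute + searched

print(candidate_names("example.com", ["example.net"]))
# ['example.com.', 'example.com.example.net.']
</code></pre>
<p>Note that <code>example.com.</code> (with the trailing dot) yields exactly one candidate - which is why the FQDN is immune to search-path surprises.</p>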
<h3 id="where-are-my-queries-going">Where are my queries going?</h3>
<p>Continuing my curiosity, what is the route that my DNS query takes to finally get an answer for where <code>roguelynn.com</code> is hosted?</p>
<p>This is actually not that easy to figure out; once the DNS query hits my wifi router, it’s a bit of a black box where that query is forwarded to if it’s not locally cached. I know that my computer’s DNS points to <code>192.168.1.1</code>, which is my router, and my router’s DNS points to both <code>75.75.75.75</code> and <code>75.75.76.76</code> (I found this out by logging into my router’s admin page).</p>
<p>If I do a <code>host</code> query on my router’s DNS, I get the pointer to a <code>comcast.net</code> subdomain:</p>
<pre><code data-lang="bash">$ host 75.75.75.75
75.75.75.75.in-addr.arpa domain name pointer cdns01.comcast.net.
</code></pre>
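<p>The same reverse (PTR) lookup that <code>host</code> performs is available from Python’s standard library. A quick sketch - using the loopback address here rather than Comcast’s resolver so it works without network access:</p>
<pre><code data-lang="python">import socket

# gethostbyaddr does a reverse DNS (PTR) lookup: IP -> hostname.
# It returns a (hostname, aliaslist, ipaddrlist) tuple.
hostname, aliases, addresses = socket.gethostbyaddr("127.0.0.1")
print(hostname)  # usually "localhost"
</code></pre>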
<p>Now if I do a <code>whois</code> on the IP, I can see that Comcast, my ISP, owns these IP addresses:</p>
<pre><code data-lang="bash">$ whois 75.75.75.75
#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# http://www.arin.net/public/whoisinaccuracy/index.xhtml
#
#
# Query terms are ambiguous. The query is assumed to be:
# "n 75.75.75.75"
#
# Use "?" to get help.
#
#
# The following results may also be obtained via:
# http://whois.arin.net/rest/nets;q=75.75.75.75?showDetails=true&showARIN=false&ext=netref2
#
Comcast Cable Communications Holdings, Inc CCCH-3-34 (NET-75-64-0-0-1) 75.64.0.0 - 75.75.191.255
Comcast Cable Communications Holdings, Inc COMCAST-47 (NET-75-75-72-0-1) 75.75.72.0 - 75.75.79.255
#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# http://www.arin.net/public/whoisinaccuracy/index.xhtml
#
</code></pre>
<p>Beyond that, I do not know if Comcast’s DNS has <code>roguelynn.com</code> cached, and if not, where the query got directed to after that.</p>
<p>But DNS is hierarchical, and getting familiar with the <code>dig</code> command can help us understand at least how queries are resolved.</p>
<p>The <code>dig</code> command has a <code>+trace</code> flag that makes “iterative queries to resolve the name being looked up. It will follow the root servers, showing the answer from each server that was used to resolve the lookup.”<sup><a href="http://linux.die.net/man/1/dig">1</a></sup> Let’s try this out with <code>python.org</code>:</p>
<pre><code data-lang="bash">$ dig +trace python.org
; <<>> DiG 9.8.3-P1 <<>> +trace python.org
;; global options: +cmd
. 12668 IN NS a.root-servers.net.
. 12668 IN NS b.root-servers.net.
. 12668 IN NS c.root-servers.net.
. 12668 IN NS d.root-servers.net.
. 12668 IN NS e.root-servers.net.
. 12668 IN NS f.root-servers.net.
. 12668 IN NS g.root-servers.net.
. 12668 IN NS h.root-servers.net.
. 12668 IN NS i.root-servers.net.
. 12668 IN NS j.root-servers.net.
. 12668 IN NS k.root-servers.net.
. 12668 IN NS l.root-servers.net.
. 12668 IN NS m.root-servers.net.
;; Received 496 bytes from 192.168.1.1#53(192.168.1.1) in 221 ms
org. 172800 IN NS a0.org.afilias-nst.info.
org. 172800 IN NS a2.org.afilias-nst.info.
org. 172800 IN NS b0.org.afilias-nst.org.
org. 172800 IN NS b2.org.afilias-nst.org.
org. 172800 IN NS c0.org.afilias-nst.info.
org. 172800 IN NS d0.org.afilias-nst.org.
;; Received 430 bytes from 202.12.27.33#53(202.12.27.33) in 469 ms
python.org. 86400 IN NS ns1.p11.dynect.net.
python.org. 86400 IN NS ns3.p11.dynect.net.
python.org. 86400 IN NS ns2.p11.dynect.net.
python.org. 86400 IN NS ns4.p11.dynect.net.
;; Received 114 bytes from 199.19.53.1#53(199.19.53.1) in 141 ms
python.org. 43200 IN A 140.211.10.69
python.org. 86400 IN NS ns4.p11.dynect.net.
python.org. 86400 IN NS ns2.p11.dynect.net.
python.org. 86400 IN NS ns3.p11.dynect.net.
python.org. 86400 IN NS ns1.p11.dynect.net.
;; Received 130 bytes from 208.78.71.11#53(208.78.71.11) in 13 ms
</code></pre>
<p>For the more visually inclined learner, let’s look at this query pictorially:</p>
<p>The <code>dig</code> query starts at my local DNS, <code>192.168.1.1</code>, where, if the answer is not cached, the query is passed on to a root server:</p>
<p><img class="displayed" title="python.org DNS Query: local dns" alt="python.org DNS Query: local dns" src="http://www.roguelynn.com/assets/images/eli5-dns/dns-diagrams.002.jpg"/></p>
<p>The query from my local DNS for <code>python.org</code> first asks a root name server (the <code>.</code>), which knows which hosts should have the information, and so responds with “try one of these hosts”, which corresponds to the <code>.org</code> name servers:</p>
<p><img class="displayed" title="python.org DNS Query: root dns" alt="python.org DNS Query: root dns" src="http://www.roguelynn.com/assets/images/eli5-dns/dns-diagrams.003.jpg"/></p>
<p>The <code>.org</code> name server receives the query, then says something like “try one of these hosts” which corresponds to the <code>python.org</code> name server:</p>
<p><img class="displayed" title="python.org DNS Query: org dns" alt="python.org DNS Query: org dns" src="http://www.roguelynn.com/assets/images/eli5-dns/dns-diagrams.004.jpg"/></p>
<p>The <code>python.org</code> name server says “yep, we have the A record for python.org, and it’s at address 140.211.10.69!”</p>
<p><img class="displayed" title="python.org DNS Query: python.org dns" alt="python.org DNS Query: python.org dns" src="http://www.roguelynn.com/assets/images/eli5-dns/dns-diagrams.005.jpg"/></p>
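<p>The referral chain in these diagrams can be modeled with a toy zone table. This is purely illustrative - real resolvers speak the DNS wire protocol and deal with glue records, caching, and timeouts - but the control flow of an iterative lookup is roughly:</p>
<pre><code data-lang="python"># Each "server" knows either a final A record or a referral to a more
# specific name server (toy data modeled on the dig +trace output above).
ZONES = {
    "root":      {"referral": {"org.": "org-ns"}},
    "org-ns":    {"referral": {"python.org.": "python-ns"}},
    "python-ns": {"answer": {"python.org.": "140.211.10.69"}},
}

def iterative_resolve(name, server="root"):
    """Follow referrals from the root until some server answers."""
    trace = []
    while True:
        zone = ZONES[server]
        trace.append(server)
        if name in zone.get("answer", {}):
            return zone["answer"][name], trace
        # Otherwise, follow the referral whose suffix matches the name.
        for suffix, next_server in zone.get("referral", {}).items():
            if name.endswith(suffix):
                server = next_server
                break
        else:
            raise LookupError("no referral for " + name)

addr, trace = iterative_resolve("python.org.")
print(addr)   # 140.211.10.69
print(trace)  # ['root', 'org-ns', 'python-ns']
</code></pre>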
<p>But if we wanted to know more about, say, <code>hg.python.org</code>, running <code>dig +trace hg.python.org</code> shows that it is actually a CNAME record mapped to <code>virt-7yvsjn.psf.osuosl.org</code>:</p>
<pre><code data-lang="bash">$ dig +trace hg.python.org
; <<>> DiG 9.8.3-P1 <<>> +trace hg.python.org
;; global options: +cmd
. 12170 IN NS g.root-servers.net.
. 12170 IN NS h.root-servers.net.
. 12170 IN NS a.root-servers.net.
. 12170 IN NS b.root-servers.net.
. 12170 IN NS k.root-servers.net.
. 12170 IN NS i.root-servers.net.
. 12170 IN NS e.root-servers.net.
. 12170 IN NS f.root-servers.net.
. 12170 IN NS j.root-servers.net.
. 12170 IN NS c.root-servers.net.
. 12170 IN NS d.root-servers.net.
. 12170 IN NS m.root-servers.net.
. 12170 IN NS l.root-servers.net.
;; Received 228 bytes from 8.8.4.4#53(8.8.4.4) in 145 ms
org. 172800 IN NS d0.org.afilias-nst.org.
org. 172800 IN NS a2.org.afilias-nst.info.
org. 172800 IN NS a0.org.afilias-nst.info.
org. 172800 IN NS c0.org.afilias-nst.info.
org. 172800 IN NS b0.org.afilias-nst.org.
org. 172800 IN NS b2.org.afilias-nst.org.
;; Received 433 bytes from 192.33.4.12#53(192.33.4.12) in 208 ms
python.org. 86400 IN NS ns1.p11.dynect.net.
python.org. 86400 IN NS ns2.p11.dynect.net.
python.org. 86400 IN NS ns3.p11.dynect.net.
python.org. 86400 IN NS ns4.p11.dynect.net.
;; Received 117 bytes from 199.249.112.1#53(199.249.112.1) in 173 ms
hg.python.org. 86400 IN CNAME virt-7yvsjn.psf.osuosl.org.
;; Received 68 bytes from 208.78.71.11#53(208.78.71.11) in 213 ms
</code></pre>
<p><img class="displayed" title="python.org DNS Query: hg.python.org dns" alt="python.org DNS Query: hg.python.org dns" src="http://www.roguelynn.com/assets/images/eli5-dns/dns-diagrams.006.jpg"/></p>
<h3 id="other-resource-records">Other resource records</h3>
<p>Now there are certainly more records attached to <code>python.org</code> besides the CNAMEs for <code>hg.python.org</code> or <code>blog.python.org</code>. We can actually run the <code>dig</code> command against <code>python.org</code> with a few flags, particularly <code>-t ANY</code>:</p>
<pre><code data-lang="bash">$ dig +nocmd +noqr +nostats python.org -t ANY
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25949
;; flags: qr rd ra; QUERY: 1, ANSWER: 8, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;python.org. IN ANY
;; ANSWER SECTION:
python.org. 21599 IN SOA ns1.p11.dynect.net. infrastructure-staff.python.org. 2014052200 3600 600 604800 3600
python.org. 21599 IN NS ns1.p11.dynect.net.
python.org. 21599 IN NS ns2.p11.dynect.net.
python.org. 21599 IN NS ns3.p11.dynect.net.
python.org. 21599 IN NS ns4.p11.dynect.net.
python.org. 21599 IN A 140.211.10.69
python.org. 21599 IN MX 50 mail.python.org.
python.org. 21599 IN TXT "v=spf1 mx a:psf.upfronthosting.co.za a:mail.wooz.org ip4:82.94.164.166/32 ip6:2001:888:2000:d::a6 ~all"
</code></pre>
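<p>That TXT record carries an SPF policy, which is itself just a space-separated string. A minimal parse (the dict keys here are my own naming, not from any SPF library) separates the version tag from the mechanisms:</p>
<pre><code data-lang="python">def parse_spf(txt):
    """Split an SPF TXT record into its version tag and mechanisms."""
    parts = txt.split()
    assert parts[0].startswith("v="), "not an SPF record"
    return {"version": parts[0][2:], "mechanisms": parts[1:]}

record = ("v=spf1 mx a:psf.upfronthosting.co.za a:mail.wooz.org "
          "ip4:82.94.164.166/32 ip6:2001:888:2000:d::a6 ~all")
policy = parse_spf(record)
print(policy["version"])         # spf1
print(policy["mechanisms"][-1])  # ~all
</code></pre>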
<p>Beyond the <code>SOA</code>, <code>NS</code>, <code>A</code>, <code>MX</code>, and <code>TXT</code> records, not much came back. If we look at <code>pyladies.com</code>, it is a little bit more interesting, with an SOA record pointing to name.com, an MX record pointing to Google, and our A record pointing to our web host:</p>
<pre><code data-lang="bash">$ dig +nocmd +noqr +nostats pyladies.com -t ANY
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50779
;; flags: qr rd ra; QUERY: 1, ANSWER: 7, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;pyladies.com. IN ANY
;; ANSWER SECTION:
pyladies.com. 299 IN SOA ns1qsy.name.com. support.name.com. 1 10800 3600 604800 300
pyladies.com. 299 IN NS ns4kpx.name.com.
pyladies.com. 299 IN NS ns1qsy.name.com.
pyladies.com. 299 IN NS ns2fkr.name.com.
pyladies.com. 0 IN A 81.28.232.189
pyladies.com. 299 IN NS ns3jkl.name.com.
pyladies.com. 299 IN MX 10 ASPMX.L.GOOGLE.com.
</code></pre>
<p>What you won’t get when using <code>ANY</code> with <code>dig</code> is the full zone file or DNS setup, like all available CNAMEs. I’ll go a bit more into that in a bit, but for now, we can easily see the resolution path DNS takes to look up python.org and pyladies.com. However, that is not the most efficient way for DNS to respond to queries.</p>
<h3 id="caching">Caching</h3>
<p>Rather than inundating root and top-level name servers like <code>.</code> and <code>.org</code>, DNS can be set up to cache requests:</p>
<blockquote>
<p>When a DNS resolver or authoritative server receives a query, it searches its cache for a matching label. If there is no matching label in the cache, the server may instead retrieve from the cache and return a referral response, containing [a resource record set] of the NS type whose label is “closer” to the domain which is the subject of the query.</p>
<p>Instead of sending a referral response, the DNS resolver may also be configured to initiate the same query to an authoritative DNS server responsible for the domain name which is the subject of the query …</p>
</blockquote>
<p><small>– <a href="https://www.cs.utexas.edu/%7Eshmat/shmat_securecomm10.pdf">Sooel Son and Vitaly Shmatikov, University of Texas at Austin</a> (PDF)</small></p>
<p>The authoritative server can then respond with an answer, a referral, or a failed response. After,</p>
<blockquote>
<p>the authoritative server’s response … is accepted by the DNS resolver and stored in its cache only if the [resource record set] meets a set of certain conditions</p>
</blockquote>
<p>which is specific to each resolver implementation.<sup><a href="https://www.cs.utexas.edu/%7Eshmat/shmat_securecomm10.pdf">2</a></sup></p>
<p>So if my local DNS server did not hold a cached record for <code>python.org</code>, it <em>could</em> send the DNS query to a root DNS server, and get pointed to go to the name servers that handle the <code>.org</code> domain. But since I’ve been to many <code>.org</code> sites, my DNS most likely has those name servers cached, so it can skip the first query. And then it trickles down from there.</p>
<p>DNS caching sounds all great and hunky-dory until you get to propagation. Propagation is how long one has to wait for DNS changes to take effect, and is often the pain point many people feel when deploying The Awesome Unique TODO App™.</p>
<p>A DNS server will hold a record for as long as its TTL (Time to Live) allows, at which point it deletes it. After the record is deleted, if someone makes a new request that refers to it, the server goes through the resolution process again, querying an authoritative server.</p>
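<p>That expire-and-refetch cycle is just a TTL cache. Here’s a minimal sketch with an injectable clock so the expiry is easy to see - this is a toy, not how any particular resolver implements its cache:</p>
<pre><code data-lang="python">import time

class TTLCache:
    """Hold records only until their TTL elapses, like a DNS resolver."""
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}  # name -> (value, expiry timestamp)

    def put(self, name, value, ttl):
        self._store[name] = (value, self._clock() + ttl)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None
        value, expiry = entry
        if self._clock() >= expiry:
            del self._store[name]  # expired: a fresh query is needed
            return None
        return value

# A simulated clock makes the expiry deterministic.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("python.org.", "140.211.10.69", ttl=300)
print(cache.get("python.org."))  # 140.211.10.69
now[0] = 301.0
print(cache.get("python.org."))  # None - TTL elapsed, must re-query
</code></pre>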
<p>When setting up DNS records, perhaps your DNS host is awesome (I like <a href="https://www.name.com">name.com</a> and <a href="http://www.fastmail.es/?STKI=10893350">fastmail.fm</a>) and allows you to adjust the TTL within a decent range (1 second to 24 hours - tbh I’m not sure if there’s an upper limit?). However, set the TTL too high and local and ISP caches will hold your records longer, so your friend may not be able to see your Glorious TODO App™ until well after you’ve deployed a change. Likewise, set the TTL too low and your name server may be overloaded by the frequency of queries. And while your DNS host may be awesome and let you find the sweet spot for TTL, some ISPs may ignore it completely and set their own expiry for records.</p>
<p>In addition to caching and propagation being web devs' pain point with DNS, caching also opens up the ability to poison a DNS server’s cache. This is <em>by far</em> not my area of expertise, but as I understand it, DNS cache poisoning works like so:</p>
<p>If a server doesn’t validate DNS responses (for example, via <a href="http://en.wikipedia.org/wiki/Domain_Name_System_Security_Extensions">DNSSEC</a>), someone could exploit that by spoofing an IP address they own for a given hostname, forcing visitors of that hostname to be directed elsewhere. To spoof a DNS entry, an attacker has to get a response in faster than the legitimate authoritative server. An attacker can effectively DDoS a caching DNS server with queries for names that are probably not cached, giving themselves many attempts to send fake responses. The random domains that end up cached aren’t too useful on their own, but the attacker can also include in the response a name server for the domain they actually want to compromise.</p>
<p>Again, I am no expert in the subject of DNSSEC, so I encourage folks to read <a href="https://www.cs.utexas.edu/%7Eshmat/shmat_securecomm10.pdf">this paper</a> (PDF) to get a better understanding of the different ways to poison a DNS’s cache.</p>
<h2 id="nerdy-things-i-learned">Nerdy things I learned</h2><h3 id="interesting-ways-to-interact-with-dns">Interesting ways to interact with DNS</h3><h4 id="dnsmap">dnsmap</h4>
<p>Earlier, we did a few <code>dig</code> queries with the <code>-t ANY</code> flag and failed to see any CNAME records. You could certainly run <code>dig www.pyladies.com -t ANY</code>, but it is a bit prohibitive to dig every subdomain to find information about CNAME records, especially for a site that you do not manage. And looking up the full DNS zone file is rarely allowed.</p>
<p>Certainly, there’s a script for that! There’s this handy tool called <a href="https://code.google.com/p/dnsmap/">dnsmap</a> that literally brute-forces subdomain lookup:</p>
<pre><code data-lang="bash">$ dnsmap pyladies.com
dnsmap 0.30 - DNS Network Mapper by pagvac (gnucitizen.org)
[+] searching (sub)domains for pyladies.com using built-in wordlist
[+] using maximum random delay of 10 millisecond(s) between requests
dc.pyladies.com
IP address #1: 81.28.232.189
sf.pyladies.com
IP address #1: 81.28.232.189
tw.pyladies.com
IP address #1: 23.23.245.47
www.pyladies.com
IP address #1: 81.28.232.189
</code></pre>
<p>Trying <code>dnsmap pyladies.com</code> only returned about 4 results even though - as one of the managers of the site - I know there are well over <a href="http://www.pyladies.com/locations">20</a>. So don’t expect the results to be comprehensive, or fast: it literally searches a word list, one candidate at a time, without multithreading. The tool is limited to its built-in word list by default, though you can certainly supply your own as well.</p>
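<p>dnsmap’s core loop is simple enough to sketch. The resolver function is injected here so the example stays offline (in practice you might pass something like <code>socket.gethostbyname</code>), and the word list is a stand-in for dnsmap’s built-in one:</p>
<pre><code data-lang="python">def brute_force_subdomains(domain, wordlist, resolve):
    """Try each candidate subdomain and keep the ones that resolve."""
    found = {}
    for word in wordlist:
        candidate = "%s.%s" % (word, domain)
        try:
            found[candidate] = resolve(candidate)
        except OSError:
            pass  # NXDOMAIN or similar: the candidate does not exist
    return found

# Fake resolver standing in for a real lookup function.
KNOWN = {"sf.pyladies.com": "81.28.232.189", "tw.pyladies.com": "23.23.245.47"}
def fake_resolve(name):
    if name in KNOWN:
        return KNOWN[name]
    raise OSError("NXDOMAIN")

print(brute_force_subdomains("pyladies.com", ["sf", "tw", "nope"], fake_resolve))
# {'sf.pyladies.com': '81.28.232.189', 'tw.pyladies.com': '23.23.245.47'}
</code></pre>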
<p>I ran dnsmap against spotify.net for funsies while running the sniff function from <a href="http://www.secdev.org/projects/scapy/">scapy</a> described earlier. Here is a captured UDP datagram in which you can see dnsmap querying for <code>zr.spotify.net</code>:</p>
<pre><code data-lang="python">###[ Ethernet ]###
dst = 04:a1:51:90:af:d4
src = 14:10:9f:e1:54:9b
type = 0x800
###[ IP ]###
ttl = 255
proto = udp
chksum = 0x12ee
src = 192.168.1.7
dst = 192.168.1.1
###[ UDP ]###
sport = 54929
dport = domain
###[ DNS ]###
id = 11102
opcode = QUERY
rcode = ok
qdcount = 1
ancount = 0
nscount = 0
arcount = 0
\qd \
|###[ DNS Question Record ]###
| qname = 'zr.spotify.net.'
| qtype = A
| qclass = IN
</code></pre>
<p>You can easily see the Question Record - the name of the record, type, and class.</p>
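<p>The <code>qname</code> in that question record has a specific wire encoding: each label is prefixed by a length byte, and a zero byte (the root label) terminates the name - which is also why scapy prints it with a trailing dot. Here is a sketch of encoding just the name portion (real clients also build a 12-byte header around it):</p>
<pre><code data-lang="python">def encode_qname(name):
    """Encode a domain name as DNS wire-format labels (RFC 1035)."""
    out = b""
    for label in name.rstrip(".").split("."):
        assert len(label) <= 63, "labels are limited to 63 bytes"
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"  # the zero-length root label ends the name

print(encode_qname("zr.spotify.net."))
# b'\x02zr\x07spotify\x03net\x00'
</code></pre>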
<h4 id="local-cache">local cache</h4>
<p>When I was playing around with DNS, I wanted to figure out what’s in my local DNS cache. At least for OS X, you can see what is cached by sending the <code>mDNSResponder</code> process an INFO signal (despite the <code>killall</code> name, this doesn’t actually kill the process), which makes it dump its cache to the system log:</p>
<pre><code data-lang="bash">$ sudo killall -INFO mDNSResponder
$ tail -n 500 /var/log/system.log | grep mDNSResponder
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 9 12229 -U- CNAME 37 1-courier.push.apple.com. CNAME 1.courier-push-apple.com.akadns.net.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 27 * 3029 -U- - PTR 0 lb._dns-sd._udp.10.0.137.10.in-addr.arpa. PTR
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 43 2869 lo0 + TXT 32 sudo\032make\032me\032a\032sammich._device-info._tcp.local. TXT model=MacBookPro10,1¦osxvers=13
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 43 2869 en0 + TXT 32 sudo\032make\032me\032a\032sammich._device-info._tcp.local. TXT model=MacBookPro10,1¦osxvers=13
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 49 70 -U- - PTR 0 13.16.16.172.in-addr.arpa. PTR
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 54 106364 -U- - Addr 0 toezmncibr. Addr
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 54 106364 -U- SOA 64 . SOA a.root-servers.net. nstld.verisign-grs.com. 2014072100 1800 900 604800 86400
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 71 106364 -U- - Addr 0 lszyeahwbnztqh. Addr
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 71 106364 -U- SOA 64 . SOA a.root-servers.net. nstld.verisign-grs.com. 2014072100 1800 900 604800 86400
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 74 * 3029 -U- - PTR 0 lb._dns-sd._udp.0.0.16.172.in-addr.arpa. PTR
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 92 * 3029 -U- - PTR 0 b._dns-sd._udp.0.0.16.172.in-addr.arpa. PTR
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 95 70 -U- Addr 4 client-log.box.com. Addr 74.112.184.96
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 95 70 -U- Addr 4 client-log.box.com. Addr 74.112.185.96
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 105 29 -U- CNAME 34 evintl-ocsp.verisign.com. CNAME ocsp.ws.symantec.com.edgekey.net.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 121 200 -U- Addr 4 www3.l.google.com. Addr 173.194.41.130
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 121 200 -U- Addr 4 www3.l.google.com. Addr 173.194.41.133
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 121 200 -U- Addr 4 www3.l.google.com. Addr 173.194.41.134
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 121 200 -U- Addr 4 www3.l.google.com. Addr 173.194.41.142
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 121 200 -U- Addr 4 www3.l.google.com. Addr 173.194.41.129
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 121 200 -U- Addr 4 www3.l.google.com. Addr 173.194.41.131
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 121 200 -U- Addr 4 www3.l.google.com. Addr 173.194.41.132
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 121 200 -U- Addr 4 www3.l.google.com. Addr 173.194.41.137
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 121 200 -U- Addr 4 www3.l.google.com. Addr 173.194.41.136
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 121 200 -U- Addr 4 www3.l.google.com. Addr 173.194.41.135
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 121 200 -U- Addr 4 www3.l.google.com. Addr 173.194.41.128
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 171 1763 -U- CNAME 24 s3.amazonaws.com. CNAME s3.a-geo.amazonaws.com.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 208 865 -U- CNAME 34 evsecure-ocsp.verisign.com. CNAME ocsp.ws.symantec.com.edgekey.net.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 211 1586 -U- Addr 4 apple.com. Addr 17.142.160.59
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 211 1586 -U- Addr 4 apple.com. Addr 17.178.96.59
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 211 1586 -U- Addr 4 apple.com. Addr 17.172.224.47
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 217 24143 -U- CNAME 38 46-courier.push.apple.com. CNAME 46.courier-push-apple.com.akadns.net.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 223 107696 -U- - Addr 0 dnsmap. Addr
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 223 107696 -U- SOA 64 . SOA a.root-servers.net. nstld.verisign-grs.com. 2014072100 1800 900 604800 86400
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 233 107019 -U- - Addr 0 dnssec. Addr
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 233 107019 -U- SOA 64 . SOA a.root-servers.net. nstld.verisign-grs.com. 2014072100 1800 900 604800 86400
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 240 25044 -U- CNAME 37 p01-calendars.icloud.com. CNAME p01-calendars.icloud.com.akadns.net.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 248 870 -U- CNAME 34 sr.symcd.com. CNAME ocsp.ws.symantec.com.edgekey.net.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 248 * 3029 -U- - PTR 0 db._dns-sd._udp.10.0.137.10.in-addr.arpa. PTR
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 267 25361 -U- CNAME 19 talk.google.com. CNAME talk.l.google.com.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 271 1271 -U- - Addr 0 ns.iana.org. Addr
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 271 1271 -U- SOA 58 iana.org. SOA sns.dns.icann.org. noc.dns.icann.org. 2014052499 7200 3600 1209600 3600
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 275 1372 -U- CNAME 24 api.facebook.com. CNAME star.c10r.facebook.com.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 279 1179 -U- Addr 4 www.evernote.com. Addr 204.154.94.81
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 298 1390 -U- TXT 32 time.apple.com. TXT ntp minpoll 9 maxpoll 12 iburst
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 311 22633 -U- CNAME 34 p15-caldav.icloud.com. CNAME p15-caldav.icloud.com.akadns.net.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 335 345 -U- CNAME 30 www.apple.com. CNAME www.isg-apple.com.akadns.net.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 351 * 3029 -U- - PTR 0 db._dns-sd._udp.0.0.16.172.in-addr.arpa. PTR
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 359 106364 -U- - Addr 0 yklgvieqhip. Addr
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 359 106364 -U- SOA 64 . SOA a.root-servers.net. nstld.verisign-grs.com. 2014072100 1800 900 604800 86400
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 366 3068 -U- CNAME 24 wildcard.tripit.com.edgekey.net. CNAME e6320.b.akamaiedge.net.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 366 1763 -U- CNAME 20 s3.a-geo.amazonaws.com. CNAME s3-1.amazonaws.com.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 371 24296 -U- CNAME 25 ocsp.ws.symantec.com.edgekey.net. CNAME e8218.ce.akamaiedge.net.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 390 25691 -U- CNAME 35 ssl.google-analytics.com. CNAME ssl-google-analytics.l.google.com.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 446 * 3029 -U- - PTR 0 b._dns-sd._udp.10.0.137.10.in-addr.arpa. PTR
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 488 25361 -U- CNAME 19 calendar.google.com. CNAME www3.l.google.com.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 493 79 -U- CNAME 17 d.dropbox.com. CNAME d.v.dropbox.com.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: Cache currently contains 106 entities; 6 referenced by active questions
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: --------- Auth Records ---------
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: Int Next Expire State
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 8 0 0 ALL 32 sudo\032make\032me\032a\032sammich._device-info._tcp.local. TXT model=MacBookPro10,1¦osxvers=13
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 8 0 0 lo0 30 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.E.F.ip6.arpa. PTR sudo-make-me-a-sammich.local.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 8 0 0 en0 30 13.16.16.172.in-addr.arpa. PTR sudo-make-me-a-sammich.local.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 8 0 0 lo0 16 sudo-make-me-a-sammich.local. AAAA FE80:0000:0000:0000:0000:0000:0000:0001
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: 8 0 0 en0 4 sudo-make-me-a-sammich.local. Addr 172.16.16.13
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: --------- LocalOnly, P2P Auth Records ---------
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: State Interface
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: Verified LO 36 sudo\032make\032me\032a\032sammich._whats-my-name._tcp.local. SRV 0 0 0 sudo-make-me-a-sammich.local.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: Shared LO 7 b._dns-sd._udp.local. PTR local.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: Shared LO 7 r._dns-sd._udp.local. PTR local.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: KnownUnique LO 11 1.0.0.127.in-addr.arpa. PTR localhost.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: KnownUnique LO 11 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa. PTR localhost.
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: --------- /etc/hosts ---------
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: State Interface
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: KnownUnique LO 4 localhost. Addr 127.0.0.1
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: KnownUnique LO 16 localhost. AAAA 0000:0000:0000:0000:0000:0000:0000:0001
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: KnownUnique 1 16 localhost. AAAA FE80:0000:0000:0000:0000:0000:0000:0001
Jul 21 08:34:42 sudo-make-me-a-sammich.local mDNSResponder[60]: KnownUnique LO 4 broadcasthost. Addr 255.255.255.255
# <--snipped-->
</code></pre>
<p>We can see some familiar records; I see Facebook, Evernote, Apple, Tripit, Amazon - names I would expect since I use apps that all connect to those services.</p>
<p>What is that I hear? Not enough Python? Well…</p>
<h3 id="twisted">Twisted</h3>
<p>Surely you knew this was coming - you can easily create your own DNS forwarder with <a href="https://twistedmatrix.com/trac/">Twisted</a>’s <a href="http://twistedmatrix.com/trac/wiki/TwistedNames">names</a>.</p>
<p>Below is a simple DNS server from Twisted’s documentation. We can run it, fire up scapy’s sniffer, and then run <code>dig -p 10053 @localhost python.org</code> against the server:</p>
<pre><code data-lang="python">from twisted.internet import reactor
from twisted.names import client, dns, server


def main():
    """
    Run the server.
    """
    factory = server.DNSServerFactory(
        clients=[client.Resolver(resolv='/etc/resolv.conf')]
    )
    protocol = dns.DNSDatagramProtocol(controller=factory)
    reactor.listenUDP(10053, protocol)
    reactor.listenTCP(10053, factory)
    reactor.run()


if __name__ == '__main__':
    raise SystemExit(main())
</code></pre>
<p>Looking at the datagram picked up by scapy, we can see that the query - the Question - has a query name, type, and class:</p>
<pre><code data-lang="python">###[ Ethernet ]###
dst = 04:a1:51:90:af:d4
src = 14:10:9f:e1:54:9b
type = 0x800
###[ IP ]###
ttl = 64
proto = udp
chksum = 0x4a0c
src = 192.168.1.7
dst = 192.168.1.1
\options \
###[ UDP ]###
sport = 33408
dport = domain
###[ DNS ]###
opcode = QUERY
rcode = ok
\qd \
|###[ DNS Question Record ]###
| qname = 'python.org.'
| qtype = A
| qclass = IN
</code></pre>
<p>And now the corresponding response with the DNS resource record and the data associated with it, including the type of record, TTL, rdata, and resource record name:</p>
<pre><code data-lang="python">###[ Ethernet ]###
dst = 14:10:9f:e1:54:9b
src = 04:a1:51:90:af:d4
type = 0x800
###[ IP ]###
ttl = 64
proto = udp
chksum = 0xb74c
src = 192.168.1.1
dst = 192.168.1.7
###[ UDP ]###
sport = domain
dport = 54438
###[ DNS ]###
qr = 1L
opcode = QUERY
\qd \
|###[ DNS Question Record ]###
| qname = 'python.org.'
| qtype = A
| qclass = IN
\an \
|###[ DNS Resource Record ]###
| rrname = 'python.org.'
| type = A
| rclass = IN
| ttl = 39777
| rdlen = 4
| rdata = '140.211.10.69'
</code></pre><h2 id="interesting-ways-to-use-dns">Interesting ways to use DNS</h2><h3 id="anycast">Anycast</h3>
<p>You folks may know the types of IP network addressing methodologies, including <a href="http://en.wikipedia.org/wiki/Unicast">unicast</a>, <a href="http://en.wikipedia.org/wiki/Multicast">multicast</a>, and <a href="http://en.wikipedia.org/wiki/Broadcasting_(computing)">broadcast</a>, or are at least somewhat familiar with those terms from <strike>screwing</strike> setting up networking for a local VM (<a href="http://www.vagrantup.com/">vagrant</a> is so uber helpful for avoiding these networking issues, btw!).</p>
<p><a href="http://en.wikipedia.org/wiki/Anycast">Anycast</a> is a fourth one, where datagrams from a single sender are routed to the nearest of a group of potential receivers all identified by the same address - a one-to-nearest association. One of the keynotes at PuppetConf 2013 by Google’s Gordon Rowell <a href="http://puppetlabs.com/presentations/keynote-why-did-we-think-large-scale-distributed-systems-would-be-easy">goes into a great explanation</a> of how Google takes advantage of anycast. Google uses it for its public DNS servers, the familiar <code>8.8.8.8</code> and <code>8.8.4.4</code>: someone looking up <code>8.8.8.8</code> from Australia may be routed somewhere different than someone in the US, but still receives the same information. Because Google configures its applications for anycast, folks in operations can take down one cluster and reroute traffic to another, and no one has to worry about getting different data when looking up <code>8.8.8.8</code>. TL;DR: it’s great for load balancing.</p>
<h3 id="dane">DANE</h3>
<p><a href="http://en.wikipedia.org/wiki/DNS-based_Authentication_of_Named_Entities">DANE</a> stands for DNS-based Authentication of Named Entities. It’s a protocol for certificates to be bound to DNS names using <a href="http://en.wikipedia.org/wiki/Domain_Name_System_Security_Extensions">DNSSEC</a>. It can be likened to two-factor authentication that we, as users, are familiar with. Essentially, DANE is a <a href="http://tools.ietf.org/html/rfc6698">proposed</a> way to cross-verify the domain name and the CA-issued certificate. <sup><a href="https://community.infoblox.com/blogs/2014/04/14/dns-based-authentication-named-entities-dane">3</a></sup></p>
<p>The issue that DANE solves is the inability to verify that the organization running the web server actually owns the domain name. As well, the DNS record does not contain information about which Certificate Authority the organization prefers.</p>
<p>Exploits of this weakness were seen twice in 2011 with <a href="https://www.comodo.com/Comodo-Fraud-Incident-2011-03-23.html">Comodo</a> and the Dutch CA <a href="http://en.wikipedia.org/wiki/Diginotar#Issuance_of_fraudulent_certificates">DigiNotar</a>, where fraudulent certificates were generated, giving the attackers the ability to perform man-in-the-middle exploits.</p>
<p>So again, what DANE does is provide a way to cross-verify the domain name information with the host’s CA-issued certificate. The two factors of authentication are:</p>
<ol>
<li>a DNSSEC-authenticated authoritative DNS entry about the valid certificate, and</li>
<li>the actual certificate - or a hash of the certificate - with the valid fully-qualified domain name that can be validated by a trusted CA.</li>
</ol>
<p>DNSSEC is required to be configured on your authoritative DNS server for DANE to be set up properly. With that, you just need to make a TLSA (TLS trust anchor) record with information on the type of certificate used, the hash of the certificate, and the hash function used, among <a href="http://tools.ietf.org/html/rfc6698#section-2.1">other things</a>, like so:</p>
<pre><code data-lang="bash">_443._tcp.www.example.com. IN TLSA ( 0 0 1 91751cee0a1ab8414400238a761411daa29643ab4b8243e9a91649e25be53ada )
</code></pre>
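<p>In that record, the three leading fields are the certificate usage, selector, and matching type; matching type <code>1</code> means the record data is a SHA-256 hash of the certificate. Verifying a certificate against such a record is then a hash comparison - a sketch with <code>hashlib</code>, using dummy bytes rather than a real DER-encoded certificate:</p>
<pre><code data-lang="python">import hashlib

def tlsa_matches(cert_der, tlsa_hex, matching_type=1):
    """Compare certificate bytes against TLSA record data (RFC 6698)."""
    if matching_type == 0:
        digest = cert_der.hex()  # full certificate data, no hashing
    elif matching_type == 1:
        digest = hashlib.sha256(cert_der).hexdigest()
    elif matching_type == 2:
        digest = hashlib.sha512(cert_der).hexdigest()
    else:
        raise ValueError("unknown matching type")
    return digest == tlsa_hex.lower()

cert = b"dummy certificate bytes"  # stand-in for a real DER blob
record_data = hashlib.sha256(cert).hexdigest()
print(tlsa_matches(cert, record_data))  # True
print(tlsa_matches(b"some other cert", record_data))  # False
</code></pre>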
<p>For funsies, let’s take a look at one of the <a href="http://www.internetsociety.org/deploy360/resources/dane-test-sites/">available DANE test sites</a>:</p>
<pre><code data-lang="bash">$ dig -t TLSA _443._tcp.www.fedoraproject.org
; <<>> DiG 9.8.3-P1 <<>> -t TLSA _443._tcp.www.fedoraproject.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 51776
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;_443._tcp.www.fedoraproject.org. IN TLSA
;; ANSWER SECTION:
_443._tcp.www.fedoraproject.org. 299 IN TLSA 0 0 1 19400BE5B7A31FB733917700789D2F0A2471C0C9D506C0E504C06C16 D7CB17C0
;; Query time: 115 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Mon Jul 21 14:26:39 2014
;; MSG SIZE rcvd: 96
</code></pre>
<p>As of writing this post, <a href="http://www.dnspython.org/">dnspython</a> supports DANE with the ability to <a href="https://github.com/rthalley/dnspython/commit/07426738e9491660214d3d7f39b1cda57284eaba#diff-02f0b547c2779d25cff89672135f20e3">create and manage TLSA resource records</a>, and <a href="https://twistedmatrix.com/trac/">Twisted</a> is <a href="http://twistedmatrix.com/pipermail/twisted-python/2013-November/027773.html">currently working</a> on EDNS and DNSSEC support, with the <a href="https://twistedmatrix.com/trac/wiki/EDNS0#DANE">goal of including DANE</a>.</p>
<h3 id="service-discovery">Service Discovery</h3>
<p>Another nerdy nugget of awesomeness that I uncovered during my deep dive into DNS is that it can be used for service discovery.</p>
<p>There are a few ways and tools to implement service discovery, but it ultimately boils down to the question, “What servers run this service?” As mentioned, one can leverage DNS to help us answer this question with the use of <code>SRV</code> records. <code>SRV</code> records within DNS zones map canonical names, typically in the form of <code>_name._protocol.site</code>, to hostnames.</p>
<p>For instance, Spotify leverages the service lookup ability. Each service has its own SRV record, with one record canonically named after the service itself. When you spin up a Spotify client, it does an SRV lookup, similar to this <code>dig</code> command:</p>
<pre><code data-lang="bash">dig +short _spotify-client._tcp.spotify.com SRV
10 12 4070 AP1.spotify.com.
10 12 4070 AP2.spotify.com.
10 12 4070 AP3.spotify.com.
10 12 4070 AP4.spotify.com.
</code></pre>
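<p>Each answer line carries four fields: priority, weight, port, and target host. Clients are supposed to pick the lowest-priority records first, then choose among ties in proportion to weight. A minimal sketch of that selection logic (the helper names here are mine, not Spotify’s actual client code):</p>
<pre><code data-lang="python">import random
from typing import NamedTuple

class SRVRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str

def parse_srv(line):
    """Parse one answer line from `dig +short ... SRV`."""
    priority, weight, port, target = line.split()
    return SRVRecord(int(priority), int(weight), int(port), target.rstrip("."))

def pick(records):
    """Pick among the lowest-priority records, weighted by weight."""
    best = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == best]
    weights = [r.weight or 1 for r in candidates]
    return random.choices(candidates, weights=weights)[0]

records = [parse_srv(line) for line in (
    "10 12 4070 AP1.spotify.com.",
    "10 12 4070 AP2.spotify.com.",
)]
ap = pick(records)
print(f"connect to {ap.target}:{ap.port}")
</code></pre>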
<p>The service lookup continues from there: user clients connect to an access point, for example <code>AP1.spotify.com</code>, and the access point then resolves the service the client is looking for, e.g. the <code>user</code> service for the user’s profile information:</p>
<p><img class="displayed" title="Spotify's Access Point Service Discovery" alt="Spotify's Access Point Service Discovery" src="http://www.roguelynn.com/assets/images/eli5-dns/dns-diagrams.001.jpg"/></p>
<h3 id="dht-ring">DHT Ring</h3>
<p>The last little nugget I discovered is the ability to store a DHT ring within DNS.</p>
<p><a href="http://en.wikipedia.org/wiki/Distributed_hash_table">DHT</a> stands for Distributed Hash Table. It gives you a dictionary-like interface, or key-value store, but the data and nodes are distributed across a network.</p>
<p>Looking at Spotify again, we store some service configuration data in a DHT ring within DNS TXT records.</p>
<p>So for example, when you are on the Spotify client, and want to play a particular song named “foobar” (one that has not yet been locally cached on your machine), the client performs a lookup. When it does, the song ID is hashed, which then becomes the key within the DHT ring.</p>
<p><img class="displayed" title="Spotify track hashed" alt="Spotify track hashed" src="http://www.roguelynn.com/assets/images/eli5-dns/dns-diagrams.007.jpg"/></p>
<p>So that particular key is then looked up within the DHT ring that is stored in DNS. The value associated with that key is essentially the host location of the service where that song and/or its relevant information/metadata is located. So in this case, Instance E owns (9e, c1], which is where this particular Spotify track, foobar, lives, and is mapped to a particular hostname and port.</p>
<p><img class="displayed" title="Spotify track ring" alt="Spotify track ring" src="http://www.roguelynn.com/assets/images/eli5-dns/dns-diagrams.008.jpg"/></p>
<p>And then Instance E is mapped to a hostname, for example <code>tracks.4301.lon-tracks-a1.lon.spotify.net</code> which would be the machine that houses data on the foobar track.</p>
<p><img class="displayed" title="Spotify foobar track host" alt="Spotify foobar track host" src="http://www.roguelynn.com/assets/images/eli5-dns/dns-diagrams.009.jpg"/></p>
<p>The dummy hostname, <code>tracks.4301.lon-tracks-a1.lon.spotify.net</code>, tells me that this machine hosts information on tracks, can be connected to via port 4301, is located in our London data centers, and is in pod a1.</p>
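<p>The lookup just described is plain consistent hashing. Here is a minimal sketch of the idea (the instance names and ring positions below are made up for illustration; Spotify’s real layout differs):</p>
<pre><code data-lang="python">import bisect
import hashlib

# Each instance owns the interval from its predecessor's position
# (exclusive) up to its own position (inclusive), e.g. E owns (0x9e.., 0xc1..].
RING = sorted([
    (0x30 * 2**120, "instance-A"),
    (0x9e * 2**120, "instance-D"),
    (0xc1 * 2**120, "instance-E"),
])

def owner(key):
    """Hash the key onto the ring, then walk clockwise to the next instance."""
    position = int(hashlib.md5(key.encode()).hexdigest(), 16)
    positions = [pos for pos, _ in RING]
    index = bisect.bisect_left(positions, position)
    # Past the last instance, wrap around to the first one.
    return RING[index % len(RING)][1]

print(owner("foobar"))
</code></pre>
<p>The instance name returned would then be resolved, again via DNS, to a concrete host and port, like the <code>tracks.4301.lon-tracks-a1.lon.spotify.net</code> example above.</p>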
<p>Confusing, I know – we’re essentially using DNS for a DHT ring to leverage the distributed characteristic of a DNS system.</p>
<h2 id="tldr-dns-is-hard">TL;DR DNS is hard</h2>
<p>I threw a lot at you; DNS is by no means easy to grasp in a single blog post. And I can definitely guarantee that you will <em>still</em> screw up your deployment configuration again, because DNS is hard. It’s a black box, particularly because it’s not easy to debug. It’s not only hard to learn and debug, it’s also hard to fit into a <a href="https://ep2014.europython.eu/en/schedule/sessions/5/">30-minute talk</a>. Hopefully this write-up and the <a href="http://pyvideo.org/video/2600/for-lack-of-a-better-nameserver-dns-explained">accompanying video</a> leave you with a better understanding of DNS.</p>
Hanging my Red Hat for headphoneshttp://www.roguelynn.com/words/joining-spotify/2013-09-17T09:47:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>At the end of the month, I’ll be trading in my Red Hat for a pair of headphones: I’ll be joining <a href="http://open.spotify.com/user/econchick">Spotify</a> in San Francisco as a Partner Engineer to work on third-party integration.</p>
<p>I had a fantastic experience working on <a href="http://freeipa.org">freeIPA</a>, but found the opportunity at Spotify too awesome to pass up. </p>
<p>You’ll still see me at conferences (maybe more so!). But perhaps more exciting is the ability to dedicate some time and effort to PyLadies, the Python community, and push Spotify to contribute more to open source.</p>
<p>I’m super excited to be joining Spotify, a product that I love and a team that I’ve admired ever since I met them at EuroPython in 2012.</p>
<p>What can I say? I wanted more wubwub!</p>
Karmawho.re: Your Reddit Comment Karma Visualizedhttp://www.roguelynn.com/words/karmawhore/2013-09-02T08:34:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>Is a lot of your time wasted from <a href="http://www.urbandictionary.com/define.php?term=Derping">derping</a> around on <a href="http://reddit.com">Reddit</a>? Do you find that you have to correct someone when they’re <a href="http://xkcd.com/386/"><strong>wrong</strong> on the internet</a>? Wish that your collected comment karma was worth something?</p>
<p>Well, I can’t trade you anything for your karma. But I can provide you with awesome <a href="http://d3js.org">d3</a> visualizations of it!</p>
<p>Check out <a href="http://www.karmawho.re"><strong>www.karmawho.re</strong></a> to visualize your <strike>wasted time</strike> collected karma from comment posts. Search via your Reddit user name, or click “I’m feeling creepy” to visualize comments of a random user. Try plugging in a few of <a href="http://www.karmawhores.net/">these top comment posters</a> if you find that you don’t post enough on Reddit.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/karmawhore/karmawhore-start.png" style="box-shadow: 0 1px 2px rgba(0,0,0,0.5);"/></p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/karmawhore/karmawhore-time.png"/></p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/karmawhore/karmawhore-totalkarma.png"/></p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/karmawhore/karmawhore-wordcloud.png"/></p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/karmawhore/karmawhore-buckets.png"/></p>
<p>Karmawhore was a Labor Day weekend project using <a href="http://d3js.org">d3.js</a>, <a href="http://nvd3.org/">nvd3</a>, and <a href="https://github.com/jasondavies/d3-cloud">d3 wordcloud</a> to bring you these awesome graphs.</p>
PRISM-as-a-Service: Not Subject to American Lawhttp://www.roguelynn.com/words/prism-as-a-service/2013-08-08T08:37:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>In the light of <a href="http://en.wikipedia.org/wiki/PRISM_(surveillance_program)">PRISM</a>’s <a href="http://www.washingtonpost.com/investigations/us-intelligence-mining-data-from-nine-us-internet-companies-in-broad-secret-program/2013/06/06/3a0c0da8-cebf-11e2-8845-d970ccb04497_story.html">whistle blowing</a>, many folks, like <a href="http://fredlybrand.com/2013/06/23/an-apology-to-my-european-it-team/">here</a>, <a href="http://www.theregister.co.uk/2013/06/08/what_about_a_us_tech_boycott/">here</a>, and <a href="http://blogs.computerworld.com/cloud-storage/22305/why-prism-kills-cloud">here</a>, are questioning whether the cloud is a viable option.</p>
<p>This write-up, originally presented at <a href="http://pycon.ca">PyCon Canada 2013</a> (<a href="https://speakerdeck.com/roguelynn/prism-as-a-service-not-subject-to-american-law">slides</a>, and <a href="http://pyvideo.org/video/2319/prism-as-a-service-not-subject-to-american-law">video</a>), and updated for <a href="http://python.ie/pycon/2013/">PyCon Ireland</a> (<a href="https://speakerdeck.com/roguelynn/pycon-ireland-keynote-prism-as-a-service">slides</a>), is a look at what PRISM is, how the cloud is affected, and how we can maintain privacy in the cloud.</p>
<h4 id="disclaimer">Disclaimer</h4>
<p>I am not a lawyer, nor have I studied law in any way. I’m a typical American with the <a href="http://www.archives.gov/exhibits/charters/constitution.html">US Constitution</a> and the <a href="http://www.archives.gov/exhibits/charters/bill_of_rights_transcript.html">Bill of Rights</a> ingrained in my soul.</p>
<p>Nor do I have any experience with three-letter government agencies that are out to get you, and I don’t currently work, nor have I worked, for a company involved in PRISM (or at least not that I’m aware of).</p>
<p>This post contains no special insight or conspiracy theories; it just paints a story from publicly available research.</p>
<h2 id="overview">Overview</h2><h4 id="what-is-prism">What is PRISM?</h4>
<p>PRISM stands for “Planning Tool for Resource Integration, Synchronization, and Management” and is a clandestine electronic data mining program for the purpose of mass surveillance. PRISM aims to collect metadata that passes through US servers.</p>
<p>In general, the collected data <a href="http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/">includes</a> email, voice & video chat, videos, photos, stored data, online social networking details, among others.</p>
<h4 id="who-does-it-affect-and-who-is-involved">Who does it affect and who is involved?</h4>
<p>It’s meant to target foreign communications and not specifically or intentionally target US citizens.</p>
<p>Starting with Microsoft in September 2007, PRISM <a href="http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/">started collecting data</a>. Other companies include Yahoo (2008), Google (2009), Facebook (2009), PalTalk (2009), YouTube (2010), Skype (2011), AOL (2011), and Apple (2012).</p>
<h4 id="how-does-it-work">How does it work?</h4>
<p>From the NSA slides <a href="http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/">posted</a> on Washington Post, and from the press releases from <a href="http://googleblog.blogspot.com/2013/06/what.html">Google</a>, <a href="https://www.facebook.com/zuck/posts/10100828955847631">Facebook</a>, <a href="http://www.apple.com/apples-commitment-to-customer-privacy/">Apple</a>, <a href="http://www.microsoft.com/en-us/news/Press/2013/Jun13/06-06statement.aspx">Microsoft</a>, and others, the NSA does not have <em>direct</em> access to these companies' servers. For the collection process, the FBI issues a directive to the company, and the company responds by supplying data (or, supposedly, denies or requests more information).</p>
<blockquote>
<p>The FBI uses government equipment on private company property to retrieve matching information from a participating company, such as Microsoft or Yahoo and pass it without further review to the NSA.</p>
<p><a href="http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/">Source: Washington Post</a></p>
</blockquote>
<p>The NSA then <a href="http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/">filters the data</a> by type (voice, video, call records, internet records, etc.), followed by foreign versus American subjects, to reduce the intake of information about Americans.</p>
<h4 id="what-is-xkeyscore">What is XKeyscore?</h4>
<p><a href="http://en.wikipedia.org/wiki/XKeyscore">XKeyscore</a> complements PRISM: it is a system/framework for digital network intelligence exploitation, revealed by The Guardian on <a href="http://www.theguardian.com/world/2013/jul/31/nsa-top-secret-program-online-data">July 31st, 2013</a>. In the released <a href="http://www.theguardian.com/world/interactive/2013/jul/31/nsa-xkeyscore-program-full-presentation">NSA presentation</a>, XKeyscore is described as having 500-700 servers (the numbers conflict between pages 4 and 6 of the <a href="http://www.theguardian.com/world/interactive/2013/jul/31/nsa-xkeyscore-program-full-presentation">document</a>) that allow a federated query system over completely unfiltered data. The system houses collected email addresses, user activity, phone numbers, extracted files, and parsed client-side HTTP traffic, and can allow for real-time interception.</p>
<p>Example queries it supports include:</p>
<ul>
<li>Encrypted word documents from a particular country</li>
<li>PGP usage from a particular country</li>
<li>VPN connections in a particular country, and all data related to those connections to be decrypted (author’s open-ended question: where is XKeyscore/the NSA getting keys to decrypt data?)</li>
<li>All Excel spreadsheets containing MAC addresses coming out of a particular country</li>
<li>All exploitable machines in a particular country</li>
<li>Email addresses tied to Google Map searches</li>
<li>All documents that reference $Y subject</li>
</ul>
<p>With each search within the XKeyscore system, users can <a href="http://static.guim.co.uk/sys-images/Guardian/Pix/audio/video/2013/7/31/1375270036263/KS2-001.jpg">input any email address with a justification</a>, like “target is in Africa” (just a text-input field! Seemingly no approval system!), and the system <a href="http://static.guim.co.uk/sys-images/Guardian/Pix/audio/video/2013/7/31/1375282759484/KS3edit2-001.jpg">returns</a> a list of emails, from which users select and read an email (unknown whether the entire body or just “metadata” of To, From, CC, BCC, Attachments, etc.) within NSA-viewing software.</p>
<p>Alarmingly, a user can also set up ongoing surveillance of someone just as easily. The user is required to select a <a href="http://static.guim.co.uk/sys-images/Guardian/Pix/audio/video/2013/7/31/1375269316245/KS4-001.jpg">“foreign factor”</a>, including “The person is a user of storage media seized outside of the U.S.”. While it’s unclear what qualifies as storage media, this puts into question <em>anyone</em>, including U.S. citizens, who has ever used storage abroad (any of my fellow American citizens use <a href="https://fastmail.fm">Fastmail</a>?).</p>
<h2 id="timeline">Timeline</h2><h4 id="mass-surveillance-nothing-new-and-not-just-us">Mass Surveillance: Nothing New, and not just US</h4>
<script async class="speakerdeck-embed" data-id="e8a985f0e3730130e8c53a22532026d5" data-ratio="1.33333333333333" src="http://www.roguelynn.com/speakerdeck.com/assets/embed.js"></script>
<p><small style="text-align: center;display:block;"><span id="caption"><a href="https://www.eff.org/nsa-spying/timeline">Timeline sources</a></span></small></p>
<h2 id="unanswered-questions">Unanswered Questions</h2>
<p>The lack of detail in both the involved companies' statements and in the NSA slides and responses themselves leaves a lot of questions. For instance:</p>
<ul>
<li>How is a target’s “foreignness” determined?</li>
<li>How exactly are they identifying non-US citizens?</li>
<li>Are foreign-born US residents included as foreigners?</li>
<li>What if a US citizen and a foreigner communicate?</li>
<li>What about US citizens abroad, or US citizens using services abroad?</li>
<li><a href="https://news.ycombinator.com/item?id=5860003">Does the NSA/FBI have jurisdiction on foreign persons/companies using services from US-based companies that are located/incorporated abroad</a>? How does physical location of the person/company/service/hardware matter? If a <a href="https://news.ycombinator.com/item?id=5860038">company like Twitter says no</a> to a request, can the NSA go to their PRISM-compliant hosts to get the desired information?</li>
<li>What exactly do words like “backdoor”, “direct access”, and “intentional” mean? Do participating companies not know they are participating, and are therefore unaware of a “backdoor”? Or are companies knowingly allowing access to their servers?</li>
<li>If the NSA is not intentionally targeting Americans, what do they do with the accidentally collected data? Or with the data received from Five Eyes that includes Americans?</li>
<li>How is the PRISM-collected data handled by the NSA? Does the NSA maintain rigorous security measures to protect against threats?</li>
<li>What exactly is the NSA doing with the data? Are they merely collecting, or analyzing it?</li>
</ul>
<h2 id="effects-on-the-cloud">Effects on the cloud</h2>
<p>The US cloud has started to feel the pain in the wake of PRISM being outed. The <a href="https://cloudsecurityalliance.org/">Cloud Security Alliance</a> conducted a <a href="https://cloudsecurityalliance.org/research/surveys/#_nsa_prism">survey</a> showing that 56% of foreign-based members are less likely to use US-based cloud providers, while 10% have all-out cancelled their contracts with US providers.</p>
<p>Governments themselves are starting to act: Germany has <a href="http://www.heise.de/newsticker/meldung/PRISM-Datenschuetzer-stoppen-neue-Datentransfers-von-Firmen-in-die-USA-1922987.html">banned</a> (in German) all future transfers of data to non-EU-based clouds.</p>
<p>More recently, the <a href="http://www.itif.org/">Information Technology & Innovation Foundation</a> released a report quantifying <a href="http://www2.itif.org/2013-cloud-computing-costs.pdf">how much PRISM will cost the US computing industry</a> (PDF):</p>
<blockquote>
<p>The U.S. cloud computing industry stands to lose $22 to $35 billion over the next three years as a result of the recent revelations about the NSA’s electronic surveillance programs.</p>
</blockquote>
<p>Folks have <a href="http://www.theregister.co.uk/2013/06/08/what_about_a_us_tech_boycott/">boycotted</a> US cloud, others are <a href="http://www.reddit.com/r/worldnews/comments/1fxg0d/nsa_prism_why_im_boycotting_us_cloud_tech_and_you/caespwn">experiencing</a> a run on their own cloud services, and some are even <a href="http://fredlybrand.com/2013/06/23/an-apology-to-my-european-it-team/">apologizing</a> to foreign co-workers on using US-based cloud services.</p>
<p>And it’s not just at the corporate level: we’re also seeing services that aim for user privacy and anonymity shutting down, including <a href="http://silentcircle.wordpress.com/2013/08/09/to-our-customers/">Silent Circle</a>, <a href="http://lavabit.com/">Lavabit</a> (now in court over <a href="http://www.wired.com/threatlevel/2013/10/lavabit_unsealed/">refusal to hand over SSL keys</a>), <a href="https://www.eff.org/deeplinks/2013/08/dea-and-nsa-team-intelligence-laundering">Silk Road</a>, and <a href="http://www.theguardian.com/world/2013/oct/04/nsa-gchq-attack-tor-network-encryption">Tor</a>.</p>
<h2 id="what-can-we-do">What can we do?</h2>
<p>It’s difficult to defend oneself against an attacker whose capabilities you do not know. We should approach protecting our privacy by asking some questions, defining threat scenarios, and building defenses around that.</p>
<h3 id="limit-government-exposure">Limit government exposure</h3>
<p>There are, in general, three reasons a government may spy: industrial espionage, political reasons, and terrorism.</p>
<p>By bringing cloud services within your company’s national jurisdiction, you effectively eliminate industrial espionage by a foreign government. It is very unlikely to be spied on for industrial espionage by your own government.</p>
<p>That said, it’s still unclear if the USA, UK, Europe, etc, can have access to data even if you host services within your own country. However, that starts to require political solutions rather than technical.</p>
<p>In the end, we have no idea what the technical capabilities of these “attackers” are. Random hackers usually do not have the scale that governments do. We have to assume government spies can do anything, but then protection costs skyrocket. In general, though, a company’s data is not that important to governments; it’s personal profiles that are a big part of any kind of PRISM-like scheme.</p>
<h3 id="diy-cloud">DIY Cloud</h3>
<p>One solution is simply moving your business to services within your own jurisdiction. I can’t answer whether using a US company’s services abroad, like <a href="https://blog.heroku.com/archives/2013/4/24/europe-region">Heroku’s European servers</a>, will or will not protect you against the US government.</p>
<p>Hardware is getting a lot cheaper, so do-it-yourself clouds like <a href="http://www.openstack.org/">OpenStack</a>, <a href="https://www.openshift.com/">OpenShift</a>, <a href="http://owncloud.org/">ownCloud</a>, <a href="http://cloudstack.apache.org/">CloudStack</a>, etc., will give you total knowledge of what is on your machines and total control of your response if and when a hacker (government or not) gets in. I have my own personal cloud thanks to the Raspberry Pi.</p>
<h3 id="know-your-neighbors">Know your neighbors</h3>
<p>If you do decide to go the do-it-yourself route, know your neighbors! Where are your datacenters, where are your servers? Who is next door? <a href="https://fastmail.fm">Fastmail</a>, an Australian company that offers email for individuals and businesses, recently <a href="http://blog.fastmail.fm/2013/10/07/fastmails-servers-are-in-the-us-what-this-means-for-you/">posted about their servers being in the US</a>. The purpose of the post was to address privacy concerns, noting that they are only subject to Australian law. But the commendable part of that post was the acknowledgment that, even though they are not subject to US law and encrypt everything from your keyboard to the moon and back, there are still other ways to get at the data. Specifically, they state that their “colocation providers”, i.e. the people they share data center space with, “could be compelled to give physical access to [their] servers”, as could a simple brute-force physical attack.</p>
<p><img class="displayed" alt="obgligatory XKCD post" src="http://imgs.xkcd.com/comics/security.png"/></p>
<h3 id="as-nerds-ourselves">As nerds ourselves</h3>
<p>Now the question is: what level of granularity are you okay with your government knowing? Or, coming from a different direction: is it possible to go completely anonymous? I may sound like I’m wearing a tinfoil hat, but bear with me; this is more a proof of concept that, no, it is not possible to go completely anonymous.</p>
<h4 id="location-tracking">Location tracking</h4>
<p>I would imagine most folks are okay with a government knowing <em>when</em> you are inside its nation. But how granular a level of tracking are you okay with?</p>
<p>It’s difficult to “go off the grid” entirely. As long as you carry a phone, you’re trackable. If you live in <a href="http://www.aclu.org/technology-and-liberty/new-law-requires-warrants-cell-phone-tracking">Maine</a> or <a href="http://www.thetakeaway.org/2013/jul/22/new-jersey-supreme-court-warrant-required-acquire-cell-phone-location-data/">New Jersey</a>, you’re lucky; but in the majority of the US, a search warrant is not needed to ask your cell phone carrier for your location data. You can certainly turn off your phone, but as soon as you turn it on, location is recorded when connecting to a cell tower or wifi connection spot. In London, a <a href="http://arstechnica.com/security/2013/08/no-this-isnt-a-scene-from-minority-report-this-trash-can-is-stalking-you/">UK company is building smart trash cans that track every WiFi-enabled smartphone that passes by</a>. In a one-week period with just 12 cans installed, more than 4 million devices were tracked.</p>
<p>Sure, you could rotate through pay-as-you-go phones. You’ll have to pay with cash, but it’s unclear if ATMs log bill numbers with account numbers. If you do have unmarked bills, then where do you buy the phone? What store reliably does not have security cameras? The phone has a barcode to scan when purchasing; would logs of purchases be tied to camera footage? You may ask, “would the government really care? Am I that dangerous to be tracked?” Yet in the UK for instance, <a href="http://www.theguardian.com/uk/2011/mar/02/cctv-cameras-watching-surveillance">your every move is already being watched</a>.</p>
<p>In terms of the internet, your internet service provider (ISP) will always know your connections, and those connections can always be intercepted. Even if you are not using a line that is tied to your name, your location can still be identified because of the MAC address associated with your connection.</p>
<p>Perhaps you write a script to change your MAC address after every connection to avoid being tracked. Even then, there are other identifying factors: <a href="http://en.wikipedia.org/wiki/TCP/IP_stack_fingerprinting">OS fingerprinting</a> and <a href="https://panopticlick.eff.org">browser fingerprinting</a>.</p>
<p>It’s not <em>truly</em> clear one can avoid location tracking if connected to the internet at all.</p>
<h4 id="behavior-profiling">Behavior profiling</h4>
<p>Assume that you can avoid being tracked by location. You’re still trackable.</p>
<p>What you do online creates a nice <a href="https://www.eff.org/issues/online-behavioral-tracking">behavior profile</a> on you. When do you go online? Who do you email? What websites do you visit? This behavior is always visible to your ISP because they need to route you.</p>
<p>Even when you flip open an incognito tab with Chrome, it clearly states what information is not being hidden:</p>
<p><img class="displayed" title="Screenshot of incognito tab" alt="Screenshot of incognito tab" src="http://www.roguelynn.com/assets/images/prism-as-a-service/incognito.png"/></p>
<p>One popular way to increase anonymity is using the <a href="http://www.movements.org/how-to/entry/how-to-surf-the-internet-anonymously-with-tor/">Tor</a> project. Note, however, that simply using Tor does not encrypt all of your internet activity; there are <a href="https://www.torproject.org/download/download.html.en#Warning">certain habits</a> you can adopt to increase the effectiveness of Tor.</p>
<p>Tor only gives you anonymity, though; it does <em>not</em> provide privacy. For instance, using Tor allows you to connect to sites without being traced back to you. Privacy protects your data; anonymity protects you. To protect privacy, a VPN is your answer. When choosing a VPN, <a href="http://torrentfreak.com/vpn-services-that-take-your-anonymity-seriously-2013-edition-130302/">be sure to do your research</a>. If you are to use both Tor and a VPN, <a href="http://www.slideshare.net/slideshow/embed_code/14380693?startSlide=138">be sure you know what you’re doing</a>, or you may end up like <a href="http://en.wikipedia.org/wiki/Sarah_Palin_email_hack">this guy</a>.</p>
<p>The <a href="https://www.eff.org/">EFF</a> has a great <a href="https://www.eff.org/pages/tor-and-https">graph</a> of the potentially visible data to eavesdroppers with Tor, HTTPS, and neither or both.</p>
<p>Yet the status of Tor is now questionable with the owner of “Freedom Hosting” <a href="https://openwatch.net/i/200/">arrested</a> due to a <a href="http://pastebin.com/pmGEj9bV">JavaScript exploit</a> deployed against a Tor-user’s browser. Even if you were to avoid these sorts of exploits, if an individual controls a relatively large percentage of Tor nodes (I can’t find the resource, but I believe about 15% is enough), the owner can de-anonymize you.</p>
<p>Assuming both your Tor connection and your browser are secure, you’re using a trustworthy VPN service, and you’re not downloading exploitable files, how can you avoid being profiled? It’s still difficult to hide pure browsing activity (unless you write scripts to simulate fake activity). At this point, not much else can be done.</p>
<p>What about email? PGP/GPG only encrypts the body of the email. Therefore, who you email, any X-headers, routing information, and the <em>subject</em> of the email are not encrypted. Perhaps use an obfuscatory or minimal subject line. If you <em>really</em> don’t want folks knowing who you’re emailing, then perhaps a private, self-hosted forum is the way to communicate. How far do you want to go?</p>
<h4 id="encryption">Encryption</h4>
<p>And SSL? Can we trust it? On a technical level, the status of SSL should be considered unclear. With the <a href="http://arstechnica.com/business/2011/09/new-javascript-hacking-tool-can-intercept-paypal-other-secure-sessions/">BEAST</a>, <a href="http://www.cert.sd/index928f.html">CRIME</a>, and <a href="http://breachattack.com">BREACH</a> attacks on SSL presented in three consecutive years, is SSL enough? If two researchers can publicly present three SSL attacks three years in a row, there may well be more attacks against SSL that are not public.</p>
<p>Assuming that SSL is technically secure, a <a href="http://arstechnica.com/security/2010/03/govts-certificate-authorities-conspire-to-spy-on-ssl-users/">government can still go to a root Certificate Authority and demand its private key to facilitate man-in-the-middle attacks</a>, or the CA itself <a href="http://www.vasco.com/company/about_vasco/press_room/news_archive/2011/news_diginotar_reports_security_incident.aspx">could be compromised</a>. One way to ensure trusted certificates are being used and avoid MITM attacks is <a href="https://www.owasp.org/index.php/Certificate_and_Public_Key_Pinning">certificate pinning</a>.</p>
<p>One concern with PRISM is that the “<a href="http://news.netcraft.com/archives/2013/06/25/ssl-intercepted-today-decrypted-tomorrow.html">NSA logs very high volumes of internet traffic and retains captured encrypted communication for later cryptanalysis</a>”. <a href="http://en.wikipedia.org/wiki/Perfect_forward_secrecy">Perfect Forward Secrecy</a> is one <a href="http://blogs.computerworld.com/encryption/22366/can-nsa-see-through-encrypted-web-pages-maybe-so">defense</a> against this:</p>
<blockquote>
<p>When PFS is used, the compromise of an SSL site’s private key does not necessarily reveal the secrets of past private communication; connections to SSL sites which use PFS have a per-session key which is not revealed if the long-term private key is compromised. The security of PFS depends on both parties discarding the shared secret after the transaction is complete (or after a reasonable period to allow for session resumption).</p>
<p><a href="http://news.netcraft.com/archives/2013/06/25/ssl-intercepted-today-decrypted-tomorrow.html">Source</a></p>
</blockquote>
<p>Quick note though: PFS does <em>not</em> protect against MITM attacks:</p>
<blockquote>
<p>Someone with access to the server’s private key can, of course, perform an active man in the middle attack and impersonate the server. However, they can do that only when the communication is taking place. It is not possible to pile up a mountain of encrypted traffic and decrypt it later.</p>
<p><a href="https://community.qualys.com/blogs/securitylabs/2013/06/25/ssl-labs-deploying-forward-secrecy">Source</a></p>
</blockquote>
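<p>On the deployment side, forward secrecy is mostly a matter of cipher configuration: restrict key exchange to ephemeral (EC)DHE suites so no long-term key can decrypt recorded traffic. As a rough, modern illustration using Python’s <code>ssl</code> module (cipher-string details vary by OpenSSL version):</p>

```python
import ssl

# Build a client context and, for TLS 1.2 and below, offer only ECDHE
# key exchange so every session uses an ephemeral key (forward secrecy).
# TLS 1.3 suites are forward-secret by design and unaffected by set_ciphers().
ctx = ssl.create_default_context()
ctx.set_ciphers("ECDHE+AESGCM")

# Every remaining TLS 1.2 suite should name ECDHE as its key exchange;
# TLS 1.3 suite names start with "TLS_".
names = [c["name"] for c in ctx.get_ciphers()]
```

<p>A context like this would refuse to negotiate static-RSA key exchange, which is exactly the mode of operation the “record now, decrypt later” concern applies to.</p>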
<h2 id="outlook">Outlook</h2>
<p>It essentially comes down to these questions:</p>
<ul>
<li>How do we deal with SSL?</li>
<li>How much can we trust it?</li>
<li>Do we need to “reboot” our encryption protocols and habits entirely?</li>
<li>Should we reevaluate the whole idea of the certificate authority system?</li>
</ul>
<p>Whether the NSA has “direct” access to PRISM-cooperating companies’ servers or has effectively <a href="https://mailman.stanford.edu/pipermail/liberationtech/2013-June/008838.html">hacked cryptographic constructs</a>, our privacy has been challenged. Our data is being collected, and while it may seem trivial now, one day you may be deemed a terrorist like <a href="http://gawker.com/kid-who-rapped-about-marathon-bombing-now-faces-terro-486959354">this kid here</a>.</p>
<h2 id="part-3-setting-up-a-kerberos-test-environment"><a href="http://www.roguelynn.com/words/setting-up-a-kerberos-test-environment/">Part 3: Setting up a Kerberos test environment</a></h2>
<p><em>Lynn Root · 2013-05-17</em></p>
<p>This is part 3 of a series of posts on setting up Django to use external authentication. This post explains how to set up your own environment to test Django authentication against Apache and Kerberos/Active Directory/LDAP.</p>
<h3 id="setting-up-your-own-test-environment">Setting up your own test environment</h3>
<p>Naturally, you only care about coding and developing. I’ve made a <a href="https://gist.github.com/econchick/99699a6fee2eb44d13b0">Vagrantfile</a> that spins up two VMs: an IPA server with a Kerberos KDC, and a client within the Kerberos realm that runs Apache, both on Fedora 18.</p>
<h4 id="setup-your-kerberos-test-environment">Set up your Kerberos test environment:</h4><pre><code data-lang="bash">$ git clone\
https://gist.github.com/econchick/99699a6fee2eb44d13b0\
KerbTestEnvironment
$ cd KerbTestEnvironment
# for a synced folder between local and Vagrant VM
$ mkdir synergizerApp
# to spin up both machines at the same time:
$ vagrant up
# to spin up machines individually:
$ vagrant up ipaserver
$ vagrant up client
</code></pre><h4 id="using-the-test-environment">Using the test environment</h4>
<p>To use your Kerberos test environment, make sure both VMs are up and running with <code>vagrant status</code>. </p>
<p>First, ssh into the <strong>server</strong> via <code>vagrant ssh ipaserver</code> then check to see if the IPA service is up and running, and if not, start it up:</p>
<pre><code data-lang="bash">[vagrant@ipaserver]$ sudo ipactl status
[vagrant@ipaserver]$ sudo ipactl start
</code></pre>
<p>Be sure you can <code>kinit</code> on the server:</p>
<pre><code data-lang="bash">[vagrant@ipaserver]$ kinit admin
</code></pre>
<p>Now, ssh into the <strong>client</strong> via <code>vagrant ssh client</code>, then check to see if you can <code>kinit</code> to make sure this VM can connect to <code>ipaserver</code>’s KDC:</p>
<pre><code data-lang="bash">[vagrant@client]$ kinit admin
</code></pre>
<p>To push your app to the client VM, you can just copy your Django code to the <code>KerbTestEnvironment/synergizerApp/</code> directory we created earlier, and it will drop into Apache’s default directory, <code>/var/www/</code>. You will need to configure Apache for wsgi. </p>
<p>Then continue with the <a href="#does-it-negotiate-testing-setup">testing setup</a> described earlier.</p>
<h4 id="possible-issues">Possible issues</h4>
<ul>
<li>If you receive an error message similar to the following during <code>vagrant up $VM_NAME</code>:</li>
</ul>
<blockquote>
<p>The following SSH command responded with a non-zero exit status.
Vagrant assumes that this means the command failed!
/sbin/ifup p7p1 2> /dev/null</p>
</blockquote>
<p>then apply <a href="https://github.com/monvillalon/vagrant/commit/dc9830350a0f2be3bb7a4b4e9fcefaed66c6a26a">this</a> fix within Vagrant’s installation. For my Mac OS X Mountain Lion + Vagrant v1.2.2 (most up-to-date at the time of this article), it was a bit tough to find the exact place where this fix should be made. Wherever the vagrant gems are installed, find <code>plugins/guests/fedora/cap/configure_network.rb</code> to adjust the line that contains this:</p>
<pre><code data-lang="ruby">machine.communicate.sudo("/sbin/ifup p7p#{interface} 2>\
/dev/null")
</code></pre>
<p>to this:</p>
<pre><code data-lang="ruby">machine.communicate.sudo("/sbin/ifup p7p#{interface} 2>\
/dev/null", :error_check => false)
</code></pre>
<p><em>(ya ya, a pull request containing this fix, or rather an update to ifup, should be made; who has time for that…)</em></p>
<ul>
<li><p>If you get a clock skew error during <code>kinit</code> on the <code>ipaserver</code>, restart IPA via <code>sudo ipactl restart</code> and make sure <code>ntpd</code> is running with <code>service ntpd status</code>.</p></li>
<li><p>If you get a clock skew error during <code>kinit</code> on the <code>client</code>, you’ll need to resync NTP. Try the following (you’ll have to do the <code>ntpdate</code> command at least twice to adjust the NTP clock to at most 300 seconds/5 minutes difference):</p></li>
</ul>
<pre><code data-lang="bash">[vagrant@client]$ sudo killall ntpd
[vagrant@client]$ sudo ntpdate ipaserver.example.com
[vagrant@client]$ sudo ntpdate ipaserver.example.com
[vagrant@client]$ kinit admin
</code></pre><h3 id="resources">Resources</h3>
<ul>
<li><a href="https://docs.fedoraproject.org/en-US/Fedora/17/html/FreeIPA_Guide/index.html">Setting up IPA</a></li>
<li><a href="http://freeipa.org/page/HowTos">User-contributed How-tos for IPA</a>, including working with IPA, Interoperability with other systems, and 3rd party implementations.</li>
<li>Part 0: <a href="http://www.roguelynn.com/words/explain-like-im-5-kerberos/">Explain like I’m 5: Kerberos</a></li>
<li>Part 1: <a href="http://www.roguelynn.com/words/django-custom-user-models">Django 1.5 Custom User Models</a></li>
<li>Part 3: <a href="http://www.roguelynn.com/words/setting-up-a-kerberos-test-environment">Setting up a Kerberos test environment</a></li>
</ul>
<h2 id="part-2-apache-and-kerberos-for-django"><a href="http://www.roguelynn.com/words/apache-kerberos-for-django/">Part 2: Apache and Kerberos for Django Authentication + Authorization</a></h2>
<p><em>Lynn Root · 2013-05-16</em></p>
<p>This is part 2 of a series of posts on setting up Django to use external authentication. This post explains how to set up Apache for Django to use a corporate/internal authentication environment.</p>
<h3 id="how-do-i-apache">How do I Apache?</h3>
<p>Alright, now that my application is done and the custom user is setup, how do I actually <em>hook</em> this into the internal network?</p>
<p>Apache is the <span id="antibuzz">anti-buzzword</span> if you will, but often you don’t have a choice.</p>
<h4 id="configuring-apache">Configuring Apache</h4>
<p>On the host machine that will be running the Apache instance, make sure to install <code>mod_auth_kerb</code> for Kerberos, or <code>mod_authnz_ldap</code> for LDAP.</p>
<p>Within <code>/etc/httpd/conf.d/</code>, create another <code>.conf</code> file, for instance: <code>remote_user.conf</code>. You can also have this configuration within <code>.htaccess</code> in the desired protected directory itself. </p>
<p>Configuration for Kerberos should look similar to:</p>
<pre><code># remote_user.conf or .htaccess
LoadModule auth_kerb_module modules/mod_auth_kerb.so
<Location />
AuthName "DjangoConKerberos"
AuthType Kerberos
KrbMethodNegotiate On
KrbMethodK5Passwd Off
KrbServiceName HTTP/$FQDN
KrbAuthRealms KERBEROS_DOMAIN
Krb5KeyTab /path/to/http.keytab
Require valid-user
Order allow,deny
Allow from all
</Location>
</code></pre>
<p>If using LDAP + Basic auth instead of Kerberos, Apache configuration should look similar to:</p>
<pre><code># remote_user.conf or .htaccess
LoadModule authnz_ldap_module modules/mod_authnz_ldap.so
<Location />
AuthName "DjangoConLDAP"
AuthType Basic
AuthBasicProvider ldap
AuthzLDAPAuthoritative Off
AuthLDAPURL ldap://$LDAP_URL:389/CN=...
AuthLDAPBindDN cn=myusername,cn=Users,...
AuthLDAPBindPassword mypassword
Require valid-user
Order allow,deny
Allow from all
</Location>
</code></pre>
<p>An important part of setting up Apache for an internal Kerberized system is getting a keytab for Apache. Most likely, you won’t have access to get the needed keytab, and will need to request one from whoever manages the corporate identity system. </p>
<p>However, it’s quite easy to spin up a test environment to see if your configuration is working correctly. I detail how to set one up in <a href="http://www.roguelynn.com/words/setting-up-a-kerberos-test-environment">part 3</a> of this series.</p>
<p>See <a href="http://modauthkerb.sourceforge.net/configure.html">here</a> for more detailed information on the various configuration parameters for <code>mod_auth_kerb</code> + Apache, or <a href="http://httpd.apache.org/docs/2.2/mod/mod_authnz_ldap.html">here</a> for LDAP + Apache configuration, and <a href="http://www.netexpertise.eu/en/apache/authentication-against-active-directory.html">here</a> for Active Directory with LDAP + Apache configuration.</p>
<p>You will need to set up a <a href="https://docs.djangoproject.com/en/dev/howto/deployment/wsgi/">wsgi configuration</a> for Apache to actually serve your application. </p>
<p>If, for some <em>odd</em> reason, you want to or can use something other than Apache, there is a <a href="https://github.com/fintler/nginx-mod-auth-kerb">mod_auth_kerb setup</a> for nginx. <strong>Disclaimer:</strong> I have not used or tested the nginx + Kerberos setup.</p>
<h3 id="does-it-negotiate-testing-setup">Does it negotiate? Testing setup</h3><h4 id="curl">cURL</h4>
<p>I’d suggest first trying <code>curl</code> to make sure the Apache + Kerberos setup is correct:</p>
<pre><code>[vagrant@client]# kinit roguelynn
Password for roguelynn@ROOTCLOUD.COM:
[vagrant@client]# curl -I --negotiate -u : \
https://synergizeapp.strategery.com
HTTP/1.1 401 Unauthorized
Date: Wed, 15 May 2013 09:10:18 GMT
Server: Apache/2.4.4 (Fedora)
WWW-Authenticate: Negotiate
Content-Type: text/html; charset=iso-8859-1

HTTP/1.1 200 OK
Date: Wed, 15 May 2013 09:10:18 GMT
Server: Apache/2.4.4 (Fedora)
WWW-Authenticate: Negotiate sOmE_RanDom_T0k3n
...
</code></pre>
<p>The <code>--negotiate</code> flag flips on <a href="http://en.wikipedia.org/wiki/SPNEGO">SPNEGO</a> for cURL, and the <code>-u :</code> forces cURL to pick up the authenticated user’s cached ticket from <code>kinit</code>’ing. You can see your cached ticket with <code>klist</code>. </p>
<h4 id="requests.py">requests.py</h4>
<p>For <code>requests</code> fans:</p>
<pre><code data-lang="bash">$ kinit USERNAME
$ python
>>> import requests
>>> from requests_kerberos import HTTPKerberosAuth
>>> r = requests.get("$APACHE_PROTECTED_FQDN",\
auth=HTTPKerberosAuth())
>>> r.status_code
200
</code></pre><h4 id="browser">Browser</h4>
<p>First, we’ll need to configure our browser to use Negotiate/SPNego:</p>
<ul>
<li>Safari – just “works”. Thanks, Apple.</li>
<li><p>Chrome</p>
<ul>
<li><p>Mac:</p>
<pre><code>open -a "Google Chrome" --args\
--auth-server-whitelist="*KERBEROS_DOMAIN"\
--auth-negotiate-delegate-whitelist="*KERBEROS_DOMAIN"\
--auth-schemes="basic,digest,ntlm,negotiate"
</code></pre></li>
<li><p>Linux:</p>
<pre><code>google-chrome --enable-plugins --args\
--auth-server-whitelist="*KERBEROS_DOMAIN"\
--auth-negotiate-delegate-whitelist="*KERBEROS_DOMAIN"\
--auth-schemes="basic,digest,ntlm,negotiate"
</code></pre></li>
<li><p>Windows:</p>
<pre><code>chrome.exe --auth-server-whitelist="*KERBEROS_DOMAIN"\
--auth-negotiate-delegate-whitelist="*KERBEROS_DOMAIN"\
--auth-schemes="basic,digest,ntlm,negotiate"
</code></pre></li>
</ul></li>
<li><p>Firefox</p>
<ul>
<li>Navigate to <code>about:config</code></li>
<li>Search for “negotiate”</li>
<li>For <code>network.negotiate-auth.delegation-uris</code>, add <code>.KERBEROS_DOMAIN</code></li>
<li>For <code>network.negotiate-auth.trusted-uris</code>, add <code>.KERBEROS_DOMAIN</code></li>
</ul></li>
<li><p>IE</p>
<ul>
<li>Tools > Internet Options > Advanced tab</li>
<li>Within the Security section, select “Enable Integrated Windows Authentication”</li>
<li>Restart the browser</li>
</ul></li>
<li><p>Authenticate yourself with <code>kinit USERNAME</code> within the terminal. </p></li>
<li><p>Finally, navigating to <code>$APACHE_PROTECTED_FQDN</code> within the browser should then just work, if everything is set up appropriately. </p></li>
<li><p>If you are prompted for a Kerberos username/password and did not intend that, your Apache configuration may be incorrect; you should, however, still be able to authenticate with your Kerberos credentials.</p></li>
</ul>
<h3 id="authentication-vs-authorization">Authentication vs Authorization</h3>
<p>So - I’m sure this isn’t news to anyone: there’s a difference between authentication and authorization. The first is who you are; the second is what you can do. </p>
<p>Using <code>RemoteUserBackend</code> and its middleware doesn’t automatically grab what the user is authorized to do; it’s just authentication. </p>
<p>However, if needed, there are ways to hook into the user database to grab permissions.</p>
<h4 id="accessing-permissions">Accessing Permissions</h4>
<p>Say your app needs to know if a user is defined as an admin, or staff, a part of a particular group (e.g. “finance”, “engineering”) or something else that is already defined in your external auth system. Customizing a backend is needed to connect directly to the external user datastore.</p>
<p>Typically, LDAP holds users in groups, through the <code>memberOf</code> attribute or something similar. By binding to the LDAP server to find which groups the user is a member of, you can then define what authorization a user has within your own app logic, e.g. if the user is a member of “admins”, then <code>create_superuser(user)</code>. </p>
<ul>
<li><a href="http://pythonhosted.org/django-auth-ldap">django-auth-ldap</a> and/or <a href="https://code.google.com/p/django-ldap-groups">django-ldap-groups</a> can be dropped into your Django app</li>
<li>Or one of these snippets (they are focused on Active Directory, but so long as the configuration variables within your <code>settings.py</code> are correct, they should work with a standard LDAP or IPA setup):
<ul>
<li><a href="http://djangosnippets.org/snippets/501/">Active Directory</a></li>
<li><a href="http://djangosnippets.org/snippets/901/">Active Directory over SSL</a></li>
<li><a href="http://djangosnippets.org/snippets/1397/">Active Directory/LDAP</a></li>
<li><a href="http://djangosnippets.org/snippets/2899/">Active Directory with Groups</a></li>
</ul></li>
</ul>
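<p>Whichever package or snippet you pick, the core idea is the same: read the user’s <code>memberOf</code> values and map group names onto app-level permissions. A minimal, library-free sketch of that mapping step (the helper names, DNs, and group name here are made up for illustration):</p>

```python
def groups_from_member_of(member_of_values):
    """Extract group CNs from memberOf DNs such as
    'CN=admins,OU=Groups,DC=example,DC=com'."""
    groups = []
    for dn in member_of_values:
        first_rdn = dn.split(",")[0].strip()
        if first_rdn.upper().startswith("CN="):
            groups.append(first_rdn[3:])
    return groups

def should_be_superuser(member_of_values, admin_group="admins"):
    """Decide whether create_superuser() should be called for this user."""
    return admin_group in groups_from_member_of(member_of_values)

member_of = ["CN=admins,OU=Groups,DC=example,DC=com",
             "CN=engineering,OU=Groups,DC=example,DC=com"]
```

<p>In a real backend you would feed this the <code>memberOf</code> attribute returned by your LDAP search, then set Django’s superuser/staff flags accordingly.</p>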
<p>In a Kerberized environment, one approach may be to simply get another keytab for accessing the LDAP, although that requires the service to have wide read privileges. </p>
<p>How I would approach it though, at least within an IPA environment (which stores its user information in an LDAP), is to capitalize on IPA’s use of <a href="https://fedorahosted.org/sssd/">SSSD - System Security Services Daemon</a>. You can execute local system calls, like <code>id $USERNAME</code> (the user’s groups) or <code>getent group $GROUPNAME</code> (a group’s members), and SSSD resolves them against the IPA directory.</p>
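<p>From Python, the same SSSD-backed lookup is available through the standard library’s <code>grp</code> module, which goes through NSS just like <code>getent</code> does. A small sketch (the group and user names are illustrative):</p>

```python
import grp

def members_of(group_name):
    """Member list for a group. With SSSD configured, this consults the
    IPA/LDAP backend through NSS -- the same path `getent group` uses."""
    try:
        return grp.getgrnam(group_name).gr_mem
    except KeyError:
        return []  # group unknown to NSS/SSSD

def user_in_group(members, username):
    """Pure membership check, split out so it is easy to test."""
    return username in members

# e.g.: user_in_group(members_of("admins"), "roguelynn")
```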
<h3 id="resources">Resources</h3>
<ul>
<li><a href="http://www.sensibledevelopment.com/2011/01/a-generic-wsgi-file-for-deploying-django-with-virtualenv-and-mod_wsgi/">Help with generic setup with django, virtualenv, and Apache’s mod_wsgi</a></li>
<li>Part 0: <a href="http://www.roguelynn.com/words/explain-like-im-5-kerberos/">Explain like I’m 5: Kerberos</a></li>
<li>Part 1: <a href="http://www.roguelynn.com/words/django-custom-user-models">Django 1.5 Custom User Models</a></li>
<li>Part 3: <a href="http://www.roguelynn.com/words/setting-up-a-kerberos-test-environment">Setting up a Kerberos test environment</a></li>
</ul>
<h2 id="part-1-django-15-custom-user-models"><a href="http://www.roguelynn.com/words/django-custom-user-models/">Part 1: Django 1.5 Custom User Models</a></h2>
<p><em>Lynn Root · 2013-05-16</em></p>
<p>This is part 1 of a series of posts on setting up Django to use external authentication. This post explains how to set up Django with custom user models for corporate/internal authentication methods.</p>
<h3 id="intro">Intro</h3>
<p>Everyone has or has had a Pointy-haired boss or client, right? Micromanagement, incompetence, obliviousness? Maybe you’re lucky?</p>
<p>So your Pointy-haired boss/client needs a web application. Perhaps it’s an internal web app that’s supposed to capitalize on synergy, streamline costs, leverage assets, and all those other effing <span id="buzzword">buzzwords</span>.</p>
<p>You hope to use Postgres and Django’s default auth mechanism, but no – you have to use the corporate/internal authentication system, a.k.a. <a href="http://en.wikipedia.org/wiki/Single_sign-on">single sign-on</a>. We’re trying to avoid managing separate user credentials and needing to log in to the required <span id="buzzword">mission-critical synergy</span> app.</p>
<p>Not to despair, my Djangonauts - you can <span id="buzzword">leverage</span> Django’s new custom user models!</p>
<h3 id="problem-make-an-internal-web-app">Problem: Make an internal Web App</h3>
<p>So, an overview of the problem:</p>
<ul>
<li>need to integrate into internal authentication like Kerberos, LDAP, Active Directory</li>
<li>can’t use Postgres for authentication - BUMMER</li>
<li>leverage single sign-on within your app</li>
</ul>
<h3 id="crap.-what-is-this-single-sign-on-magic">Crap. What is this single sign-on magic?!</h3>
<ul>
<li>Enter: the new custom user model introduced in <a href="https://docs.djangoproject.com/en/dev/releases/1.5/#configurable-user-model">Django 1.5</a>
<ul>
<li>allows for an identifier other than the default User model’s <code>username</code>, which is capped at 30 characters</li>
<li>the username can be an email address, Twitter handle, etc., or you can add those elements as requirements</li>
<li>great for Kerberos/LDAP/Active Directory authentication, because the username in those identity management systems often looks like an email address: <code>username@INTERNAL_DOMAIN</code></li>
</ul></li>
</ul>
<h3 id="scenario">Scenario</h3>
<p>Let’s create a dummy application: <code>./manage.py startapp synergizerApp</code>. Just for the sake of simplicity, this is just a single django project with a single app.</p>
<h4 id="creating-your-custom-user-model">Creating your custom user model</h4>
<p>While you’re hooking into a pre-defined user database that will take care of authentication, and perhaps authorization, you can still define your own custom model by inheriting from <code>AbstractBaseUser</code> and adding your own fields, like so:</p>
<pre><code data-lang="python"># synergizerApp/models.py
from django.contrib.auth.models import AbstractBaseUser
from django.db import models
class KerbUser(AbstractBaseUser):
username = models.CharField(max_length=254, unique=True)
first_name = models.CharField(max_length=30, blank=True)
last_name = models.CharField(max_length=30, blank=True)
email = models.EmailField(blank=True)
synergy_level = models.IntegerField()
is_team_player = models.BooleanField(default=False)
USERNAME_FIELD = 'username'
REQUIRED_FIELDS = ['email', 'synergy_level']
</code></pre>
<p>Because you defined a custom user model – requiring a <code>synergy_level</code> for the user – you’ll need to define a user manager to take care of creating users & superusers within Django. </p>
<p>The key parts here are just defining what a user/superuser should have, and referring to the UserManager within the user model itself.</p>
<pre><code data-lang="python">from django.contrib.auth.models import (
AbstractBaseUser, BaseUserManager)
from django.db import models
class KerbUserManager(BaseUserManager):
def create_user(self, email, synergy_level,
password=None):
user = self.model(email=email,
synergy_level=synergy_level)
# <--snip-->
return user
def create_superuser(self, email, synergy_level,
password):
user = self.create_user(email, synergy_level,
password=password)
user.is_team_player = True
user.save()
return user
class KerbUser(AbstractBaseUser):
username = models.CharField(max_length=254, ...)
# <--snip-->
objects = KerbUserManager()
</code></pre>
<p>Within your custom user model, <code>KerbUser</code>, you will also need to define <code>get_full_name</code> and <code>get_short_name</code>, as well as an <code>is_active</code> attribute, which should default to <code>True</code>.</p>
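<p>The logic for those methods can be as simple as the following sketch. It is written as a plain-Python mixin here so it reads (and runs) on its own; in practice these would be methods and a <code>BooleanField</code> on <code>KerbUser</code> itself, and the class names are made up:</p>

```python
class UserNameMethodsMixin:
    """Sketch of the extra pieces Django expects on a custom user model."""
    first_name = ""
    last_name = ""
    username = ""
    is_active = True  # on the real model: models.BooleanField(default=True)

    def get_full_name(self):
        # Prefer "First Last"; fall back to the username if names are blank.
        full = ("%s %s" % (self.first_name, self.last_name)).strip()
        return full or self.username

    def get_short_name(self):
        return self.first_name or self.username


class DemoUser(UserNameMethodsMixin):
    first_name, last_name, username = "Lynn", "Root", "lynn"
```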
<p>Just a few variables should be set within <code>settings.py</code> file to make Django a <span id="buzzword">team player</span>:</p>
<pre><code data-lang="python"># settings.py
# <--snip-->
AUTH_USER_MODEL = 'synergizerApp.KerbUser'
MIDDLEWARE_CLASSES = (
...
'django.contrib.auth.middleware.AuthenticationMiddleware',
'django.contrib.auth.middleware.RemoteUserMiddleware',
...
)
AUTHENTICATION_BACKENDS = (
'django.contrib.auth.backends.RemoteUserBackend',
)
# <--snip-->
</code></pre>
<p><strong>Note:</strong> The order of middleware is very important: the <code>AuthenticationMiddleware</code> must precede <code>RemoteUserMiddleware</code>.</p>
<p>If you want to use the <code>user</code> part of Kerberos’ <code>user@REALM</code> as the username, you can simply extend <code>RemoteUserBackend</code>:</p>
<pre><code data-lang="python"># synergizerApp/krb5.py
from django.contrib.auth.backends import RemoteUserBackend
class Krb5RemoteUserBackend(RemoteUserBackend):
def clean_username(self, username):
# remove @REALM from username
return username.split("@")[0]
</code></pre>
<p>and point <code>settings.py</code> at the custom backend defined above:</p>
<pre><code data-lang="python"># settings.py
# <--snip-->
AUTHENTICATION_BACKENDS = (
'synergizerApp.krb5.Krb5RemoteUserBackend',
)
# <--snip-->
</code></pre>
<p>To access the user within the models for your application, you’ll refer to the custom user model like so:</p>
<pre><code data-lang="python"># synergizerApp/models.py
from django.conf import settings
from django.db import models
class Synergy(models.Model):
money_sink = models.ForeignKey(settings.AUTH_USER_MODEL)
# <--snip-->
</code></pre>
<p>To access within your views:</p>
<pre><code data-lang="python"># synergizerApp/views.py
from django.contrib.auth import get_user_model
User = get_user_model()
# <--snip-->
</code></pre><h3 id="other-resources">Other resources</h3>
<ul>
<li>Part 0: <a href="http://www.roguelynn.com/words/explain-like-im-5-kerberos/">How Kerberos Works</a></li>
<li>Part 2: <a href="http://www.roguelynn.com/words/apache-kerberos-for-django">Apache and Kerberos for Django Authentication + Authorization</a></li>
<li>Part 3: <a href="http://www.roguelynn.com/words/setting-up-a-kerberos-test-environment">Setting up a Kerberos test environment</a></li>
<li><a href="http://procrastinatingdev.com/django/using-configurable-user-models-in-django-1-5/">Using configurable user models in Django</a></li>
</ul>
<h2 id="explain-like-im-5-kerberos"><a href="http://www.roguelynn.com/words/explain-like-im-5-kerberos/">Explain like I’m 5: Kerberos</a></h2>
<p><em>Lynn Root · 2013-04-02</em></p>
<p>Explain like I’m 5 years old: Kerberos – what is Kerberos, and why should I care?</p>
<p>While this topic probably can not be explained in a way an actual 5-year-old would understand, this is my attempt at defragmenting the documentation with some visual aids and digestible language.</p>
<h3 id="in-a-nutshell">In a nutshell</h3>
<p>Basically, Kerberos comes down to just this:</p>
<ul>
<li>a protocol for authentication</li>
<li>uses tickets to authenticate</li>
<li>avoids storing passwords locally or sending them over the internet</li>
<li>involves a trusted 3rd-party</li>
<li>built on symmetric-key cryptography</li>
</ul>
<p>You have a <strong>ticket</strong> – your proof of identity encrypted with a <span id="secret-key">secret key</span> for the particular service requested – on your local machine (creation of a ticket is described below); so long as it’s valid, you can access the requested service that is within a Kerberos realm.</p>
<p>Typically, this is used within corporate/internal environments. Perhaps you want to access your internal payroll site to review what little bonus your boss has given you. Rather than re-entering your user/password credentials, your ticket (cached on your system) is used to authenticate allowing for single sign-on.</p>
<p>Your ticket is refreshed when you sign on to your computer, or when you <code>kinit USER</code> within your terminal.</p>
<p>For the trivia-loving folks, Kerberos’ name comes from <a href="http://l.ynn.me/Zxcew1">Greek mythology</a>, the three-headed guard dog of <a href="http://en.wikipedia.org/wiki/Hades">Hades</a>. It’s pretty fitting since it takes a third-party (a Key Distribution Center) to authenticate between a client and a service or host machine.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/eli5-kerberos/Hades-et-Cerberus-III.jpg" width="300" height="450"/>
<figcaption><a href="http://en.wikipedia.org/wiki/File:Hades-et-Cerberus-III.jpg">Wikipedia</a></figcaption></p>
<h3 id="kerberos-realm">Kerberos Realm</h3>
<p>Admins create realms – Kerberos realms – that will encompass all that is available to access. Granted, <strong>you</strong> may not have access to certain services or host machines that are defined within the policy management – developers should not access anything finance-related, stuff like that. But a realm defines what Kerberos manages in terms of who can access what.</p>
<p>Your machine, the Client, lives within this realm, as well as the service or host you want to request and the Key Distribution Center, KDC (no, not the <a href="http://en.wikipedia.org/wiki/KGB">KGB</a>, although I always think of that, too). In the following example, I separate out the Authentication Server and the Ticket Granting Server, but both are within the KDC.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/eli5-kerberos/Kerb.001.jpg" width="630" height="475" title="Kerberos Realm" alt="Kerberos Realm"/></p>
<h3 id="to-keep-in-the-back-of-your-mind">To keep in the back of your mind</h3>
<p>You may want to come back up here after you read through the gritty details on how the example works.</p>
<p>When requesting access to a service or host, three interactions take place between you and:</p>
<ul>
<li>the Authentication Server</li>
<li>the Ticket Granting Server</li>
<li>the Service or host machine that you’re wanting access to.</li>
</ul>
<p>Other important points:</p>
<ul>
<li>With each interaction, you’ll receive two messages: one that you can decrypt, and one that you can not.</li>
<li>The service or machine you are requesting access to <strong>never</strong> communicates directly with the KDC.</li>
<li>The KDC stores all of the secret keys for user machines and services in its database.</li>
<li>Secret keys are passwords plus a salt, hashed together – the <a href="http://web.mit.edu/kerberos/krb5-current/doc/admin/conf_files/kdc_conf.html#encryption-and-salt-types">hash algorithm</a> is chosen during implementation of the Kerberos setup. Services and host machines have no passwords (who would type them in?); instead, an admin generates a key during initial setup and stores it on the service/host machine.</li>
<li><strong>Again</strong>, these secret keys are all stored in the KDC database; <a href="#in-a-nutshell">recall</a> the Kerberos’ reliance on symmetric-key cryptography.</li>
<li>The KDC itself is encrypted with a master key to add a layer of difficulty from stealing keys from the database.</li>
<li>There are Kerberos <a href="http://k5wiki.kerberos.org/wiki/Pkinit_configuration">configurations</a> and <a href="http://freeipa.org">implementations</a> that use public-key cryptography instead of symmetrical key encryption.</li>
</ul>
<p><em>An aside:</em> the order of the messages and their contents discussed here does not reflect the order in which they are sent over TCP or UDP.</p>
<p>The example below describes what happens when you request something from an internal HTTP Service – like information regarding payroll within your corporate intranet.</p>
<h3 id="you-and-the-authentication-server">You and the Authentication Server</h3>
<p>You want to access an HTTP Service, but first you must introduce yourself to the Authentication Server. Logging into your computer, or <code>kinit USERNAME</code>, initiates that introduction via a plaintext request for a Ticket Granting Ticket (TGT). The plaintext message contains:</p>
<ul>
<li>your name/ID</li>
<li>the name/ID of the requested service (in this case, service is the Ticket Granting Server),</li>
<li>your network address (may be a list of IP addresses for multiple machines, or may be null if wanting to use on any machine), and</li>
<li>requested lifetime for the validity of the TGT,</li>
</ul>
<p>and is sent to the Authentication Server.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/eli5-kerberos/Kerb.002.jpg" width="630" height="475" title="Request to AS" alt="Request to AS"/></p>
<p>The Authentication Server will check if you are in the KDC database. This check is only to see if you exist; no credentials are checked.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/eli5-kerberos/Kerb.003.jpg" width="630" height="475" title="AS DB check" alt="AS DB check"/></p>
<p>If there are no errors (e.g. user is not found), it will randomly generate a key called a <span id="tgs-session-key">session key</span> for use between you and the Ticket Granting Server (TGS).</p>
<p>The Authentication Server will then send two messages back to you. One message is the TGT that contains:</p>
<ul>
<li>your name/ID,</li>
<li>the TGS name/ID,</li>
<li>timestamp,</li>
<li>your network address (may be a list of IP addresses for multiple machines, or may be null if wanting to use on any machine)</li>
<li>lifetime of the TGT (could be what you initially requested, lower if you or the TGS’s secret keys are about to expire, or another limit that was implemented during the Kerberos setup), and</li>
<li><font id="tgs-session-key">TGS Session Key</font>,</li>
</ul>
<p>and is encrypted with the <span id="tgs-secret-key">TGS Secret Key</span> . The other message contains:</p>
<ul>
<li>the TGS name/ID,</li>
<li>timestamp,</li>
<li>lifetime (same as above), and</li>
<li><font id="tgs-session-key">TGS Session Key</font></li>
</ul>
<p>and is encrypted with your <span id="client-secret-key">Client Secret Key</span>. Note that the <span id="tgs-session-key">TGS Session Key</span> is the shared key between you and the TGS.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/eli5-kerberos/Kerb.004.jpg" width="630" height="475" title="AS Response to Client" alt="AS Response to Client"/></p>
<p>Your <span id="client-secret-key">Client Secret Key</span> is determined by prompting you for your password, appending a salt (made up of <code>user@REALMNAME.COM</code>) and hashing the whole thing. Now you can use it for decrypting the second message in order to obtain the <span id="tgs-session-key">TGS Session Key</span>. If the password is incorrect, then you will not be able to decrypt the message. Please note that this is the step in which the password you enter is implicitly validated.</p>
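<p>As a rough sketch of that password-plus-salt-to-key step: modern Kerberos AES enctypes (RFC 3962) run the password through PBKDF2 with a salt built from the realm and principal name, followed by one further key-derivation step that is omitted here. The names, realm, and iteration count below are illustrative:</p>

```python
import hashlib

def string_to_key(password, username, realm, key_len=16):
    """Simplified Kerberos-style string-to-key: PBKDF2-HMAC-SHA1 over
    password + salt. Real AES enctypes add a final DK() derivation step."""
    salt = (realm + username).encode()  # default salt: realm + principal
    return hashlib.pbkdf2_hmac("sha1", password.encode(), salt, 4096, key_len)

key = string_to_key("hunter2", "roguelynn", "EXAMPLE.COM")
```

<p>A wrong password yields a different key, which is why an incorrect password simply fails to decrypt the second message rather than producing an explicit “bad password” error.</p>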
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/eli5-kerberos/Kerb.005.jpg" width="630" height="475" title="Client Decrypt Response" alt="Client Decrypt Response"/></p>
<p>You can not, however, decrypt the TGT since you do not know the <span id="tgs-secret-key">TGS Secret Key</span>. The encrypted TGT is stored within your credential cache.</p>
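<p>The shape of this two-message exchange can be played with in a few lines. The toy XOR “cipher” below is for illustration only (real Kerberos uses proper AES profiles); it just shows why you can open one message but must pass the other along opaquely:</p>

```python
import hashlib

def toy_cipher(key, data):
    """XOR data with a SHA-256-derived keystream -- NOT real crypto,
    just enough to illustrate symmetric encrypt/decrypt."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# The AS generates a session key and sends it to you twice:
tgs_session_key = b"random-session-key"
tgt = toy_cipher(b"tgs-secret-key", tgs_session_key)  # unreadable to you
for_you = toy_cipher(b"your-client-secret-key", tgs_session_key)

# You can decrypt your copy (XOR with the same keystream is its own inverse)...
assert toy_cipher(b"your-client-secret-key", for_you) == tgs_session_key
# ...but the TGT only yields the session key to the holder of the TGS key.
```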
<h3 id="you-and-the-ticket-granting-server">You and the Ticket Granting Server</h3>
<p>At this point, you have the TGT that you can not read because you do not have the <span id="tgs-secret-key">TGS Secret Key</span> to decrypt it. You do, however, have the <span id="tgs-session-key">TGS Session Key</span>.</p>
<p>It’s now your turn to send two messages. You first prepare the Authenticator, encrypted with the <span id="tgs-session-key">TGS Session Key</span>, containing:</p>
<ul>
<li>your name/ID, and</li>
<li>timestamp.</li>
</ul>
<p>You send an unencrypted message that contains:</p>
<ul>
<li>the requested HTTP Service name/ID you want access to, and</li>
<li>lifetime of the Ticket for the HTTP Service,</li>
</ul>
<p>along with the encrypted Authenticator and TGT to the Ticket Granting Server.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/eli5-kerberos/Kerb.006.jpg" width="630" height="475" title="Client messages to TGS" alt="Client messages to TGS"/></p>
<p>The Ticket Granting Server will first check the KDC database to see if the HTTP Service exists.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/eli5-kerberos/Kerb.007.jpg" width="630" height="475" title="TGS DB check" alt="TGS DB check"/></p>
<p>If so, the TGS decrypts the TGT with its <span id="tgs-secret-key">Secret Key</span>. Since the now-unencrypted TGT contains the <span id="tgs-session-key">TGS Session Key</span>, the TGS can decrypt the Authenticator you sent.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/eli5-kerberos/Kerb.008.jpg" width="630" height="475" title="TGS Decrypts messages" alt="TGS Decrypts messages"/></p>
<p>The TGS will then do the following:</p>
<ul>
<li>compare your client ID from the Authenticator to that of the TGT,</li>
<li>compare the timestamp from the Authenticator to that of the TGT (the typical Kerberos tolerance is 2 minutes, but it can be configured otherwise),</li>
<li>check that the TGT is not expired (the lifetime element),</li>
<li>check that the Authenticator is not already in the TGS’s cache (to avoid replay attacks), and</li>
<li>if the network address in the original request is not null, compare the source’s IP address to your network address (or to an address within the requested list) within the TGT.</li>
</ul>
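<p>Those checks can be sketched as plain Python. A toy model: the field names, dict shapes, and function name are all invented for illustration, and a real KDC implements this with a configurable clock skew:</p>

```python
from datetime import datetime, timedelta

CLOCK_SKEW = timedelta(minutes=2)  # the typical Kerberos tolerance

def tgs_checks(auth, tgt, replay_cache, source_ip, now):
    """Mirror the TGS's checks: matching IDs, close timestamps,
    unexpired TGT, unseen Authenticator, and a permitted address."""
    ok = (
        auth["client_id"] == tgt["client_id"]
        and abs(auth["timestamp"] - tgt["timestamp"]) <= CLOCK_SKEW
        and now <= tgt["expires"]
        and auth["timestamp"] not in replay_cache
        and (tgt["addresses"] is None or source_ip in tgt["addresses"])
    )
    if ok:
        replay_cache.add(auth["timestamp"])  # remember it to catch replays
    return ok

now = datetime(2013, 1, 2, 12, 0)
tgt = {"client_id": "lynn", "timestamp": now,
       "expires": now + timedelta(hours=8), "addresses": None}
auth = {"client_id": "lynn", "timestamp": now + timedelta(seconds=30)}
replay_cache = set()
```

<p>Running the same Authenticator through twice demonstrates the replay protection: the first call passes, and the second fails because the timestamp is now cached.</p>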
<p>The Ticket Granting Server then randomly generates the <span id="http-session-key">HTTP Service Session Key</span>, and prepares the HTTP Service ticket for you that contains:</p>
<ul>
<li>your name/ID,</li>
<li>HTTP Service name/ID,</li>
<li>your network address (which may be a list of IP addresses for multiple machines, or null if the ticket may be used from any machine),</li>
<li>timestamp,</li>
<li>lifetime of the validity of the ticket, and</li>
<li><font id="http-session-key">HTTP Service Session Key</font>,</li>
</ul>
<p>and encrypts it with the <span id="http-secret-key">HTTP Service Secret Key</span>.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/eli5-kerberos/Kerb.009.jpg" width="630" height="475" title="TGS responds to Client" alt="TGS responds to Client"/></p>
<p>Then the TGS sends you two messages. One is the encrypted HTTP Service Ticket; the other contains:</p>
<ul>
<li>HTTP Service name/ID,</li>
<li>timestamp,</li>
<li>lifetime of the validity of the ticket, and</li>
<li><font id="http-session-key">HTTP Service Session Key</font>,</li>
</ul>
<p>that is encrypted with the <span id="tgs-session-key">TGS Session Key</span>.</p>
<p>Your machine decrypts the latter message with the <span id="tgs-session-key">TGS Session Key</span> that it cached earlier to obtain the <span id="http-session-key">HTTP Service Session Key</span>.</p>
<p>Your machine cannot, however, decrypt the HTTP Service Ticket since it’s encrypted with the <span id="http-secret-key">HTTP Service Secret Key</span>.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/eli5-kerberos/Kerb.010.jpg" width="630" height="475" title="Client Decrypt TGS messages" alt="Client Decrypt TGS messages"/></p>
<h3 id="you-and-the-http-service">You and the HTTP Service</h3>
<p>To now access the HTTP Service, your machine prepares another Authenticator message that contains:</p>
<ul>
<li>your name/ID,</li>
<li>timestamp,</li>
</ul>
<p>and is encrypted with the <span id="http-session-key">HTTP Service Session Key</span>. Your machine then sends the Authenticator and the still-encrypted HTTP Service Ticket received from the TGS.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/eli5-kerberos/Kerb.011.jpg" width="630" height="475" title="Client messages HTTP Service" alt="Client messages HTTP Service"/></p>
<p>The HTTP Service then decrypts the Ticket with its <span id="http-secret-key">Secret Key</span> to obtain the <span id="http-session-key">HTTP Service Session Key</span>. It then uses that <span id="http-session-key">Session Key</span> to decrypt the Authenticator message you sent.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/eli5-kerberos/Kerb.012.jpg" width="630" height="475" title="HTTP Decrypts messages" alt="HTTP Decrypts messages"/></p>
<p>Similar to the TGS, the HTTP Server will then do the following:</p>
<ul>
<li>compares your client ID from the Authenticator to that of the Ticket,</li>
<li>compares the timestamp from the Authenticator to that of the Ticket (the typical Kerberos tolerance is 2 minutes, but it can be configured otherwise),</li>
<li>checks that the Ticket is not expired (the lifetime element),</li>
<li>checks that the Authenticator is not already in the HTTP Server’s cache (to avoid replay attacks), and</li>
<li>if the network address in the original request is not null, compares the source’s IP address to your network address (or to an address within the requested list) within the Ticket.</li>
</ul>
<p>The HTTP Service then sends you an Authenticator message, encrypted with the <span id="http-session-key">HTTP Service Session Key</span>, containing its ID and timestamp in order to confirm its identity to you.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/eli5-kerberos/Kerb.013.jpg" width="630" height="475" title="HTTP responds with Auth" alt="HTTP responds with Auth"/></p>
<p>Your machine decrypts the Authenticator message with the cached <span id="http-session-key">HTTP Service Session Key</span> and verifies that it contains the HTTP Service’s ID and timestamp.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/eli5-kerberos/Kerb.014.jpg" width="630" height="475" title="Client Decrypt HTTP Auth" alt="Client Decrypt HTTP Auth"/></p>
<p>And now you have been authenticated to use the HTTP Service. Future requests use the cached HTTP Service Ticket, so long as it has not expired as defined within the lifetime attribute.</p>
<p>While I will write more on this later, the HTTP Service itself must support Kerberos. You must also use a browser that supports <a href="http://www.ietf.org/rfc/rfc4559.txt">SPNEGO/Negotiate</a>.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/eli5-kerberos/Kerb.015.jpg" width="630" height="475" title="Authed" alt="Authed"/></p>
<p>Perhaps re-read <a href="#to-keep-in-the-back-of-your-mind">the points</a> previously outlined; check out <a href="http://www.h5l.org/">this</a> or <a href="http://www.gnu.org/software/shishi/">this</a> current implementation, especially <a href="http://freeipa.org">the one on which I am paid to work</a> that communicates with <a href="http://technet.microsoft.com/en-us/library/bb742516.aspx">this popular implementation</a>; or review a <a href="http://www.kerberos.org/software/tutorial.html">tutorial</a>, <a href="http://content.hccfl.edu/pollock/AUnixSec/MoronsGuideToKerberos.htm">resource guide</a>, the go-to <a href="http://www.youtube.com/watch?v=7-LjpO2nTJo">video</a> that was sent to me when I started learning about Kerberos, or the <a href="http://www.ietf.org/rfc/rfc4120.txt">RFC itself</a>.</p>
<p>The above images were rendered with Keynote with icons used from <a href="http://fortawesome.github.com/Font-Awesome/">font awesome</a> and <a href="http://glyphicons.com/">glyphicons</a>, and are available on <a href="https://www.slideshare.net/roguelynn/kerberos-slides/">slideshare</a>.</p>
<p id="break">〜</p>
<p>In future posts, I’ll write up how to actually set up a Kerberos realm, set up an HTTP Service that accepts Negotiate authentication, and write a web application that can plug into Kerberos for its authentication.</p>
<p><strong>Update: May 16th, 2013:</strong> My <a href="http://www.roguelynn.com/circus">post</a> on setting up a web application + Apache for Kerberos, along with creating your own Kerberos test environment.</p>
The New Coder: A Path to Software Engineeringhttp://www.roguelynn.com/words/The-New-Coder-A-path-to-Software-Engineering/2013-03-15T09:47:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>This post contains all the sources and thoughts behind my PyCon 2013 talk launching <a href="http://newcoder.io">newcoder.io</a>. The <a href="https://www.youtube.com/watch?v=5hBMlTFfOJg">video</a> and <a href="https://speakerdeck.com/pyconslides/sink-or-swim-5-life-jackets-to-throw-to-the-new-coder-by-lynn-root">slides</a> are available.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/new-coder/SoS-slide1.png" width="370" height="300"/></p>
<p>If the influx of <a href="http://hackbright.com">private</a>, <a href="http://hackerschool.com">independent</a> <a href="http://www.appacademy.io">schools</a> for learning how to <a href="http://devbootcamp.com">code</a>, the ever-growing <a href="http://www.mooc-list.com/">list</a> of <a href="http://en.wikipedia.org/wiki/Massive_open_online_course">MOOCs</a>, or the <a href="http://www.bloomberg.com/news/2011-06-14/pandora-media-raises-234-9-million-in-ipo-after-pricing-stock-above-range.html">financial</a> <a href="http://online.wsj.com/article/SB10001424052748704816604576333132239509622.html">smack</a> in the <a href="http://stream.wsj.com/story/facebook-ipo/SS-2-9640/">face</a> is any indication, software engineering as a career is having its 15 minutes of fame. Looking at the pace of its overall growth in jobs and earnings, it’s not too far off from <a href="http://en.wikipedia.org/wiki/Moore's_law" title="Moore's law">Moore’s Law</a>.</p>
<p>But in the context of history, software engineering is merely going through its industrial phase compared to other engineering fields. There’s no revolutionary way to build a bridge or a highway; we’ve perfected the assembly line; making a guitar or violin is now more of an art than a science.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/new-coder/SoS-writing-classes.png" width="370" height="300"/></p>
<p>Yet we’re still <a href="http://www.youtube.com/watch?v=o9pEzgHorH0" title="Write fewer classes">arguing</a> how best to <a href="http://lucumr.pocoo.org/2013/2/13/moar-classes/" title="Write more classes">approach</a> software engineering problems and rapidly <a href="http://docs.topazruby.com/en/latest/" title="Topaz Ruby">developing</a> new <a href="http://www.rust-lang.org/" title="Rust Programming Language">languages</a>.</p>
<h2 id="learning-how-to-code-in-our-industrial-revolution">Learning how to code in our industrial revolution</h2>
<p>A textbook and a classroom are fine for learning more established engineering fields like architecture or manufacturing, but with the revolutionary pace of software development, how best should we teach its principles? A four-year degree doesn’t merely grant you “l33t” status, certainly not by <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.9130" title="Learning and teaching programming: A review and discussion">researchers’ standards</a>:</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/new-coder/SoS-10-years.png" width="370" height="300"/></p>
<p>One predominant theme I keep seeing is the gap between learning syntax and becoming a passable junior developer. You went through <a href="http://learnpythonthehardway.com" title="LPTHW">Learn Python the Hard Way</a>; uh, now what? Sure, a for-loop is easy to identify, and easy to code when told exactly what to do, but that’s not real life.</p>
<p>There’s <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.924" title="Constructivism in Computer Science">plenty</a> of <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.92.4923" title="Learning Programming by Solving Problems">debate</a> on <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.96.6748" title="Students learn cs in different ways. insights from an empirical study">how</a> to go about <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.60.5278" title="Why Complicate Things? Introducing Programming in High School Using Python">learning</a> how to code. One major underlying theme among what I’ve both read and experienced is the notion of <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.96.6748" title="Students learn cs in different ways. insights from an empirical study"><strong>concepts for granted</strong></a> versus <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.96.6748" title="Students learn cs in different ways. insights from an empirical study"><strong>concepts for context</strong></a>:</p>
<blockquote>
<p>We’re less likely to develop an advanced understanding of what we’re trying to master, than if we’re searching for an underlying meaning, trying to integrate the newly-learned concepts into what we already know.</p>
</blockquote>
<p>Learning from lectures and textbooks only goes so far. With typical formal education, one learns that certain computer science concepts exist: how to compile and install an OS, how compiling or installing works under the hood, or why a particular code base does not work. But it does not necessarily bridge outside of those concepts to an understanding of their application, or of how the whole works together with its parts.</p>
<p>We need something to create those connections outside of those isolated concepts; to build ourselves context while learning; give personal meaning while striving for understanding as a whole. We <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.924" title="Constructivism in Computer Science">construct</a> our learning rather than simply listen and regurgitate what is taught. So how can we apply that to learning how to code? </p>
<h2 id="project-based-learning">Project-based learning</h2>
<p>I started off blowing bubbles - I did take one course in computer science (after I graduated); I would have failed if it weren’t for my crappy little final <a href="22">project</a> masking the fact that I couldn’t code.</p>
<p>But I love it – the cold, chilly, 3am night never looked better to me. But who wants to pay $2000 for a single course whose credit won’t necessarily apply towards anything? I was drowning from being told “this is how we chlorinate the water” straight to “here are the hydrodynamic equations that your body must follow in order to swim fast”.</p>
<p>So I ditched traditional academia and have been learning to code through completing projects. It’s been a self-directed study, motivated by wanting to actually do this for a living. It’s paid off: within a year, I am now a software engineer at Red Hat. Granted, I still choke on water every once in a while, but I took to the water myself after studying what others do and how they code, and by teaching others.</p>
<h2 id="learning-through-frustration">Learning through frustration</h2>
<p>I’ve certainly belly flopped a few times. I tend to find myself learning more through frustration than through simply being told what to do or how to do it. It usually ends in success, but not without tears and wasted time. It may not be the best way to go about learning, but it certainly challenges me to push through the hard times; it builds up my endurance for longer races.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/new-coder/SoS-frustration.png" width="370" height="300"/></p>
<p>But I see myself in a good position to teach what I’ve learned. I’m no Olympic swimmer; I still have those fresh, n00b eyes when trying to understand a code base. Yet when I try to explain a concept to someone else, it solidifies – teaching others gives that personal meaning and understanding, since I want to create the context that builds cumulative learning.</p>
<h2 id="passing-on-the-frustration">Passing on the frustration</h2>
<p>In a selfish effort to further my understanding, I am here to coach you with 5 swim lessons: I’ve written five tutorials – well, three are complete; the other two have been coded out and need the tutorial language written behind them. These are meant to build on each other and thread that “concepts for context” for the new guppy in the water, while staying digestible and not overwhelming.</p>
<p>The tutorials aim to pick up where introductory or outdated tutorials leave off with fun projects. You learn how to tread water by playing water polo, not by someone dictating to move your arms and legs back and forth. </p>
<p>So each has a purpose and a set of goals, ends on how these projects are used in real life, and where to explore afterwards in case the new coder wants to learn more in depth.</p>
<p>There is some subtlety baked into these projects though. In particular, the language used and how topics are presented. I’ve written each tutorial with a set of side effects in mind – indirect learning – that eventually become the goals of future tutorials. So we play water polo for treading water, but it also builds endurance and strength indirectly.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/new-coder/SoS-knowledge.png" width="370" height="300"/></p>
<p>I want guppies to learn by doing. We can’t read how to swim. You have to feel the water against your hands to learn how to paddle through; you must accidentally snort up water to learn how to breathe.</p>
<h3 id="defined-goals">defined goals</h3>
<p>As the new coder dives into the tutorials, s/he will be exposed to what Pythonic means through learning how to construct import statements in the proper order, how to write legible docstrings and comments, and the language’s keywords while exploring file I/O.</p>
<p>S/he will work with third-party packages as a soft intro to what it means to interact with a RESTful API, and to parsing the data returned from that API into different data structures. The terminology of object-oriented programming is introduced when instantiating classes and calling methods versus calling a function.</p>
<p>Each tutorial has the same goal of developing one’s logic in approaching a problem through organization, reading others’ code, debugging, testing, and logging.</p>
<p>I’ve tried to write these in a way that addresses any “stupid” or “naive” questions by using sidebars or gently introducing new terminology. The difference between docstrings and comments may be obvious to some people, but not all – especially when just starting out.</p>
<h3 id="if-youre-curious">If you’re curious</h3>
<p>The constructive learning is exercised by using “if you’re curious” side-bars. </p>
<p>Of course you’re curious as a new coder! Just a bit of positive language to entice folks to read slightly more advanced topics; to push their endurance.</p>
<p>As more advanced topics are introduced through these side-bars, they are not meant to pressure anyone reading through them - you can take a break to catch your breath - but they guide the new coder toward more information if s/he wants to dive deeper. If s/he skips one or doesn’t fully understand it, that’s fine – the tutorial can still be completed with full understanding of the goals, and the “curious” concepts will be explored again in future tutorials.</p>
<h3 id="in-action">In Action</h3>
<p>Many times, we’re given a project in school, and we wonder, “What good is this? Is this even used in real life? Am I ever going to compete in the state championships?” The application of a project is the pitfall for a lot of folks. It can be discouraging to learn how to swim without a realm to show off one’s skills.
I include an “in action” conclusion with how each project is used in the industry now. These are not meaningless exercises to learn data structures or how to make a graph with matplotlib; these tutorials use the learning of data structures <em>to</em> build tools that are being used in real life.</p>
<h3 id="where-to-go-from-here">Where to go from here</h3>
<p>Lastly, one main critique from new coders after completing how-tos and guides is “where do I go from here?” I made it to state champs; is that all?
It’s extremely difficult to learn what you need when you don’t yet know what <em>should</em> be learned. To not burn out our new flying fish, each tutorial ends with guidance on where to go next, including how to build upon what was just coded out, and resources on the topics covered.</p>
<h2 id="the-tutorials">The tutorials</h2>
<p>So what exactly are the tutorials? What are my lesson plans?</p>
<h3 id="dataviz">DataViz</h3>
<p>The first tutorial is data visualization. While creating some graphs and plotting on Google maps, the purpose is for the new coder to understand how to:</p>
<ul>
<li>run a Python file from the command line</li>
<li>import a Python file</li>
<li>take a raw file and parse its data with Python’s data structures</li>
</ul>
<p>The side effects of working through the dataviz tutorial are:</p>
<ul>
<li>Importing from Python’s standard library as well as from a self-written module</li>
<li>Installing and importing third party packages</li>
<li>Licensing & copyrights when using third-party packages</li>
<li>File Input/Output, Iterators, and Generators with using Python’s keywords and built-in functions</li>
<li>Global variables, docstrings, list comprehensions</li>
</ul>
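<p>The heart of that parsing step looks roughly like this; a sketch of the shape of the code, not the tutorial’s literal implementation (which lives at newcoder.io):</p>

```python
import csv

def parse(raw_file_path, delimiter=","):
    """Open a raw delimited file and parse it into a list of dicts,
    one dict per row, keyed by the header row."""
    with open(raw_file_path, newline="") as raw_file:
        return list(csv.DictReader(raw_file, delimiter=delimiter))
```

<p>From there, each row is a plain dictionary the new coder can poke at: <code>parsed[0]["Category"]</code>, and so on.</p>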
<h3 id="api">API</h3>
<p>The second one uses the techniques from the dataviz tutorial to graph data grabbed from a public API. The project is to fetch video game platform information from <a href="http://www.giantbomb.com/api/">Giantbomb.com</a>, combine that with <a href="http://research.stlouisfed.org/fred2/data/CPIAUCSL.txt">CPI</a> data to adjust the value of the US dollar over time and generate a bar chart to show the price development.</p>
<p>The goals :</p>
<ul>
<li>Solidify how to build a simple graph in matplotlib and file I/O</li>
<li>Interact with a public API</li>
<li>Intro to REST</li>
<li>Parsing command line arguments</li>
</ul>
<p>What else folks will be exposed to:</p>
<ul>
<li>Python 2 versus Python 3’s print keyword/function</li>
<li>Logging</li>
<li>Validating data</li>
</ul>
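<p>The CPI adjustment behind that bar chart boils down to a single ratio (shown here with made-up numbers; the tutorial fetches real CPI data):</p>

```python
def adjust_for_inflation(price, year_cpi, base_cpi):
    """Express a historical price in base-year dollars by scaling it by
    the ratio of the base year's CPI to the original year's CPI."""
    return price * (base_cpi / year_cpi)

# A $199 console released when the CPI was 100, expressed in dollars
# of a year when the CPI is 230:
adjusted = adjust_for_inflation(199, 100.0, 230.0)  # roughly $457.70
```

<p>Plotting that adjusted price per platform and year is exactly what the matplotlib bar chart in the tutorial does.</p>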
<h3 id="web-scraping">Web scraping</h3>
<p>The web scraping tutorial is meant to show folks how to grab data without the use of an API. The project builds a web scraper using <a href="http://scrapy.org/">Scrapy</a> to crawl through <a href="http://www.livingsocial.com">LivingSocial</a> and save local deals to Postgres. It also includes a quick how-to on cron jobs, so folks can run the script daily. Rather than getting those annoying emails, folks can query a database when they want a deal on sky diving or hot yoga.</p>
<p>The goals of this particular tutorial are to:</p>
<ul>
<li>Develop a more solid understanding of Python classes and inheritance</li>
<li>Python’s generators & iterators</li>
<li>Reading and writing to a database</li>
</ul>
<p>The subtle concepts folks will be exposed to are:</p>
<ul>
<li>Using an ORM</li>
<li>Import *</li>
<li>Making a portable application</li>
</ul>
<p>What’s great about this tutorial is that if folks have already gone through the Django tutorial, the models should be familiar, or would be a good primer for moving onto Django.</p>
<p>As a means of inspiration and to build a bit of personal context, the project finishes with a story of how one gal was able to continually scrape the London Olympics’ website in order to grab a ticket to the gymnastics final. </p>
<p>Since <a href="http://www.scrapy.org">Scrapy</a> makes use of Twisted, it creates that familiar ground when moving onto the next tutorial.</p>
<h3 id="irc-bot">IRC bot</h3>
<p>I just had to make a tutorial based on <a href="http://twistedmatrix.com/trac/">Twisted</a>. The project is based on Jessamyn Smith’s IRC bot – the <a href="https://github.com/jessamynsmith/talkbackbot">talkbackbot</a> – where if anyone says “That’s what she said”, the bot replies with a notable quote from a woman (that’s what she <em>really</em> said!).</p>
<p>The purpose for the new coders is to:</p>
<ul>
<li>Get an intro to “how the internet works”</li>
<li>Making a portable application</li>
<li>Logging</li>
</ul>
<p>while indirectly being exposed to:</p>
<ul>
<li>Event-driven programming</li>
<li>IRC protocol</li>
<li>Testing</li>
<li>The antiquated means of communication that a lot of engineers use</li>
</ul>
<p>Of course, software engineering is more than the internet. </p>
<h3 id="sudoku-game">Sudoku Game</h3>
<p>The final tutorial walks new coders through building a GUI for a Sudoku game. With the least amount of hand-holding, it walks through how to build a game board as well as how to approach programming the logic behind Sudoku.</p>
<p>The goals for new coders to take away are:</p>
<ul>
<li>Understanding Python’s vast standard library</li>
<li>Drawing a custom GUI</li>
<li>Approaching logic challenges</li>
<li>Testing</li>
</ul>
<p>while indirectly being exposed to:</p>
<ul>
<li>try & excepts</li>
<li>User-driven programming</li>
<li>“private” methods</li>
</ul>
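<p>As a taste of the logic challenges that tutorial builds toward, here is a fragment of my own (not the tutorial’s code): a Sudoku row, column, or 3x3 box is valid when its filled cells are unique:</p>

```python
def unit_is_valid(unit):
    """Check one Sudoku unit (a row, column, or 3x3 box): the filled
    cells must contain no duplicates. 0 marks an empty cell."""
    filled = [n for n in unit if n != 0]
    return len(filled) == len(set(filled))

# A row with empty cells is fine; a repeated 5 is not:
unit_is_valid([5, 3, 0, 0, 7, 0, 0, 0, 0])  # True
unit_is_valid([5, 3, 0, 0, 7, 0, 5, 0, 0])  # False
```

<p>Checking a whole board is then just running this over all 27 units, which is the kind of decomposition the tutorial nudges new coders toward.</p>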
<h2 id="newcoder.io">newcoder.io</h2>
<p>So because we’re in software engineering’s industrial revolution, we need a revolutionary way of learning how to code.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/new-coder/SoS-lightbulb.png" width="370" height="300"/></p>
<p>As the lifeguard and coach, I advise our Python guppies to not get discouraged. The way we are learning to code leaves us on the edge of the pool expecting us to compete after reading a chapter on human hydrodynamics. </p>
<p>Try out my swim lessons at <strong><a href="http://newcoder.io">newcoder.io</a></strong> - give feedback, contribute, spread it around, and use it to teach others.</p>
<p>So please - don’t drown. Swim safe!</p>
Soft release of my New Year's project: Salarlyhttp://www.roguelynn.com/words/soft-release-of-my-new-years-project/2013-01-26T08:41:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>Finally, I can actually do some show & tell!</p>
<p>The US is trying to recover from the economic crises of the last decade. We’re still a bit high on unemployment at <a href="https://www.google.com/publicdata/explore?ds=z1ebjpgk2654c1_&met_y=unemployment_rate&idim=country:US&fdim_y=seasonality:S&dl=en&hl=en&q=us%20unemployment%20rate" title="Unemployment Rate via Google">7.8%</a>, and a bit too low with inflation below <a href="http://www.usinflationcalculator.com/inflation/current-inflation-rates/" title="Inflation Rate">2%</a>. People are <em>depressed</em> - job hunting, competing with hundreds of other candidates for <em>one</em> job opening, only to be ousted by the second cousin of the manager, right?</p>
<p>Thought you couldn’t be any more depressed? </p>
<p>I present to you <strong><a href="http://www.salar.ly" title="Salar.ly">Salarly</a></strong> - a way for you to browse all the internationally employed workers in the United States from 2011 - 2012.</p>
<p><a href="http://www.salar.ly/salaries?title=Financial+Engineer&company=&location=" alt="Financial Engineer H-1B salaries"><img class="displayed" src="http://www.roguelynn.com/assets/images/salarly/salarly.png" height="318px" width="552px" title="Financial Engineer H-1B salaries" alt="Financial Engineer H-1B salaries" /></a></p>
<p>Thought your salary was good? Take a look at what a <a href="http://www.salar.ly/salaries/?title=Financial+Analyst&company=&location=" title="Salarly: Financial Analyst">Financial Analyst</a> can make on an H-1B visa. Or maybe you’re not convinced of the <a href="http://www.salar.ly/salaries/?title=&company=&location=Mountain+View%2C+CA" title="Salarly: Mountain View">wealth</a> that makes Silicon Valley so popular. Thinking of moving somewhere? How about to where all the <a href="http://www.salar.ly/heatmaps/?title=engineer" title="Salarly: Heatmap of Engineers">engineers</a> are?</p>
<p>Don’t take my word for it, check it out <a href="http://www.salar.ly" title="Salar.ly">yourself</a>. This is all public data from the <a href="http://www.foreignlaborcert.doleta.gov/quarterlydata.cfm" title="US DoL">US Department of Labor</a>, lightly cleaned up for awful administrative typos, rendered using <a href="http://d3js.org/" title="d3.js">d3.js</a> and <a href="http://tenxer.github.com/xcharts/" title="xCharts">xCharts</a>, some async calls with jQuery, with Django under the hood, hosted on Heroku. Mind you, we haven’t figured out appropriate memory management - I’m certain it will crash/timeout after 5 people make some queries. We’re also not designers - don’t judge.</p>
Avoid Drowning: Swim your way through a new code basehttp://www.roguelynn.com/words/avoid-drowning/2013-01-10T08:43:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>A month or two ago, I signed up for a lightning talk for <a href="http://www.meetup.com/Women-Who-Code-SF/events/93965402/" title="WWC Lightning Talk Event">today’s</a> Women Who Code event. I didn’t submit a title - I was more, “sure, I could talk!”</p>
<p>Only on Tuesday did I find out I was actually on the bill to speak! Welp, here’s a just-in-time presentation. My slides are <a href="http://www.slideshare.net/roguelynn/avoid-drowning">here</a> - but basically this post flows through what’s presented on.</p>
<h2 id="my-experience">My experience</h2>
<hr>
<p>Coming back from Brno, and having just started a new engineering position, I feel like I know this topic well: how to swim your way through a new code base. There may be other or better advice out there, but this is my takeaway from my first three months at Red Hat.</p>
<h2 id="n-diving">N<img src="http://www.roguelynn.com/assets/images/avoid-drowning/no_diving_small.png" alt="no_diving"> Diving!</h2>
<hr>
<p>You might feel inclined to jump into the code base. FFS you’re a coder! Why wouldn’t you want to jump in?</p>
<p>As most lifeguards might say of pools with shallow depths: you’ll break your neck and drown.</p>
<p>Let’s take a more holistic approach - I’ll walk you through how I am fumbling through getting to know two (!) code bases with questions I asked myself. Mind you, not all these questions are relevant to your situation; it all depends on your code base, naturally.</p>
<h2 id="t-p-down-approach">T<img src="http://www.roguelynn.com/assets/images/avoid-drowning/droplets_small.png" alt="droplet"> p-down approach</h2>
<hr>
<h3 id="overview">Overview</h3>
<ul>
<li>What is the purpose of this project or product?</li>
<li>What problems does it solve for its target audience?</li>
<li>What are its overall strengths and weaknesses; its selling points?</li>
<li>What competitive projects or products are out there?</li>
</ul>
<h3 id="architecture">Architecture</h3>
<ul>
<li>What major components make up the code base?</li>
<li>How do they all fit together?</li>
<li>How do they communicate with each other?</li>
</ul>
<h3 id="break-it-apart">Break it apart</h3>
<ul>
<li>What’s this component or chunk’s purpose?</li>
<li>What problem does it solve for this project?</li>
<li>Strength & weaknesses of this chunk or component?</li>
<li>How does it fit with another chunk(s)?</li>
</ul>
<h2 id="lather-rinse-repet.">Lather, rinse, repe<img src="http://www.roguelynn.com/assets/images/avoid-drowning/repeat_small.png" alt="repeat">t.</h2>
<hr>
<p>Continue until you drill down to the lowest component, the last level before you hit the code, answering your questions as you go.</p>
<p>You should be reviewing documentation, architecture diagrams, and walk-throughs, and talking to the architect of the product/project as well as your teammates.</p>
<h2 id="go-with-the-flow-">Go with the <img src="http://www.roguelynn.com/assets/images/avoid-drowning/fish_small.png" alt="flow"> flow <img src="http://www.roguelynn.com/assets/images/avoid-drowning/fish_small.png" alt="flow"></h2>
<hr>
<p>Now it’s time to figure out the flow of the project or product.</p>
<h3 id="user">User</h3>
<ul>
<li>Who is the target audience? Other developers, IT professionals, non-tech folks?</li>
<li>How is this project used?</li>
<li>What is the learning curve for the user?</li>
</ul>
<h3 id="sys-admin">Sys Admin</h3>
<ul>
<li>How are users set up?</li>
<li>How is it maintained, updated, upgraded, supported, etc?</li>
<li>How does it work with the existing systems?</li>
</ul>
<h3 id="your-manager">Your Manager</h3>
<ul>
<li>What are the goals of the project? Future feature implementations? Direction that this product is going?</li>
<li>What are the release cycles or pressure dates?</li>
<li>What other teams do you need to work with? QE? Support? Other complementary/necessary projects/products?</li>
</ul>
<h3 id="developer-you">Developer (YOU)</h3>
<ul>
<li>How do you submit and fix a bug?</li>
<li>How is the code tested?</li>
<li>Who are the go-to people for certain aspects of the project?</li>
</ul>
<h2 id="ease-in-the-water39s-a-bit-cold">Ease in <img src="http://www.roguelynn.com/assets/images/avoid-drowning/ease_in_small.png" alt="ease"> the water’s a bit cold</h2>
<hr>
<p>Alright - git pull that code.</p>
<h3 id="file-hierarchy">File Hierarchy</h3>
<ul>
<li>Top-down approach again: how is this project organized? What files/modules depend on what?</li>
<li>How does the file structure match up with the architecture you saw earlier?</li>
<li>Where is the documentation (both for developer and user/admin)? The source code/moving parts? The test suite? (Who tests the tests!?)</li>
<li>What modules are used? Look what’s being used from the language’s standard library, the modules defined in the package itself, and third-party packages.</li>
</ul>
<h3 id="dependencies">Dependencies</h3>
<ul>
<li>What are the operating system requirements? Hardware reqs?</li>
<li>Software requirements: is there an assumption that users have the default database needed? Or other libraries already on your machine?</li>
<li>What are the build requirements for the project?</li>
</ul>
<h3 id="swim-floaties-remember-those-things-around-your-arms">Swim floaties (remember those things around your arms?!)</h3>
<ul>
<li>‘<code>$ git log</code>’ for commit logs (watch out - it could go back years). Check out git’s pretty print documentation for more readable output.</li>
<li>‘<code>$ git blame</code>’ is, of course, a great tool. Include the <code>-L</code> flag to limit the annotation to a range of line numbers, and the <code>-e</code> flag to see the emails of the contributors.</li>
<li><a href="http://dev.hubspot.com/blog/bid/57694/Git-by-a-Bus" title="Git By a bus">git-by-a-bus</a> is a great tool that goes through the logs of the code and gives you HTML output so you can visually see who the biggest contributors are to which parts of the library. Highly recommend.</li>
</ul>
<h3 id="challenge-yourself">Challenge yourself</h3>
<ul>
<li>Write up documentation that’s missing.</li>
<li>Fix a bug & submit a patch. This really forces you to understand the code, how it was written, conventions of the project, and how small pieces fit together.</li>
<li>Write more tests to increase test coverage.</li>
</ul>
<h2 id="lifesvers">Lifes<img src="http://www.roguelynn.com/assets/images/avoid-drowning/life_preserver_small.png" alt="life">vers</h2>
<hr>
<h3 id="mentors">Mentors</h3>
<ul>
<li>Find an internal mentor to bug about package/product questions. It may be your lead, or someone who’s been there for a while.</li>
<li>An external mentor is great to have too - perhaps you have a language question and you don’t want to look stupid in front of your coworkers. It’s also great to get a different point of view for how the development process works elsewhere.</li>
<li>Your manager - yep, s/he is by default a mentor, whether good or bad, or absent. Learn how to manage up if you’re not getting what you need at first. Avoid detailed questions and ask bigger questions like “how is my approach in this?”, or “am I learning at an acceptable speed?”</li>
</ul>
<h3 id="research">Research</h3>
<ul>
<li>I highly recommend keeping a personal/private wiki as you learn new terms, processes, etc.</li>
<li>Also - a bookmarking service that helps you organize your research for quick retrieval (I recommend <a href="http://pinboard.in" title="Pinboard">Pinboard</a>).</li>
<li>Old fashioned post-its, whiteboarding, paper+pen, anything. Physically writing down a piece of information (typically architecture diagrams) helped me solidify concepts better.</li>
</ul>
<h3 id="suggested-reading">Suggested Reading</h3>
<ul>
<li><a href="http://pragprog.com/book/tpp/the-pragmatic-programmer" title="The Pragmatic Programmer">The Pragmatic Programmer</a></li>
<li>A language-specific cookbook (keep near your work computer)</li>
<li><a href="http://pragprog.com/book/jcdeg/new-programmer-s-survival-manual" title="New Programmer's Survival Manual">The New Programmer’s Survival Manual</a></li>
<li><a href="http://pragprog.com/book/kcdc/the-developer-s-code" title="Developer's Code">The Developer’s Code: What Real Programmers Do</a></li>
</ul>
<h2 id="swim-safely-">Swim safely! <img src="http://www.roguelynn.com/assets/images/avoid-drowning/scuba_small.png" alt="scuba"></h2>
<p><br /></p>
My New Years Themehttp://www.roguelynn.com/words/my-new-years-theme/2012-12-31T12:30:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>I don’t do New Year’s resolutions. But I like to take the time to reflect on the past year and declare a theme for the next one. </p>
<h3 id="s-theme-be-awesome.">2012’s theme: Be awesome.</h3>
<p>I think I accomplished that!</p>
<ol>
<li><em>January - March</em>: led a Python study group for <a href="http://meetup.com/women-who-code-sf">Women Who Code</a>.</li>
<li><em>March</em>: Attended PyCon in Santa Clara</li>
<li><em>March</em>: Immediately submitted a talk to OSCON after being inspired at PyCon. As well as DjangoCon EU and EuroPython.</li>
<li><em>April</em>: founded the SF chapter of <a href="http://meetup.com/pyladiessf">PyLadies</a>.</li>
<li><em>May</em>: made the decision to pursue programming as a career and leave finance behind.</li>
<li><em>June</em>: attended my second conference and gave my first <a href="http://klewel.com/conferences/djangocon-2012/index.php?talkID=43">talk</a> at DjangoCon EU.</li>
<li><em>July</em>: <a href="http://www.youtube.com/watch?v=l2PnVKQJg0I">keynoted</a> my third conference! EuroPython!</li>
<li><em>July</em>: attended OSCON and gave my third talk. Also brought PyLadies to the expo hall of the conference.</li>
<li><em>September</em>: <a href="http://www.roguelynn.com/words/from-n00b-to-engineer-in-one-year">Accepted an offer</a> from Red Hat as a Software Engineer!</li>
<li><em>October</em>: Started my first engineering job!</li>
<li><em>October</em>: My second keynote! PyCarolinas!</li>
<li><em>November</em>: Gave a talk at RuPy - my first not-just-Python conference. Also gave a free Django/Python workshop for women.</li>
<li><em>October - December</em>: <a href="http://www.roguelynn.com/words/reflection-of-my-time-in-brno">Mini-bootcamp</a> at Red Hat Czech in Brno.</li>
<li><em>December</em>: <a href="http://www.roguelynn.com/words/crap-im-speaking">Made it to #4</a> of Hacker News with my new static blog.</li>
<li><em>December</em>: Went to Zagreb, Croatia to <a href="https://www.facebook.com/photo.php?fbid=450870981637921&set=t.14900117&type=3&theater">host</a> a Django + Python workshop!</li>
</ol>
<p><br />
The events I held and/or led for PyLadies were a lot of fun:</p>
<ul>
<li>Kickoff hackathon</li>
<li>git workshop x 2</li>
<li>Django workshop</li>
<li>Udacity & Coursera study groups</li>
<li>Mini-DjangoCon</li>
<li>Mini-PyCon</li>
<li>Moar hackathons!</li>
<li>Contributing to OSS</li>
<li>SF Django + PyLadies sprint on Django</li>
<li>Writing PyCon proposals</li>
</ul>
<p>I also organized the DjangoCon US scholarships for PyLadies and for the DSF, and helped spread PyLadies chapters in Atlanta, NYC, Seattle, PDX, and elsewhere. A new PyLadies website that I started should be up soon, too; it will include a PyLady Events Code of Conduct. </p>
<h3 id="what-is-the-theme-for-2013">What is the theme for 2013?</h3>
<p>I had an epic year: met so many fantastic people in the Python world and in the San Francisco community including Python + Django core devs and community leaders; got the guts to give talks (and actually enjoy giving them!); got a job that I freaking love. </p>
<p>So, what will the theme be for 2013? </p>
<p>Be <strong>awesome-<em>r</em>.</strong></p>
<p>Already lined up is a bunch of talks & conferences: PyCon, DjangoCon EU, EuroPython, and PyCon Ireland.</p>
<p>I have this awesome task of integrating FreeIPA -> OpenShift, at which I <em>will</em> freaking succeed.</p>
<p>Something that I’d like to do more of is commit code to OSS projects. Currently that is the premise of my job, but I’d like to contribute to projects outside of those that pay me.</p>
<p>2012 paved a great road for me to continue down. How could I not continue to be awesome?!</p>
Reflection of my time in Brnohttp://www.roguelynn.com/words/reflection-of-my-time-in-brno/2012-12-21T09:48:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>I’m just returning to San Francisco from spending two months at <a href="https://maps.google.com/maps?q=red+hat+czech&ll=49.226623,16.581266&spn=0.007875,0.017424&hq=red+hat+czech&t=m&z=16&iwloc=A" title="Red Hat Czech">Red Hat Czech</a> in Brno, Czech Republic. It had its ups and downs, but most of all, I found it important and pivotal to my career as I develop into an engineer.</p>
<p>But I need to be honest: throughout 7 of the 8 weeks I spent in Brno, I felt completely and utterly lost. I struggled with the lack of guidance given, especially being brand new to engineering as a profession. </p>
<p>My job for the next year is to integrate freeIPA into <a href="http://openshift.rhc.com" title="OpenShift">OpenShift</a>, followed by a few more prominent OSS projects. During my one-on-one phone calls with my manager, he says, literally, “You need to develop a deep understanding of the freeIPA project,” and “You need to figure out what you don’t know, and learn it,” with no further guidance beyond what I need to accomplish by the end of February: a proof of concept of IPA -> OpenShift integration.</p>
<p>In other words: </p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/reflection-brno/no_idea_dog.png" title="I have no idea what I'm doing dog meme" alt="I have no idea what I'm doing dog meme"/></p>
<h3 id="from-email-setup-to-patch-submission-in-8-days-then-what">From email setup to patch submission in 8 days, then what?</h3>
<p>I previously <a href="http://www.roguelynn.com/words/from-email-setup-to-patch-submission-in-8-days" title="From email setup to patch submission in 8 days">wrote</a> about the progress I was making. TL;DR - I submitted a patch in the first two weeks of working (mind you, it was immediately NACKed and a lesson was learned in testing). But not much followed other than a handful of other patches. </p>
<p>Sure, I learned bits of the IPA package in detail with regard to code and moving parts, and on the grand scale of how LDAP, DNS, AD trusts, Kerberos, Daemon, NTP, Certificate Authority, etc, all talk to each other and <em>why</em> it makes freeIPA appealing to sys admins, developers, etc. But it was (and still is) excruciatingly difficult to develop an understanding of what parts need to be integrated, and when I need to take a step back and learn the deep, dark corners of DNS, BIND, and Apache. </p>
<p>Thank god for mentors outside of my team as well; stupid questions about the Python language, like “Isn’t it bad to have import *? Why?” were met with impatience at work. That’s not to say my team mates weren’t helpful; I had many ELI5 moments with them.</p>
<h3 id="my-expectations">My Expectations</h3>
<p>I had this idea in my head that I would write more than ~15-20 lines of code for the IPA project. Correction: I assumed I’d write more <em>Python</em> code during this time here in Brno. I wanted to learn how to write good code, the design pattern of the project, maybe write a test or two.</p>
<h3 id="what-i-learned">What I learned</h3>
<p>What I actually ended up learning was the overall Linux file system (naturally), Vim, git, the process of submitting patches for review, public versus private errors, virtual machines (and the many ways to break them), and git blame, <a href="http://andyjeffries.co.uk/articles/25-tips-for-intermediate-git-users" title="Tips for intermediate git users">git reset HEAD@{1 hour ago}</a>, and <a href="https://github.com/tomheon/git_by_a_bus" title="Git by a Bus">git-by-a-bus</a>. </p>
<p>Other things I’m sure some take for granted like iptables, key-bindings & keyboard shortcuts (using those make me feel like a baller), vimrc, simple bash commands & scripting, and the like.</p>
<p>And yes, I also learned a bit of Ruby. To be honest, I’ve written more Ruby code for OpenShift than Python code for freeIPA. I’ve started to make a list of grievances of the Ruby language when coming from Python. #snark</p>
<p>I also learned the structure of OpenShift - the two Apache servers it uses, the broken installation scripts, and the larger picture of which parts of freeIPA need to be integrated where.</p>
<h3 id="what-i-need-to-learn">What I need to learn</h3>
<p>I still have a ways to go in learning to program in Ruby, but it’s understandably easier when I have an okay grasp on programming in Python.</p>
<p>Apache is still a black hole to me, but I’ve begun to understand what pieces need to be added to OpenShift.</p>
<p>I also have no freaking clue how <em>exactly</em> to integrate a project that’s written in Python into a Ruby-based project.</p>
<p>I also need to develop a better understanding of GSSAPI (supposedly an implementation of an API, written in Python, Ruby, whatever, that appropriately talks to Kerberos, written in C). This understanding will be needed for when the DNS dynamically updates hosts (through nsupdate) and there needs to be a Kerberos-based authentication handshake.</p>
<h3 id="what-troubles-me">What troubles me</h3>
<p>This task I am meant to do is complex; it’s not an application where I can go bug my dev friends for the best library or approach. These are two different projects where the integration happens at a level that folks have experience in either but not both. <em>I</em> am the integration point; <em>I</em> am supposed to have an understanding of these two systems, how they differ and compare in setup. This requires a deep understanding of how the internet works, of Linux systems, and of security implementations, among other things that I can’t even conceive of right now.</p>
<p>What scares the sh!t out of me is that there is no one that knows how to do this better than me. #wat </p>
<p>The fear of failure has never been stronger, but it is a great motivator.</p>
CR@P I'M SPEAKING AT [insert conference here]! How do I prepare?http://www.roguelynn.com/words/crap-im-speaking/2012-12-03T09:48:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>You just got a notice saying that your talk was accepted to [some huge freaking conference]. Awesome!</p>
<p>Oh but wait, now you have to actually <em>talk</em>. That entails preparation, and speaking in front of people!</p>
<h3 id="step-1-let-it-set-in.">Step 1: Let it set in.</h3>
<p>Breathe. Maybe pour a glass of wine.</p>
<h3 id="step-2-find-inspiration.">Step 2: Find inspiration.</h3>
<p>A little research on how to speak well can go a long way.</p>
<h4 id="reading">Reading</h4>
<p>I can suggest <a href="http://www.amazon.com/Confessions-Public-Speaker-English/dp/1449301959" title="Confessions of a Public Speaker">Confessions of a Public Speaker</a>. The book is easily digestible and pretty humorous. The author offers great practical advice, and while it is more of a how-to book, it reads like a novel.</p>
<p>Another book I recommend is <a href="http://www.amazon.com/Presentation-Zen-Simple-Design-Delivery/dp/0321525655" title="Presentation Zen">Presentation Zen</a>. Please, if you only get one thing out of this book, it’s that simplicity matters, but black font on a white background sucks. The main point, though, is that the slides should support you, the speaker, and not detract. It offers great advice on design tips, planning, and delivery.</p>
<h4 id="watching">Watching</h4>
<p>There are some fantastic folks that give talks. One of my absolute favorites is a TED talk from Benjamin Zander on <a href="http://www.ted.com/talks/benjamin_zander_on_music_and_passion.html" title="Benjamin Zander TED Talk">The transformative power of classical music</a>. This dude is so epic, so wrapped up in his own passion that he immediately connects with his audience. He talks about the disconnect between classical music and mainstream society today. But really, he could talk about anything and I’d still watch him.</p>
<p>Another one of my favorite TED talks is by Elizabeth Gilbert, <a href="http://www.youtube.com/watch?v=86x-u-tz0MA&feature=youtu.be" title="Elizabeth Gilbert TED Talk">Your elusive creative genius</a>. She talks about everyone having a genius within. But what is remarkable is how much effort she put into practicing this talk.</p>
<p>A technical talk that I thoroughly enjoyed was Jessica McKellar’s <a href="http://klewel.com/conferences/djangocon-2012/index.php?talkID=35" title="Jessica McKellar Keynote at DjangoCon EU 2012">Keynote</a> at DjangoCon EU in 2012. She was critical in a positive way, and very engaging with the audience because of her spot-on observations about Django’s weaknesses.</p>
<h3 id="step-3-write-out-your-talk.">Step 3: Write out your talk.</h3>
<p>Sit down and write out your talk. Perhaps when submitting your proposal, you gave an outline. Flesh that out a bit with solid speaking points. What I typically do is type it out on a text editor, a sentence or two per point.</p>
<p>One solid piece of advice is to follow this flow:</p>
<ol>
<li>Who am I?</li>
<li>Why am I here?</li>
<li>Why do you care?</li>
</ol>
<p>Beginning your talk by stating who you are, why you are speaking, and what the audience’s take-away will be gives the audience a good idea of what to expect.</p>
<h3 id="step-4-practice.">Step 4: Practice.</h3>
<p>Practice. </p>
<p>Yep, you knew that was coming. Craig Kerstiens wrote up a great <a href="http://craigkerstiens.com/2012/06/19/pro-tips-for-conference-talks/" title="Craig Kerstiens' Pro-tips for Conference Talks">post</a> containing tips for preparation.</p>
<p>Essentially, practice at home in front of friends (perhaps entice them with a free meal, first) and/or practice locally at meetups or events like <a href="http://igniteshow.com/" title="Ignite">Ignite</a> or <a href="http://www.pecha-kucha.org/" title="Pecha Kucha">Pecha Kucha</a>.</p>
<p>This is vital so you can get feedback. Perhaps your message wasn’t clear, or the font on your slides is too similar to Comic Sans. Maybe you fumble and don’t notice it, with too many “um”s and “whatnot”s.</p>
<h3 id="step-5-relax.">Step 5: Relax.</h3>
<p>As a former collegiate athlete, it was known that ‘the night before the night before’ was the most important. For instance, if your talk is on Saturday, Thursday night’s sleep will affect your energy level and stamina (and thus your nerves and confidence) more than Friday’s. That’s not to say rest on Thursday and party on Friday. </p>
<p>Another tidbit from my athlete days is to ease up on yourself the day before. Relax, tune out, don’t stress. Assuming you have practiced your talk beforehand and are comfortable, then ease up the day before. No need to stress about getting one more practice run in.</p>
<p>However, I feel like this portion of advice will fall on deaf ears seeing that a lot of folks are procrastinators like myself. </p>
<p>Good luck, and knock ‘em dead!</p>
From email setup to patch submission in 8 days.http://www.roguelynn.com/words/from-email-setup-to-patch-submission-in-8-days/2012-11-06T09:48:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>Hellz yea, I’m a freaking engineer (sounds more hilarious in my head than it reads). Now who the hell wouldn’t want to be an engineer? Fellow PyLady, Julia Grace, <a href="https://twitter.com/jewelia/status/262665853483499520" title="Tweet from Julia Grace">asked</a> about my expectations of being an engineer versus what I actually experienced. </p>
<h4 id="tldr-it39s-awesome.">TL;DR: It’s awesome.</h4>
<p>Here’s the basic run down: </p>
<h4 id="asked-advice-from-systers-amp-devchix-mailing-list">Asked advice from Systers & DevChix Mailing List</h4>
<p>(exact quotes, but leaving anonymous):</p>
<ul>
<li> Curiosity will keep you from becoming one of the ‘not my job’ people.</li>
<li> Remember that a lot of engineers don’t have the best people skills so some ‘rude’ people are that way unintentionally.</li>
<li> If they dont answer questions, it is most likely because they dont know the answers.</li>
<li> Build relationships if possible. Never miss a happy hour type of thing.</li>
<li> “Where did you learn that?” is an interesting question</li>
<li> Figure out the ways in which you like to work- and others like to work- first.</li>
<li> Many guys truly don’t think that women can or should be coding as equals.</li>
<li> At least 89% of the time, by the time I have really defined my question, I have figured out the answer.</li>
</ul>
<h4 id="expectations-and-preformed-thoughts-amp-concerns">Expectations and Preformed Thoughts & Concerns:</h4>
<ul>
<li> I would have to wind down my community engagements to an ‘acceptable’ level, e.g. only x amount of non-Red Hat related conferences.</li>
<li> Thrown into code to fix a bug/patch/whatever</li>
<li> Spend wayyyy too much time thinking over a trivial aspect of said bug/patch/whatever.</li>
<li> Many naive questions from me, met with a poor/mean/unadjusted attitude, or worse, lack of any sort of emotional response (I’d rather -know- I annoy you than wonder if you loathe my questions).</li>
<li> I’d be restricted from doing certain things; “when you’re ready for it” aka “when I’m ready to give this piece of my job up and move on to something better.”</li>
</ul>
<h4 id="i-set-expectations-formyself-too">I set expectations for <em>myself</em> too:</h4>
<ul>
<li> Learn this shit fast.</li>
<li> Be awesome.</li>
<li> Make a ‘mark’ of myself (in terms of fixing bugs, finding errors in code or optimizing, helping users of the project, etc).</li>
<li> Learn faster than what is thrown at me.</li>
<li> Submit a patch by the end of the 3rd week (end of the 1st full week in Brno).</li>
<li> Maintain awesomeness.</li>
</ul>
<h4 id="how-the-first-few-weeks-actually-went">How the first few weeks actually went:</h4>
<ul>
<li> Crash course on LDAP & Kerberos, how to create VMs both on my local machine and on remote servers (I feel dangerous now).</li>
<li><p>My last name <em>is</em> awesome. However, sometimes I misread my terminal prompt:</p>
<p><code>[lroot @ remote-server] $ vim /accessible/by/root.conf
Must be root to setup this server!
/me what? I thought I...oh damn</code></p></li>
<li><p>Learning Linux machines like woah.</p></li>
<li><p>Sometimes just restarting does work.</p></li>
<li><p>Have achieved new levels of git-fu. (git rebase, squash & reflog are a n00b’s best tools)</p></li>
<li><p>Realized Macs are only popular for their GUIs and Aluminum casing (not giving mine up anytime soon, though)</p></li>
<li><p>These folks have a lot of patience for someone just learning like me. Very comforting.</p></li>
<li><p>People don’t need free meals/fish deliveries/spousal salaries upon death to be happy to work somewhere.</p></li>
<li><p>It is SO freaking nice not to worry about being in the office during the same time as your manager (precious ‘face time’); clocking every minute of every hour that has any relation to work.</p></li>
<li><p>Engineers are so relaxed and fun to work with. I feel no pressure, and therefore no nerves when asking questions. Little things like, ‘here are the IRC channels we’re in…’ and ‘Thursday’s the best because it’s breakfast day.’</p></li>
<li><p>My manager flat out said “I won’t read it” when I forwarded him <a href="http://www.roguelynn.com/2012-10-21-community-ftw-kicking-of-the-pycarolinas-community" title="Community FTW">my post</a> about the community talk I gave at PyCarolinas (among other questions & discussion points). Damn that guy is awesome. No nonsense, no bs, no coddling, very helpful, thinks outside the box, and brilliant.</p></li>
<li><p>My manager also flat out said “That’s why we hired you” when I asked him about being invited to speak at conferences. hellz yea.</p></li>
<li><p>I can surprise a few people with the little Czech knowledge I’ve retained. <em>“Jedno velky pivo, prosim.”</em> (“One large beer, please.”)</p></li>
<li><p>I was assigned 5 tickets on day #2 in Brno. This will probably be the only time I will be excited to be assigned tickets. They seem to be the types of tickets that are ‘easy pickens’ but also give the challenge of making you dig through the whole freaking code base only to find the issue comes from an outside package. /phew </p></li>
<li><p><a href="http://www.redhat.com/archives/freeipa-devel/2012-October/msg00556.html">I SUBMITTED MY FIRST PATCH</a> (day #3 in Brno. 8 days of learning terms/code/git/processes, ignoring travel days, orientation, blah). <em>Side note: that patch broke everything. I learned a lesson in tests the following day.</em></p></li>
</ul>
<h4 id="cultural-differences">Cultural Differences:</h4>
<p>I was also <a href="https://twitter.com/juliaelman/status/262666318715707392%20" title="Julia Elman's tweet">asked</a> about what differences I’ve seen in the engineering cultures, and the <a href="https://twitter.com/aesptux/status/262668691731263488" title="Adrian Espinosa's tweet">moving abroad experience</a> in general. Immediately, my memory is refreshed on how hyper-sexualized women’s dress is here. I wear Converse or Pumas every day -> a flag for ‘I’m not from here!’, as many women wear heels (not crazy tall ones, although some do). And hygiene can be an issue for some folks (lulz). Also, while it’s still relatively cheap in the Czech Republic (2006: 22-24 Kc to 1 USD, now: 20 Kc to 1 USD), the cost of living has gone up. Food is more expensive, as are housing and Ikea.</p>
<p>Immediate engineering culture differences are hard to pick up right now, and I hope to be able to have a better PoV after 2 months. Every so often, I get a surprised look when I ask a question, giving me the impression that I’m not as much of a n00b as was thought. I think that more comes from the reputation that preceded me, the discussions that were had before I arrived. I don’t read into this at all. I mean - no one told me to read into the package code before arriving… but I did, and it’s
helped a lot. I haven’t been approached inappropriately, scoffed at, or met with impatience.</p>
<p>All in all: this <em>job</em> lifestyle is utterly fantastic. I got hired because I like to speak AND code AND continually learn. And I have expectations for myself to continue to speak AND code AND learn. Look at that? all aligned! Just one drawback: I dream in test failures now.</p>
NetSec for n00bs, part III: Simple Intro Public Key Cryptohttp://www.roguelynn.com/words/netsec-for-n00bs-part-iii-simple-intro-public-key-crypto/2012-10-15T09:48:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>This is part 3 in a short intro series for netsec. <a href="http://www.roguelynn.com/words/2012-10-01-netsec-for-n00bs-part-i-password-storage" title="Password Storage">Part I</a>, <a href="http://www.roguelynn.com/words/2012-10-03-netsec-for-n00bs-part-ii-ciphers-symmetric" title="Symmetric Ciphers">Part II</a>. <a href="http://www.reddit.com/r/AskReddit/comments/1198on/i_fucking_love_riddles_what_are_your_best_hardest/c6kfncq" title="AskReddit: Riddles">Riddle time</a>:</p>
<blockquote>
<p>Q: You live in a country with a corrupt mail system. They open every package if they can. You put a lock on the box. You can’t mail a key. How does the receiving person open the box?
A: Send the box with the lock to this person. He can’t open it, but he can put another lock on this box. This person sends this box with the 2 locks back to you, you unlock your lock and send it back again. So there is just his lock on the box and he can finally open it…</p>
</blockquote>
<p>This is how public/private key crypto, aka asymmetric encryption, works (also: remember this riddle for technical interviews). Essentially you have two keys. One is private, which, you can imagine, only your computer knows and has access to. The other is public, which you can give out to any sort of service that wants to talk to your computer securely. Let’s use GitHub as an example - you are working on some code, commit locally, then finally make the decision (after you squash all your messy commits) to push to the remote repo on GitHub. In order to do so, you need to give GitHub your public key. How do you get your public key? In your terminal:</p>
<p><code>
$ ssh-keygen -t rsa -C "your_email@youremail.com"
# generates an RSA key pair, saved in ~/.ssh
# you will be asked for a passphrase to protect the private key
$ pbcopy &lt; ~/.ssh/id_rsa.pub
# copies the public key to your clipboard
</code></p>
<p>Then you paste the key into your GitHub account (more solid directions <a href="https://help.github.com/articles/generating-ssh-keys" title="GitHub SSH key gen">here</a>). Now GitHub has your public key, and can communicate with your computer when you want to push code to a GitHub-hosted remote repo. When you push/pull code from the remote repo, GitHub sends a scrambled message that only your private key can decode, and vice versa, your computer sends a message encoded by your private key that can only be decoded by your public key. </p>
<p>If everything checks out, then the ‘transaction’ happens - push/pull code. The algorithm to compute public/private keys is based on prime numbers and the fact that it is difficult to deduce a private key from its associated public key. It’s simple to figure out the product of two prime numbers, <a href="http://www.see.ed.ac.uk/it/online/memos/pkey.html" title="Intro to public key encryption">but it is much more difficult to factor a number</a> (e.g. 101 * 113 = x is far easier to figure out than x * y = 11413). <a href="http://searchsecurity.techtarget.com/definition/PKI" title="What is PKI?">To summarize</a>:</p>
<ul>
<li>To send an encrypted message, use the receiver’s public key</li>
<li>To send an encrypted signature, use the sender’s private key</li>
<li>To decrypt an encrypted message, use the receiver’s private key</li>
<li>To decrypt an encrypted signature (and authenticate the sender), use the sender’s public key</li>
</ul>
NetSec for n00bs, part II: Ciphers (symmetric)http://www.roguelynn.com/words/netsec-for-n00bs-part-ii-ciphers-symmetric/2012-10-03T09:49:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>A continuation of <a href="http://www.roguelynn.com/words/2012-10-01-netsec-for-n00bs-part-i-password-storage" title="Netsec for n00bs Part I">NetSec for n00bs, part I: Password Storage</a>. A cipher is basically an algorithm used to perform encryption and decryption. </p>
<p>Important to note: cipher algorithms, as opposed to hash algorithms, go both ways. If you process plain text into encrypted text using a cipher, you use that same cipher to decrypt the encrypted text back to plain text - aka symmetric key encryption. </p>
<p>A very simple, widely known example (and not at all secure) is called a <a href="http://en.wikipedia.org/wiki/Caesar_cipher" title="Wiki: Caesar Cipher">Caesar Cipher</a>. Let’s say you are in primary school, and you want to write a note to your neighbor saying ‘I think Jon is cute.’ But god forbid that either Jon or your teacher intercepts such a note! Instead you are clever; for each letter in that phrase, you shift it in the alphabet by 3.</p>
<pre>
Plain : ABCDEFGHIJKLMNOPQRSTUVWXYZ
Cipher: DEFGHIJKLMNOPQRSTUVWXYZABC
</pre>
<p>Your message would read as: “l wklqn mrq lv fxwh.” This would be pretty easy to break, no? All you’d have to do is try each of the 25 possible shifts. Or you can look at the ciphered text and see ‘l’ is used three times: once alone at the start of the phrase, once in the middle of the second word, and again in the fourth. </p>
<p>In the English language, A & I are the only letters that can stand by themselves. Seeing as how we intercepted a primary school girl’s note, one can assume that the message would have herself as the subject of the phrase. Assuming l = i, you have figured out the shift of the Caesar cipher, and can easily figure out the rest of the message. </p>
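<p>The shift above is easy to sketch in a few lines of Python (a toy for illustration only; the helper name <code>caesar</code> is mine):</p>

```python
# Minimal Caesar cipher: shift each letter by a fixed amount,
# leaving spaces and punctuation untouched.

def caesar(text, shift):
    result = []
    for ch in text.lower():
        if ch.isalpha():
            # rotate within the 26-letter alphabet
            result.append(chr((ord(ch) - ord('a') + shift) % 26 + ord('a')))
        else:
            result.append(ch)
    return ''.join(result)

print(caesar("I think Jon is cute.", 3))   # l wklqn mrq lv fxwh.
print(caesar("l wklqn mrq lv fxwh.", -3))  # i think jon is cute.
```

Decryption is just the same function with the shift negated, which is exactly why the cipher is symmetric.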
<p>Adding some complexity to the Caesar cipher is the <a href="http://en.wikipedia.org/wiki/Vigen%C3%A8re_cipher" title="wiki Vigenere cipher">Vigenère cipher</a>. This one is very similar, but uses multiple Caesar ciphers based on a keyword. To expound on our example, let’s say our keyword will be ‘playground.’ We’d then use the (not so) <a href="http://en.wikipedia.org/w/index.php?title=File:Vigen%C3%A8re_square_shading.svg&page=1" title="Vigenere square">super secret decoder ring</a> that Vigenere developed to encode with the keyword.</p>
<pre>
PLAINTEXT: ITHINKTHATJONISCUTE
KEY WORD : PLAYGROUNDPLAYGROUN
ENCRYPTED: XEHGTBHBNWYZNGYTINR
</pre>
<p>The letter ‘i’ is paired up with ‘p’, so you’d use column ‘i’ and row ‘p’ in the super secret decoder ring, aka the Vigenère square, to get the ciphered letter. This way, you’d only have to keep the keyword secret. In symmetric key encryption, the drawback is that both parties, the sender & the (intended) receiver of the message, need to know what the key is, as opposed to public-key encryption (which will be part III of this series). </p>
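<p>The square lookup is equivalent to adding the key letter’s alphabet position modulo 26, which makes the cipher easy to sketch in Python (again a toy; the helper name <code>vigenere</code> and its letters-only assumption are mine):</p>

```python
# Vigenère: each plaintext letter is shifted by the corresponding
# key letter (A=0, B=1, ...), cycling through the keyword.
# Assumes the input contains only letters, as in the example above.
from itertools import cycle

def vigenere(text, key, decrypt=False):
    sign = -1 if decrypt else 1
    key_stream = cycle(key.upper())
    out = []
    for ch in text.upper():
        k = ord(next(key_stream)) - ord('A')
        out.append(chr((ord(ch) - ord('A') + sign * k) % 26 + ord('A')))
    return ''.join(out)

print(vigenere("ITHINKTHATJONISCUTE", "PLAYGROUND"))  # XEHGTBHBNWYZNGYTINR
```

Passing <code>decrypt=True</code> subtracts the key shifts instead, recovering the original plaintext.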
<p>Currently, the <a href="http://en.wikipedia.org/wiki/Advanced_Encryption_Standard" title="Wiki: AES">Advanced Encryption Standard</a> (AES) is a well-respected specification for symmetric-key encryption. It became a federal standard in the early 2000s and was deemed suitable for ‘secret’ and ‘top secret’ information (with varying key lengths required depending on the level of confidentiality). While the algorithm itself is difficult to break (an attack on the algorithm itself <a href="http://en.wikipedia.org/wiki/Advanced_Encryption_Standard#Known_attacks" title="Wiki: AES known attacks">happened around 2009</a> against certain versions of AES), side-channel attacks target an implementation of AES and have a longer history of success. </p>
<p>Side-channeling is quite cool and devious, if I may say. Attackers use information that the implemented AES system gives off: timing info, power consumption, etc. In a timing attack, one would ‘listen’ for how long it takes the hardware that holds the algorithm to compute cryptographic operations. Using statistical analysis, one can sometimes figure out the whole key. That completes part II of NetSec for n00bs; part III will continue with asymmetric encryption/public key crypto.</p>
<p><a href="http://www.roguelynn.com/words/2012-10-15-netsec-for-n00bs-part-iii-simple-intro-public-key-crypto" title="Public Key Crypto">Part III - Public Key Crypto</a></p>
NetSec for n00bs, part I: Password Storagehttp://www.roguelynn.com/words/netsec-for-n00bs-part-i-password-storage/2012-10-01T09:49:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>In prepping for my interviews for my <a href="http://www.roguelynn.com/words/2012-09-25-from-n00b-to-engineer-in-one-year" title="N00b to engineer">newly acquired position</a> with Red Hat for the <a href="http://freeipa.org/page/Main_Page" title="FreeIPA">FreeIPA</a> project, I did my own crash course in studying net security/applied cryptography. </p>
<p>It’s a very important subject, and I feel a lot of new developers rely on frameworks or libraries to implement this sort of ‘stuff’ for them without knowing what’s going on. So for those who are oblivious (like I was) to NetSec topics and concerns, here is some low-down.</p>
<p>Back when I was in banking, on my local desktop I had a password-protected Excel spreadsheet of logins & passwords. I thought I was one step above just saving my passwords in a plain-text file. Since then, the bank has implemented a one-password storage system (although it added about 30 seconds for each system we wanted to log into). A password-locked file is quite easy to break if someone wants to.</p>
<p>Currently, it is expected that systems store passwords not in plain text, but as hashes. A <a href="http://en.wikipedia.org/wiki/Hash_function" title="Wiki: Hash Function">hash function</a> is used to produce a hash. This type of function is considered ‘one-way’: the hash algorithm will take a plain text phrase (e.g. a password, or anything like a name, social security number, etc) and put out a hash, but cannot be run in reverse to take the hash and put out the plain text. If you put a hash into a hash function, you would get a new hash. Common types of hash functions: <a href="http://en.wikipedia.org/wiki/MD5" title="wiki: md5">MD5</a>, <a href="http://en.wikipedia.org/wiki/SHA-1" title="Wiki: SHA-1">SHA-1</a>, and <a href="http://en.wikipedia.org/wiki/SHA-2" title="Wiki: SHA-2">SHA-2</a>. </p>
<p>MD5: “Message Digest Algorithm” produces a 128-bit hash. While it is still widely used, flaws have been discovered since 1996, making it quite feasible to break MD5 hashes. SHA-1: “Secure Hash Algorithm” produces a 160-bit hash. Also widely used, but security flaws in the algorithm itself were found around 2005. </p>
<p>SHA-2: consists of 4 hash functions producing 224, 256, 384 or 512 bit hashes. While no attacks have been successful on SHA-2, its algorithm is similar to SHA-1. It’s still considered safe, but time is ticking. </p>
<p>These hash functions are still decent enough for message passing, just not password storage. Another hash algorithm, <a href="http://en.wikipedia.org/wiki/Bcrypt" title="Wiki: Bcrypt">Bcrypt</a>, is thought to be more resistant to brute-force attacks. This is because its cost factor can be raised over time, lengthening the time it takes to create a hash and making it difficult to leverage increases in computing power. </p>
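<p>A quick illustration of the deterministic, one-way behavior described above, using Python’s <code>hashlib</code> (SHA-256 here purely for demonstration; as noted, a deliberately slow algorithm like bcrypt is what you’d actually want for passwords):</p>

```python
# The same input always yields the same digest, a tiny change in
# input yields a completely different digest, and hashing a hash
# produces yet another new hash.
import hashlib

def sha256_hex(text):
    return hashlib.sha256(text.encode()).hexdigest()

h1 = sha256_hex("password")
h2 = sha256_hex("password")
assert h1 == h2                  # deterministic: identical inputs, identical hashes
assert sha256_hex("Password") != h1  # one character changed, digest unrecognizable
assert sha256_hex(h1) != h1      # hashing the hash gives a new hash
print(h1)
```

That first assertion is exactly the weakness the next section gets at: two users with the same password produce the same stored hash.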
<h4 id="two-things-to-note-with-hash-functions">Two things to note with hash functions:</h4>
<ol>
<li>Don’t implement your own hash function. Just don’t.</li>
<li>Speed should not be a factor when hashing passwords. Basically, the longer it takes to hash a password, the longer a brute-force attack takes. Speedier algorithms like MD5 or SHA-1 are fine for message-passing and source verification, but not good for storing sensitive information. </li>
</ol>
<h4 id="is-a-hash-enough">Is a hash enough?</h4>
<p>No. Storing a hash of a password isn’t enough. If only a hash function is used to store a hash of plaintext, it’d be pretty easy to break. For instance, if your password were ‘password’, it would have the same hash value as anyone else who uses the same password. If someone were to break into a database of hashed passwords, it’d be very simple to do a quick frequency analysis: whichever hash is the most popular would probably be ‘password’ or something like ‘1234’. This is where a <a href="http://en.wikipedia.org/wiki/Rainbow_table" title="Wiki: Rainbow table">Rainbow Table</a> comes in: a precomputed set of hashed passwords used to reverse hashes back to plaintext. It is reasonable (in terms of time spent on brute force) to create a rainbow table of passwords up to about 8 characters in length. </p>
<h4 id="so-then-what-now">So then what now?</h4>
<p>In order to combat the ease of creating rainbow tables, a <a href="http://en.wikipedia.org/wiki/Salt_(cryptography)" title="Wiki: Salt">salt</a> is added to the password before it is hashed. It is very important that this salt is used only once, and is unique per user/password. For instance, if you use ‘password’ as your password, your salt could be (decided behind the scenes) ‘1$0-1209-1!@fkjl’, so the input for the hash algorithm would be ‘password’+‘1$0-1209-1!@fkjl’. Your salt would not be used for anyone else’s password. So if someone else uses ‘password’, their salt would be something totally different, like ‘sl;–90(d%’, yielding a different hash value for the same plaintext password. </p>
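<p>A minimal sketch of the difference a salt makes (the helper <code>hash_password</code> is made up for illustration; a real system would use bcrypt or similar, and would store the salt alongside the hash so the password can be verified later):</p>

```python
# Without a salt, two users with the same password share a hash;
# with a unique random salt per user, the stored hashes differ.
import hashlib
import os

def hash_password(password, salt=None):
    # a fresh 16-byte random salt per user, unless one is supplied
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.sha256(salt + password.encode()).hexdigest()
    return salt, digest

# unsalted: both users get the identical digest
_, unsalted_a = hash_password("password", salt=b"")
_, unsalted_b = hash_password("password", salt=b"")

# salted: each user gets a unique salt, hence a unique digest
salt_a, salted_a = hash_password("password")
salt_b, salted_b = hash_password("password")

assert unsalted_a == unsalted_b  # ripe for frequency analysis
assert salted_a != salted_b      # unique salts break the pattern
```

The attacker now has to build a separate rainbow table per salt, which destroys the economics of the precomputation.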
<h4 id="side-note">Side note</h4>
<p>If you’re interested in the history of cryptography, breaking passwords and frequency analysis as well as a great history on computing, I would highly suggest <a href="http://www.amazon.com/gp/product/B004IK8PLE/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=B004IK8PLE&linkCode=as2&tag=roglyn-20" title="The Code Book">The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography</a>. </p>
<p><a href="http://www.roguelynn.com/words/2012-10-03-netsec-for-n00bs-part-ii-ciphers-symmetric" title="Symmetric Ciphers">Part II - Ciphers</a></p>
Operation: PyCon attendancehttp://www.roguelynn.com/words/operation-pycon-attendance/2012-09-29T09:41:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>Thinking about going to <a href="https://us.pycon.org/2013/" title="PyCon">PyCon</a> for 2013? Here are arguments to make for yourself and/or for your employer to go to PyCon this year.</p>
<h4 id="because-you39re-new-to-python">Because you’re new to Python</h4>
<p>This year’s PyCon is all about education. The Program Committee is making an effort to include talks that cover all levels of experience; this includes talks, tutorials, and posters. Need I remind you, the conference’s tag line is:</p>
<blockquote>
<p>Change the future - education, outreach, politeness, respect, tenacity and vision</p>
</blockquote>
<p>Beginners will probably get more benefit from tutorials than talks, and it’s perfectly okay to just sign up for tutorials. If you’re hesitant, take a look at <a href="https://us.pycon.org/2012/schedule/" title="PyCon 2012 schedule">2012’s schedule</a>. If there are even a couple of talks that seem interesting to you, you can be certain that similar subject matters will be spoken about for 2013. I also want to point out that talks and tutorials, while key to the conference, do not make the whole conference. Spending hours in the Exhibit hall, talking to companies that use Python, and learning about community outreach programs or open source projects is what makes the most of a PyCon experience. Many companies will be recruiting, and perhaps that is intimidating, but it shouldn’t be. Talk to them about the steps it takes to succeed at their company; find out what they’re looking for in an engineer. This information can help you set goals for what you want to get out of learning Python.</p>
<h4 id="because-you39re-a-veteran-to-python">Because you’re a veteran to Python</h4>
<p>There will be plenty of talks & tutorials on the other side of the spectrum: intermediate, advanced, and extreme/deep dive. Python is ever evolving, always improving, so best not to quit learning. There will be many challenging subjects spoken about, and brilliant people holding conversations over their posters or post-talk/tutorial. Not to mention sprints: either join the sprints on the core language, or on other Python projects that interest you. Or perhaps this will give you the time and focus to code out that project you’ve always had on your todo list.</p>
<h4 id="you39re-convinced-but-your-employer-might-not-be">You’re convinced, but your employer might not be</h4>
<p>It’s cheap. Tutorials are $150/$200 each (early bird/on site) for 3½ to 4 hours of intense learning. The conference is $450/$600/$700 (corporate prices, early bird/regular/on site). Comparable conferences often go for far more than that. You will learn more in 9 days of tutorials, talks, and sprints than a year of reading books. Not only will you be exposed to trends, new technologies and ideas not yet written about, you have the opportunity to talk with other Python devs. Another company had the same deployment issues you folks are having? What about managing real time data? The hallway track presents many opportunities to have those conversations and knowledge transfers.</p>
<p>A relevant tweet to part with:</p>
<blockquote>
<p>CFO to CEO: What happens if we invest in developing our people & then they leave the company? CEO: What happens if we don’t and they stay? — Christina Haxton (@ChristinaHaxton) <a href="https://twitter.com/ChristinaHaxton/status/252074201178066944" title="Tweet">September 29, 2012</a></p>
</blockquote>
From n00b to Engineer in 1 yearhttp://www.roguelynn.com/words/from-n00b-to-engineer-in-one-year/2012-09-25T09:49:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>It is my great pleasure (and squee!) to share with my friends, family, PyLadies, twitter nerds, Women Who Code'rs, DevChixen, Systers, and everyone else that I can now say: </p>
<h4 id="i-am-an-engineer.">I am an Engineer.</h4>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/noob-to-engineer/Joey_Chandler.gif" title="Joey & Chandler from Friends: Moving Day" alt="Joey & Chandler from Friends: Moving Day"/></p>
<p>Yes, the same person that once did not know the difference between a compiler and an interpreter, couldn’t explain wtf __init__.py files do, or conceive of generators versus iterators. The same person that took way too long to admit that you can return more than one item in a function. This is the same n00b that was once told by Leah Culver that one</p>
<blockquote>
<p>“…can’t just learn how to code without a CS degree.”</p>
</blockquote>
<p>(I’d like to point out that it’s this exact industry that promotes and cultivates a culture <em>not</em> of ivory tower pedigrees but of free, easy access, and open source education.) Well my friends: by no means am I an expert, nor am I graduating from n00b status. I am, however, taking my next step. Drum roll… I am joining <strong>Red Hat</strong> as a <del>black hat hacker</del> <strong>Associate Software Engineer</strong> working with OSS projects integrating RH’s <a href="http://freeipa.org/page/Main_Page" title="freeIPA">identity and authorization</a> down in Mountain View. Coding, community-building, applied cryptography, what could be better? I’m just so honored and excited that my new manager and those I will be working with are willing to take a chance on me, and to give me the opportunity to grow into a field that I naively fell in love with (you can’t help who or what you fall in love with).</p>
<h4 id="oh-wait-there39s-moar">Oh wait! there’s MOAR!</h4>
<p>I’m being given the awesome opportunity of a 2-month, unofficial ‘boot camp’ at Red Hat’s <strong>Czech Republic</strong> location in Brno. HUZZAH! <em>Personal history insert</em>: I lived in Prague, CZ for a semester in 2006. Loved it. Wrote my senior thesis on its transitional economy. Lived like a queen with beer being cheaper than water. Utterly ecstatic to visit again. OK I’LL BE HONEST the real reason I’m joining RH is to get my own <a href="http://www.networkworld.com/Google%20Subnet/redhat%20penguin.gif" title="Linux Penguin w fedora">red fedora</a>. #justkiddingnotreally</p>
<p>I’ll still lead PyLadies SF, live in SF, and be heavily involved in the Python community. Just now employed, with health care, and a hard-earned title.</p>
Brainstorm: Writing a PyCon proposalhttp://www.roguelynn.com/words/brainstorm-writing-a-pycon-proposal/2012-08-27T09:49:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>OH MAN PYCON IS SO MUCH FUN. <em>Right? I know!</em> I remember a tweet from <a href="http://www.twitter.com/hmason" title="Hilary Mason's twitter">Hilary Mason</a> that it should be “PyCAAHHHNNN” (imagine with a Boston accent while flashing \m/ upon entrance). </p>
<p>Ever thought about submitting a proposal? <em>What? Oh no, no no no.</em> Why not? <em>What would I talk about? I have nothing to say!</em> (Oddly, when I started submitting talks to conferences, I never had that dialogue or PoV. It was more like “I <3 the Python community. How can I be more involved?” and that’s how my ideation process started.) (PS I love the verb <em>to ideate</em>. It’s great.)</p>
<h4 id="so-how-about-this"> So how about this:</h4>
<p>Tell me: what talks from previous years' PyCons did you find pretty good (regardless of speaker performance)? What would you like to see at PyCon <em>this</em> year? Ok. You want to see those topics? Why don’t <strong>you</strong> write the talk you want to see? Bam, you have an idea. (cheeky lil' blogger, aren’t I?) No seriously: if you want to see a talk, write it yourself. </p>
<p>Or: take what it is that you do with Python (professionally, hobby-wise, side-jobs, whatevs), and write a talk on it. </p>
<p>There is <strong>significant</strong> interest in “<strong>Python in the Wild</strong>” - such as Python in corporations (product/services or used internally), <em>government</em>, <em>education</em> (CS degrees, high school programs, communities), and <em>science</em> (NASA, robotics). </p>
<p>There is <strong>always</strong> interest in popular topics, like deployment, big data, and Python packaging. Not only do people have <strong>short memories</strong>, but every year, it will be many people’s <strong>first PyCon</strong>.</p>
<p>What tends to be <strong>popular</strong> are talks that compare frameworks/libraries/packages, critique a well-known/used tool/framework, and extreme talks (e.g. getting down to the nitty-gritty details of Twisted). </p>
<p>Topics that <strong>could do surprisingly well</strong> are security/cryptography, event-driven networking, and subjects that people should know about and probably default to industry standard, but don’t really know in depth. <strong>Very interesting</strong> subjects include <em>alternatives to giants</em> like Django (e.g. Flask, Pyramid), or <em>where has Python failed</em>. </p>
<p>There are <strong>important</strong> subject matters like <em>accessibility</em> within Python, and <em>diversity</em> & <em>community</em> building. Not all talks at PyCon are technical (my head would explode), but they are needed and well-respected nonetheless.</p>
<h4 id="when-proposing-a-talk">When proposing a talk:</h4>
<p><strong>Submit early.</strong> You know how many proposals the Program Committee has to read through? Not too many right now. There is more time and patience to give feedback, and to give a second look after the speaker responds/edits. The committee also is not that tired yet. Submit on September 27th? Dead tired. Not really wanting to give feedback; just wanting this voting process over. </p>
<p><strong>Write an outline.</strong> Yes, just do it. It can be bare bones, but it helps a lot to see where you’re going with this talk, whether there is enough meat behind it or whether it’s too ambitious for the time slot. Don’t write out your whole talk. <em>My n00b confession: I wrote a 2-page single-spaced essay for my proposed talk at OSCON. /facepalm</em> It also significantly helps the reviewer and you to <strong>associate length of time</strong> per bullet/subpoint/etc. The reviewer has a sense of where you’re going, and whether it can actually be a full talk or if it’s too long. It also helps you frame your actual talk when you come to flesh it out (because no one <em>writes</em> the damn talk before they propose it for the first time). </p>
<p>Nervous about actually speaking? <strong>Find a partner</strong>. Either someone that shares your beginner level of speaking (you both hold the burden of being nervous for your first talk) or someone that is already a seasoned speaker (you can relax a little!). </p>
<p><strong>Add links</strong> to the ‘about you’/bio portion, or <strong>more context</strong> in general about why you are the person to speak about this topic. Did a project with Google Summer of Code? Show the link. GitHub repo? Love to see it. Published a paper on the topic? While it might not be <em>read</em>, the fact that it’s there gives assurance :D. Remember though, provide the reviewers with more context on why you should be speaking on this topic (not just some random article you wrote about how lame PHP is (but it is lame)). </p>
<p>Note: it is <strong>not important</strong> that you haven’t spoken at PyCon or another conference before. But do show why you should now. Think <strong>posters</strong> might be a better option for your first time presenting something at PyCon? With a talk, you have about 30 minutes. You can plan out what you’re going to say, and how your talk might lead to certain questions (pro tip: leave some unanswered/unaddressed items in your talk, and look awesome when you know the answer if they’re asked). After the talk, for questions you don’t know, you also have the forgiveness of the audience:</p>
<blockquote>
<p>“Oh actually, hmm, that’s a very good question. I’m not sure I can address that here right now, but catch me after the talk.”</p>
</blockquote>
<p>(and then sprint away…:D) With a poster, you are defending your PhD thesis (practically), standing around for hours with some of the audience’s expectation of:</p>
<blockquote>
<p>“You know everything about this talk and I’m going to grill you because you’re stuck here.”</p>
</blockquote>
<p>It can be pretty demanding. But it’s why posters are great for in-depth and/or unconventional topics. There’s also a second shot at doing something with PyCon if your talk doesn’t get accepted, since posters are due Jan 15th. I think these hangouts that <a href="http://www.roguelynn.com/2012/08/25/pycon-proposal-brainstorming-via-google-hangout/" title="PyCon Proposal Brainstorming via Google Hangout">PyLadies</a> did were productive; I got the feedback that they were helpful. I also got the “zomg such a good idea but I can’t make it tonight.” I’d like to see more, perhaps from the Program Committee itself (full disclosure, I am a part of the committee). We have 4 weeks until talk proposals are due.</p>
<p><strong> Let’s help you give it your best shot. </strong></p>
Sadistic/Masochistic pleasure: Technical Interviewshttp://www.roguelynn.com/words/sadistic-pleasure-technical-interviews/2012-08-15T09:49:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>When I was in high school, I was in “IB” math - the international equivalent to AP courses, except harder. I really, really, <em>really</em> enjoyed math. </p>
<p>My teacher would always mispronounce my name (“Wynn!”), but nonetheless, I loved the challenge. I was the only one to receive 100% on the final exam, and one of a handful to take the IB exam. I studied regularly, went over to classmates' houses, worked through practice problems of vectors, differentiation, integration, goodness it was awesome. I’d say that studying for this sort of stuff is very much like (if not the same concept as) the training I did for swimming: 2 hours/day in the pool, 6 days a week, plus an extra hour each day of dry land training (side stepping the fact that right now, the most exercise I get is walking to/from cafes…). </p>
<p>While it’s been years since my crush on high school math, that same feeling comes again when studying for technical interviews. A few hours/day, every day, until the interview. I recently had the excitement of a technical phone screen with Google, and I thought it went pretty well. Not in the sense that I think I can make it to the next round (I don’t need to mention that the bar is set very high with them), but in how I felt compared to my last technical interview experience back in April. I’d like to share how I prepared myself. I’ll mention that my CV reflects a lot of PyLadies work, as well as knowledge of Python (natch), SQL, git, and working with different APIs.</p>
<h4 id="how-i-prepared">How I prepared:</h4>
<p>I had about a week for preparation (contrary to my previous one, where I had about two weeks). Really, it all depended on scheduling, but I tried to push it out as far as I could :D. First I read through a couple of Python resources just to become familiar with the ‘hidden gems’ of the language that I may not use regularly or don’t have exposure to:</p>
<ul>
<li> memorizing all of the language’s built in functions (easy peasy)</li>
<li> magic methods, as well as the difference between __str__ and __repr__</li>
<li> built in methods on data types (e.g. my_string.split())</li>
<li> difference between New Style & Classic classes</li>
<li> range v xrange, generators v iterators, return v yield</li>
<li> default dicts w/ default factories, and other container datatypes</li>
</ul>
<p>I also reviewed some SQL crap too:</p>
<ul>
<li> difference between joins and outer left/right/both joins</li>
<li> schema design (e.g. pk, fk, indices)</li>
</ul>
<p>Then I spent a lot of time doing algorithms, graph traversals, data structures (without Python’s help), etc. I used Gayle Laakmann McDowell’s book: <a href="http://www.amazon.com/Cracking-Coding-Interview-Programming-Questions/dp/098478280X/ref=sr_1_1?ie=UTF8&qid=1344972927&sr=8-1&keywords=cracking+the+coding+interview" title="Amazon link to Cracking the Coding Interview book">Cracking the Coding Interview</a> since it gives a very well laid out path to study for the typical technical interview. I found that Palantir (a tech/finance/info company that I have a hard crush on) has a great <a href="http://www.palantir.com/2011/09/how-to-rock-an-algorithms-interview/" title="Palantir's interview advice">how-to</a> on succeeding in their technical/algorithm interviews (and I feel that applies to all aggressive tech companies).</p>
<p>My process each day would be to warm up with a couple of easy problems, then go onto a more difficult graph or algorithm problem. A lot reflects Gayle’s book, and in that book she uses Java (hence some of these problems can be easily implemented in Python with no thought). I’ll leave you to get her awesome book in order to read best approaches & the answers (literally, every example problem has a solution in the book, it’s great).</p>
<p>In loose order of difficulty, starting with what I’d have for breakfast and ending towards what made me skip meals:</p>
<ul>
<li><p>Strings/Arrays</p>
<ul>
<li> determine a string w/ all unique characters (both w/ and w/o
additional data structures)</li>
<li> write an algo to determine if a string is a palindrome</li>
<li> write an algo such that if an element in an MxN matrix is 0, its
entire row & column are set to 0.</li>
</ul></li>
<li><p>Linked Lists (without Python’s awesomesauce)</p>
<ul>
<li> creating a linked list</li>
<li> find the kth element of a singly linked list</li>
<li> remove duplicates from unsorted linked list</li>
</ul></li>
<li><p>Stacks & Queues</p>
<ul>
<li> implement a stack/queue (and know the diff between each)</li>
<li> use a single array to implement three stacks</li>
<li> in addition to push/pop for a stack/queue, also design a min
function to return the min node (operating in o(1) time)</li>
<li> towers of Hanoi problem</li>
</ul></li>
<li><p>Trees and graphs</p>
<ul>
<li> define a function to check if a binary tree is balanced</li>
<li> write an algo that creates a binary search tree w/ minimal
height</li>
<li> Breadth/depth-first searches (not too hard esp after stacks &
queues)</li>
<li> implement a suffix tree</li>
<li> Dijkstra’s algorithm</li>
</ul></li>
<li><p>Sorting & Searching</p>
<ul>
<li> you have two sorted arrays, A & B where A is large enough buffer
at the end to hold B. write a method to merge B into A in
sorted order.</li>
<li> find the index of a given integer in an array (unsorted)</li>
</ul></li>
<li><p>moar graphs, algos and random but good</p>
<ul>
<li> write a power function</li>
<li> implement an atoi function</li>
<li> write a simple caesar cipher (both encoding & decoding)</li>
<li> write a greedy algorithm that returns amount of change in coins
(quarters, dimes, nickels, pennies)</li>
<li> write an algorithm/mini program that, given a maze, a start & an
end point, returns the path(s) needed to exit the maze (this was
not fun to do at 8pm the night before the interview, but it is a
damn good problem to attack).</li>
</ul></li>
</ul>
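<p>As a taste of what the warm-up problems look like, here is one possible take on the greedy change-making exercise from the list above (one sketch among many; assumes US coin denominations):</p>

```python
# Greedy change-making for US coins: take as many of each
# denomination as possible, working from largest to smallest.

def make_change(cents):
    counts = []
    for coin in (25, 10, 5, 1):  # quarters, dimes, nickels, pennies
        counts.append(cents // coin)
        cents %= coin
    return counts  # [quarters, dimes, nickels, pennies]

print(make_change(87))  # [3, 1, 0, 2]
```

The greedy strategy happens to be optimal for US coin denominations, though for arbitrary coin sets it can fail, which makes a nice follow-up discussion in an interview.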
<p>For some reason, perhaps someone could enlighten me, I wasn’t able to get <a href="http://community.topcoder.com/tc" title="top coder">TopCoder</a>’s java applet running (although I could months ago). Those algo problems are great, challenging, and appeal to different levels. TopCoder’s problems would be a fantastic help to those prepping for the in-person technical interview. Hopefully I’m the only one with the issue. A week or so after my interview, someone posted on <a href="http://www.reddit.com/r/learnprogramming/comments/xwd16/had_a_technical_phone_interview_today_for_an/" title="a self-post on reddit">/r/learnprogramming</a> in reddit what questions s/he was asked during a technical phone interview. Very helpful (note that the interviewee is a C++ person interviewing for an entry level software engineer position at a mid-sized, well-known company):</p>
<ul>
<li> What kind of data structure would you use to hold an arbitrary amount of objects in which retrieval is not an issue?</li>
<li> Follow up: what are the advantages and disadvantages of using a linked list over a dynamic array structure? [good time to talk about Big-O performance]</li>
<li> Describe the main properties of a binary search tree.</li>
<li> What would happen if you inserted a sorted list into a binary search tree?</li>
<li> Follow up: how can you avoid this problem?</li>
<li> Say you want to retrieve from a data set a person’s name based on an alphanumeric ID number. What kind of data structure would you use?</li>
<li> Compare a hash table and a map type structure and their big-O performance on various common functionality (search, insert, etc.).</li>
<li> What is the worst case retrieval time for a hash table and how can this happen?</li>
<li> Briefly describe object oriented programming and its main principles (MAKE SURE TO SAY: encapsulation, inheritance, and polymorphism, (and abstraction if you want)).</li>
<li> He specifically asked me to elaborate more on encapsulation and polymorphism. Why is polymorphism useful?</li>
<li> What is a join and an outer join? (SQL)</li>
</ul>
<p>Reading through that redditor’s interview questions, I actually felt that I could answer the majority of them. Actually I didn’t know what encapsulation was, but luckily I wasn’t the one in the hot seat for that interview, and I googled it :D.</p>
<h4 id="my-weak-points-that-i-will-work-on">My weak points that I will work on:</h4>
<ul>
<li> Understanding what can be done to improve runtimes of an algorithm (not just be able to tell the runtime).</li>
<li> Being more confident in my understanding of concepts. For instance, I know what a RESTful API is, but how do I spit it back coherently? Same with explaining git.</li>
<li> Calm the f&ck down.</li>
</ul>
<h4 id="things-i-thought-i-did-well-in">Things I thought I did well in:</h4>
<ul>
<li> Wrote & instantiated a class that would be used to make a graph, using the requested private variables/methods (he seemed a little impressed). Although I do believe I mixed up new style v classic. /facepalm</li>
<li> Talking through each and every damn thing I was saying and writing (was not easy the first time). And going back through what I coded to test simple cases & fixed typos/errors.</li>
<li> My understanding of repr v str (finally…).</li>
<li> Simple schema writing (from the first interview).</li>
<li> Before coding, switched the font/size to courier 8 (seriously, red flag to them if not using a fixed-width font when gdocs defaults to a variable-width one).</li>
</ul>
<p>I’m certainly not counting on going further from that interview. I quite like the set up I have now doing contracting with a friend’s consulting + Harvard’s project, and I seem to walk into new opportunities (then again, this is the Bay Area). Thankfully I’m also able to contribute to open source, give talks & go to conferences, put on workshops and events, etc.</p>
<p>But hey - you’d understand if Google came calling, right? I actually look forward to the day where I can go in and code on a whiteboard, but that will take a lot more studying, understanding of more complex algorithms, better understanding of runtimes, perhaps systems architecture. For those PyLadies and Women Who Code'ers in the Bay Area, WWC is hosting a <a href="http://www.meetup.com/Women-Who-Code-SF/events/77224932/" title="Women Who Code tech interviewing event">Technical Interview evening talk/mini-workshop</a> with Gayle Laakmann McDowell next week. We’re also looking for mentors (and food sponsors!), so hit me up if you’re interested!</p>
<p><strong>Update: </strong>I was told not to divulge what questions I received from the interviewer as it reflects poorly. So, let me just say: what I studied aligned pretty well with what I was asked :D I just…wish I had a better memory! Another resource given to me: <a href="https://www.interviewstreet.com/challenges/" title="Interview Street Challenges">Interview Street</a>, and I forgot <a href="http://www.careercup.com/" title="Career Cup: Programming Interview Questions">CareerCup</a>.</p>
Our generation's Legoshttp://www.roguelynn.com/words/our-generations-legos/2012-07-28T12:30:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>First time I heard about raspberry pi was through twitter. I thought: Oh I bet that’s a cute name for a new python module. Didn’t look into it. </p>
<p>Then I overheard a <em>very</em> dedicated PyLady saying “OMG my Raspberry Pi shipped today!” Wait, a module shipped? She doesn’t develop python modules. Is this a physical thing? </p>
<p>Then, weeks later, I asked another friend, “What is this Raspberry Pi?” The world stopped to him. “What, are you serious?” he responded. “Yeah…?” </p>
<p>Clearly I wasn’t getting “it.” </p>
<p>Then I proceeded to get educated in “it.” Sure, I thought - that’s cool, a credit card sized computer. It didn’t seem that big of a deal to me - that’s the kind of stuff iPhones & Androids run on, wicked small computers. It’s $35 too - not bad, but it still didn’t appeal to me.</p>
<p>And then, a few days after that conversation - it hit me. Holy sh*t this is a revolutionary idea. Holy crap, I get “it.” Now I understood why there was a 3 month wait time for ordering. To share this enlightenment, I can’t simply say “imagine the possibilities.” A lot of people require handholding for some creative thinking. </p>
<p>So let me continue my thought process: I hadn’t really been ‘into’ computers beyond the use of them until about 9 or 10 months ago. I was damn good at my shortcuts around Excel, my Google-fu skills, my typing speed thanks to AIM, but I didn’t really care about what was going on underneath my laptop’s keyboard. September 2011 I took my first computer science course. <em>Not</em> because I was interested in learning to program, but because I needed to know programming to apply to graduate school for financial engineering. December 2011 I finished with an A- and never looked at finance again. In my new life, my ‘finally found my passion’ mindset, I haven’t stopped consuming programming, learning computers, understanding security risks, teaching others and myself python, speaking about it, etc… My 2008 MBP died earlier this year. It comes back to life every once in a while, but when it does kick the bucket for one last time, oh boy will I perform an autopsy. I will dissect, rip apart, hack, and reuse anything and everything. </p>
<p>Enter: Raspberry Pi. </p>
<p>Raspberry Pi appeals to hackers, programmers, the child still inside of all of us. It’s our generation’s Legos. I’m going to take that piece of computer board fruit and I’m going to hack the sh*t out of it. I’m poor, so I can’t buy even the cheapest of monitors. OH WAIT I have an old MBP lying around. Not sure how that will go but I will try, and I will learn how monitors work in the process. I have legos (yes…just bought a space shuttle kit). </p>
<p>You know what I’m going to do? Try something out with PyGame and my lego wheels (I feel sorry for the cat and the impending annoyance I will cause him). I plan to make a home security system - quite needed living in the Mission of SF. How? not sure, but it’s damn well possible. Probably hook up one of the speakers that I have from 1999, or steal one from my MBP. It’ll probably be a hack job at first - faulty wires exposed, false alarms and annoyed neighbors. </p>
<p>OH goodness what about a voice-command thing tailored to my voice? "Start Pandora” (sorry Spotify). “Start picture slideshow." "Turn off lights." "Turn on coffee machine." "Remind me to take my meds." MY OWN STARTREK LIFE. I’ll get bothered by how relatively slow Debian Squeeze will be (the OS you’ll have to download yourself). There goes my weekend of learning/writing linux kernels. Hmm after all these thoughts, I stepped back for a second. </p>
<p>This computer isn’t just for me, and for people like me. A $35 computer in the hands (literal hands!) of children. Not some mystifying ½ in x 13 in x 9 in slab of aluminum w/ a keyboard. Small a$$ computer + PyGame + Legos (+ Arduino maybe) = only limited by imagination. Perhaps a child can amuse a cat’s afternoon like me. Or perhaps strap it on one of those small, build-your-own rockets (I LOVED those as a kid!), shoot it up in the air, and record flight data (is that possible? f^ck who cares, I’m going to do it now). </p>
<p>What if we and our kids take this opportunity to put this computer in the hands of developing nations' children? I’d chip in $35 to send this to an eager child with little access to it him/herself. Why don’t we + our kids preprogram & preload it with learning tools, with Khan academy videos? Why not send one monitor down, a bunch of cables, and Raspberry Pis w/ edu videos for a whole village? I’ll throw in some legos in there. Oh man, you know what the best part of it is? </p>
<p>Python is on Debian Squeeze. The makers of RP & the OS couldn’t have selected an easier language for the world to learn. Look at how far <em>I</em> got in ½ a year of learning Python, and I do feel the hindrance of an aging brain. Imagine the pace at which a child learns. Let me say that again: Imagine the pace at which a child learns. Just think. </p>
<p>Ok let me connect the dots for you: Is your company hiring? </p>
<p>Raspberry Pi is a simple, evolutionary piece of hardware that has the potential to <em>revolutionize</em> the state of education. We are already experiencing the exponential rate of growth from technology. Education is piggy-backing along. And accessibility to education can’t and <strong>won’t</strong> be left behind. We as a community (Python, engineering, technology, etc.) have this amazing opportunity to put Raspberry Pis in the hands of many. Sure, some people won’t see the importance of it, like I didn’t at first. But for those that do and that will, you have given them the push, the nudge that everyone needs once in a while to just <em>do</em> something. </p>
<p>Hack at something. Create something. Revisit that child in him/herself. Pass it on to an actual child. </p>
<p>When I graduated, I got a very piss-poor commencement speech. Nothing inspiring, nothing moving. But one of the new deans was not too bad. Everyone liked him and the energy that he brought to our tiny business school. He said, and repeated throughout his speech, “Do great things.” Cheesy, I know. I walked, received my diploma, and promptly got drunk. Four years after graduating, I’ve passively noticed this little voice, a little engine-who-could voice, repeating “Do great things.” Here I am, internally lit by this motto. I hope this fire catches elsewhere, too. Do great things. Here’s our opportunity.</p>
Moar Bookmarks!http://www.roguelynn.com/words/moar-bookmarks/2012-07-27T09:49:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>Seeing how my previous <a href="http://www.roguelynn.com/2012-05-20-gold-nuggets-my-bookmarks-collection" title="Gold Nuggets">bookmarks post</a> was well-received, here is MOAR! </p>
<p>A collection of helpful tutorials, tools, and libraries for folks learning Python/how to code, or looking to improve and explore their skills.</p>
<h4 id="tutorials-amp-how-tos">Tutorials & How-tos</h4>
<ul>
<li><a href="http://krondo.com/?page_id=1327" title="Intro to Twisted">Introduction to Twisted</a>: A thorough introduction to Twisted and asynchronous programming in Python. </li>
<li><a href="http://try.github.com/levels/1/challenges/1" title="tryGit">tryGit</a>: Learn Git with GitHub & Codeschool’s new tutorial. </li>
<li><a href="http://www.konstruktor.ee/blog/django-the-undocumented-settings/" title="Django: the Undocumented Settings">Django: The Undocumented Settings</a>: Tricks when writing the settings.py file for Django. </li>
<li><a href="https://devcenter.heroku.com/articles/django" title="tutorial for Django on Heroku">Getting Started with Django from Heroku</a>: A tutorial using Django and deploying with Heroku. Assumes some knowledge of virtual environments. </li>
<li><a href="http://people.csail.mit.edu/pgbovine/python/" title="Online Python Tutorial">Online Python Tutor</a>: Learn & Practice programming in your browser.</li>
<li><a href="http://craigkerstiens.com/2012/06/19/pro-tips-for-conference-talks/" title="Conference Pro Tips">Protips on Conference Talks</a>: Thinking about giving a talk at a conference? Here is some great advice on how to rock it!</li>
</ul>
<h4 id="tools">Tools</h4>
<ul>
<li><a href="http://opensourcehacker.com/2012/05/11/sublime-text-2-tips-for-python-and-web-developers/" title="Sublime Text 2 for Python & Web Developers">Sublime for Python and Web Developers</a>: How to squeeze more power out of one of the most popular, lightweight text editors for Python.</li>
<li><a href="https://www.shortcutfoo.com/app/tutorial/sublimetext" title="Sublime Shortcuts">Sublime Shortcuts</a>: Learn to get more out of the latest text editor. </li>
<li><a href="http://www.corntab.com/pages/crontab-gui" title="Crontab GUI">Visual Crontab Editor</a>: Easily create crontab syntax for your Linux or Unix servers.</li>
</ul>
<h4 id="libraries">Libraries</h4>
<ul>
<li>PEP-8 Style Guide Checker: Are you writing code according to the standard Python guidelines? Have no idea? Try out this library.</li>
<li><a href="https://github.com/BrewerHimself/Logr" title="Simple Python blogger">Logr</a>: Simple Python blogger. </li>
<li><a href="https://github.com/kennethreitz/legit" title="Git for Humans">Git Legit</a>: Git for Humans.</li>
</ul>
PyLadies @ OSCONhttp://www.roguelynn.com/words/pyladies-oscon/2012-07-26T12:30:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>I had the wonderful opportunity to bring up a few fellow PyLadies to Portland, Oregon for O'Reilly’s Open Source Convention. </p>
<p>Thanks to the help of the <a href="http://www.psf.org" title="PSF">Python Software Foundation</a>, we were able to bring along swag including stickers, postcards of information on how to start a local PyLadies chapter, and a new banner. </p>
<p>It was an interesting experience. There was a lot of excitement for the conference; the open-source community is fantastic to be a part of. But what was weird was that I felt there were not a lot of Pythonistas there. Similarly, I felt PyLadies' presence wasn’t as well received as I originally anticipated. I heard a lot of people walk by, asking “pe… pilates? what’s pilates?” Interestingly enough, this gave me an event idea: “PyLadies does Pilates!” :D </p>
<p>The talk that I gave was a little awkward, too. Not many people showed up (bummer), and the majority of the folks that did were women. That was actually pretty awesome to witness, but it was sort of preaching to the choir. :) One thing I do thoroughly enjoy is the conversation & questions that come after my talks. I’m always blown away how everyone wants to get more women involved, and how welcoming new ideas and thoughts are to do such thing.</p>
<p>In the spirit of Open Source - be sure to check out <a href="http://www.meetup.com/PyLadiesSF/" title="PyLadies SF Meetup Group">PyLadiesSF</a>’s upcoming events: <a href="http://www.meetup.com/PyLadiesSF/events/73638942/" title="Learn to Contribute to OSS event">Learn to Contribute to Open Source</a>, and <a href="http://www.meetup.com/PyLadiesSF/events/73639302/" title="Django + PyLadiesSF Sprint">PyLadies + Community Django Sprint</a> to improve the well-known Django tutorial.</p>
Moar Womenz at DjangoCon US!http://www.roguelynn.com/words/moar-womenz-at-djangocon-us/2012-07-25T18:06:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<h4 id="change-the-ratio-at-djangocon-2012">Change the Ratio at DjangoCon 2012</h4>
<p>If you’d like to go to <a href="http://www.djangocon.us/" title="DjangoCon US 2012">DjangoCon 2012</a> in Washington, D.C. – or know a talented developer who does – but cannot attend due to financial hardship, please <a href="https://docs.google.com/spreadsheet/viewform?formkey=dDc1X2hrUGJVRGdEWnRjTklxR2tSNFE6MQ#gid=0" title="FinAid">apply for financial aid through DjangoCon</a>. <strong>And</strong>, if you’re a PyLady (as opposed to a PyLaddie), please indicate that you would like to be considered for a <strong>PyLadies grant</strong> on that form. This year’s conference goes from Sept. 3 - Sept. 9. The deadline for the PyLadies grant submission is <strong>Friday, August 3rd</strong>. We apologize for the short notice, but we know that people need time to make travel plans/request days off from work, etc. You will hear from us by the end of the week after applications are submitted. If you know of a talented, deserving female developer who might benefit greatly from attending DjangoCon, email <a href="mailto:info%5Bat%5Dpyladies%5Bdot%5Dcom" title="Email to Info">us</a>. We’ll do our best to help as many women as possible. </p>
<p>Many thanks to the DjangoCon organizers and the Python Software Foundation for helping the PyLadies manage this program.</p>
<h4 id="sponsors">Sponsors:</h4>
<p>The goal of this program is to bring new talent to the Djangosphere by supporting developers and giving them the opportunity to make connections in the community. Your support of PyLadies’ outreach efforts will pay off hugely over the long term by improving the diversity of the industry as a whole. Companies & individuals are invited to offer funding for talented developers who have demonstrated or show potential for a serious commitment to diversity in the Django community. Donations will go through the <a href="http://python.org/psf" title="PSF">Python Software Foundation</a>, and are thus <strong>tax-deductible</strong>, yeah! Please email <a href="mailto:sponsors%5Bat%5Dpyladies%5Bdot%5Dcom" title="Email to Sponsors">sponsors@pyladies.com</a> if you’d like to send a woman developer to DjangoCon.</p>
A memorable EuroPython, for the betterhttp://www.roguelynn.com/words/a-memorable-europython-for-the-better/2012-07-12T12:30:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>EuroPython marked many memorable experiences for me, as a PyLady, as a member of the Python community, and as a newbie to coding & programming. </p>
<p>First, it is widely known and very well accepted that the folks at EuroPython did a fantastic job in organizing the whole conference. Every detail was well thought out. Coming from 7000+ miles away, I did not feel like a foreigner. Little things, including inviting me to join them in the evening a couple days before the conference; having cordial side conversations; and making connections & introductions, it all really helped me out. </p>
<p>To be given an opportunity to speak about what I’ve been doing with PyLadies in San Francisco and Women Who Code was fantastic; and to have it elevated to a keynote was all the more surreal. I thought it was pretty well received by the audience. Many folks asked really good, difficult questions; things that needed to be discussed. My approach to the talk was to provoke some thought in the audience without really alienating them. </p>
<p>Yet many still wanted to get to the elephant in the room: “what are we doing that we don’t know we’re doing?” Long answer short: It’s not Python community-specific, but how we bring general societal norms into the community. We, the python community, have been very open minded with who joins us. I’m not sure why, if it’s a precedent that Guido set, if it’s the type of people the language attracts, or if it’s because it’s the model child of FOSS. It’s certainly a breath of fresh air compared to the likes of Java or C (no offense, well, maybe…). </p>
<p>Another hot question was “Is Guido’s t-shirt, Python is for Girls, sexist?” From the point of view of a woman, it ostracizes me. It feels like someone is trying to dress up programming with the color pink. Or like Lego trying to appeal to girls with <a href="http://shop.lego.com/en-US/Stephanie-s-Outdoor-Bakery-3930" title="Lego Bakery">its easy-bake oven twist</a>. But when talking to Guido one-on-one, as I would have thought, he of course didn’t mean it to be offensive. Rather, he means to challenge the programmer paradigm. After the keynote, many approached me. One gal from <a href="http://geekgirlscarrots.pl" title="Geek Girl Carrots">geekgirlscarrots.pl</a> gave me the greatest greeting and sincerest thank you I’ve heard. Men came up asking for help and ideas how to get more women involved in their communities back home. I felt this talk was very well received (despite being so nervous for it!).</p>
<hr>
<p>The most-talked about event, in my opinion (probably because I was the center of it), was the incident outlined in my <a href="http://www.roguelynn.com/words/2012-07-05-really" title="Really">previous post</a>. Namely, a conference attendee tweeted something offensive in reference
to #pyladies, I called him out, addressed it with the EuroPython organizers, and allowed a discussion (if you can call it that) to happen on my blog. My first thought about that event: The organizers of EuroPython handled it exceptionally well. I went to the venue early before the conference started for the day, and they were already aware of the tweet (and I assume my reaction to it via twitter or my blog, as well). They were very diligent in consulting the appropriate people on the EuroPython/Python Italia board. The result was, I felt, adequate in addressing the situation. <a href="https://twitter.com/pwang/status/221172889813131264" title="Apology 1">A public apology</a> (<a href="https://twitter.com/pwang/status/221196983031963654" title="Apology 2">two</a>, actually) from the attendee via twitter and <a href="http://www.roguelynn.com/2012/07/05/really/comment-page-1/#comment-5125" title="Apology Comment">on my blog</a>, a <a href="https://twitter.com/europython/status/221201258424442881" title="EuroPython Stance">public stance</a> on their <a href="https://ep2012.europython.eu/code-of-conduct" title="EuroPython CoC">Code of Conduct</a> via twitter, and plenty of reassurance in private. I can’t stress enough that the organizers were very respectful, and did exactly what I thought should be done. This is exactly why there needs to be a CoC at conferences, and why the Python community is very adamant about signing the <a href="http://letsgetlouder.com/" title="Lets Get Louder">Let’s Get Louder campaign</a>. I’d like to quote a comment on the Let’s Get Louder campaign from someone who I met at EuroPython, but didn’t know at the time <a href="http://www.reddit.com/r/Python/comments/vroj4/python_programmers_sign_pledge_to_only/c579eeb" title="Hynek's Comment">when this was posted</a>:</p>
<blockquote>
<p>It’s amazing how people who have never been affected by any type of discrimination decide for the rest, that it’s not a big deal and not our problem. But it *is* our problem. Because *this* is *our* community. And I will not tolerate that people get discriminated or harassed … without any reaction from *inside* the community. </p>
<p>Because that’s bad for our community. For every single person of us. A code of conduct is not about giving women or other minorities the power to get people they dislike out of conferences just by accusing them. </p>
<p>So stop bringing that up. It is a firm statement by the organizers and everyone who decides to take part of such an event that *we* as a community value every member of it and don’t want assholes make them feel not welcome. That if someone harasses them we’ll be there for them and won’t just shrug our shoulders and/or relay cowardly responsibilities to ominous third parties.</p>
</blockquote>
<p>When this incident happened, I <del>teared up a bit</del> cried. Not because I took the tweet to heart, but because I knew that I was in a safe space, here at the conference. I had this comment in the back of my mind already, and to reflect back to it gave me some comfort. I was comfortable showing up to the conference, having normal conversations with folks, and did not feel ostracized. The apology, while I don’t doubt the sincerity, showed lack of cognizance & self awareness of the issue at hand. For clarity: some people don’t get it. From another community that I shared this incident with, this response hits the point pretty well. I want to illustrate a female point of view that I heavily agree with, but can’t seem to voice it as well as her. With her permission, I am posting anonymously:</p>
<blockquote>
<p>Without #Pyladies it’s not a joke, it’s just a contentless anecdote. The hash tag wasn’t just accidental or in poor taste. It would still have been sexist and offensive and objectifying and rape-culturish if he had tagged it “#ladies?” or “#Wife?” (God forbid we say so though, or expect men to actually think about women as people instead of blow job machines.) </p>
<p>It is extra-special-nasty because he implied that specifically technical women who were where he was right then are *still* nothing more than objects for his sexual pleasure, and so it becomes a specific and personal threat/dehumanization of actual, specific women whom he could probably see at the time. That is offensive and sexist and objectifying *and* creepy and discouraging and alienating and perpetuates the culture that creates the gross gender imbalances in our industry. Until these dudes start deconstructing their thoughts themselves, I will deconstruct their thoughts for them. </p>
<p>If they don’t like what their jokes say about them, they should start thinking before they speak now shouldn’t they? It is entirely possible to make dirty jokes that don’t rely on a blind sense of entitlement to sexual relations with the generic category of "women” (or any other generic category, for that matter.) </p>
<p>For example, “Sucked the marrow out of my osso bucco at #Europython. Waiter said I will have ‘very good orgasm.’ He was right. #Flexible” Or “Sucked the marrow out of my osso bucco. Waiter said I will have ‘very good orgasm.’ Wasn’t practicing for me #LoveYouDear”</p>
</blockquote>
<p>This isn’t an overreaction; this was a perfect situation to bring to light the type of incidents women do face. Sure, maybe one comment can be easily brushed off. But it isn’t one, it’s many. From both in private at home with silly, innocuous jokes, to a$$hat comments made in public. It adds up. The general creepy comments from being the only female at a table drinking; the hurtful, disrespectful, and gaslighting tweets & blog comments; the crass jokes; the “calm down, just breathe. You’re overreacting” conversation (!). </p>
<p>For simple, lightbulb moments that I’ve witnessed some men get, it’s gratifying and reassuring. It’s the reason I’m speaking. The soft “a-ha” speaks loudly. The support that I got that evening and the rest of the conference was amazing. Some went up to bat on behalf of me/PyLadies/women in tech. Some simply asked how I was doing. Overall, I could not have asked for a better experience. I met many prolific and awesome people in the Python community. I’ve made friends that I know will last beyond the sunny days of Florence. And I’ve made a pretty hard dent in my liver. I’m looking forward to next year.</p>
You ask why I need to talk about women in the Python community?http://www.roguelynn.com/words/really/2012-07-05T08:50:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<h4 id="update">Update:</h4>
<p>With the pseudo/half-baked apology from the original offender, I’ve decided to remove any identifying characteristics. </p>
<hr>
<p>I had an amazing talk on Wednesday, July 4th at EuroPython. I maintained my positive attitude; I tried not to alienate the audience (especially the majority being male); I tried to keep the conversation going during the question & answer session and afterwards. Many men had fantastic questions regarding my subject: increasing female engagement in the Python community. I returned home tonight to find this: </p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/you-ask-why/pwang_tweet.png" title="Troubling tweet from @pwang" alt="Troubling tweet from @pwang"/></p>
<p>I JUST gave a talk yesterday about women & Python. <strong>This dude asked me a question on the things men do unconsciously.</strong> WHAT THE HELL I DON’T EVEN! My thoughts are to be continued later after I’ve calmed down and addressed this with the organizers. </p>
EuroPython 2012: Day 1http://www.roguelynn.com/words/europython-2012-day-1/2012-07-03T08:52:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>If you’re a Python enthusiast, and you’re <strong>not</strong> at EuroPython, you’re missing out, I’m sorry to say. </p>
<p>Not only have the conference organizers put on a near-flawless first day, they also provided attendees with amazing food for thought and a fantastic program of keynotes to kick off the event.</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/europython-2012/me_europython_slide.jpg" title="Me on EuroPython 2012 slide" alt="Me on EuroPython 2012 slide"/></p>
<p>Everyone was a bit starstruck that Guido van Rossum was here; it’s funny to see folks put him on a little bit of a pedestal when he’s a normal nerd like the rest of us. And naturally, he was wearing his favorite shirt: Python is for Girls. (f&ck yea!)</p>
<p><strong>Side note:</strong> Female attendance of EuroPython has <strong>doubled</strong> since last year, from 4% to 8%, while the overall attendance has increased about 100 people, or about 15% from last year (a rough estimate from collected conversations).</p>
<p>Guido’s talk was very similar to his closing keynote at PyCon US this year, Not the State of the Union. What I found more interesting was the moderated Q&A he did towards the end of the day. More on that in a second…</p>
<p><a href="https://ep2012.europython.eu/conference/p/alex-martelli" title="Alex Martelli">Alex Martelli</a> did a talk on the fantastic subject of Ask for Forgiveness, not for Permission, inspired by Grace Hopper. It was very awesome and heart-warming to hear the crowd on twitter really welcome and embrace that line of thought, both because it’s a great motto to live by (esp for engineers), as well as because a very prominent and well-respected woman said it. Only, he used Apple’s version of Comic Sans on his slides: Chalkboard. Not that forgivable… (although I’ve committed that crime as well in my Python study group slides *hides head in shame*).</p>
<p>Also, I didn’t realize that Martelli was the author of Python in a Nutshell, and co-edited with his wife, Anna, the Python Cookbook. I should have lugged that door stopper to get it signed!</p>
<p>Next, <a href="https://ep2012.europython.eu/conference/p/febo-cincotti" title="Febo Cincotti">Febo Cincotti</a> presented on the progress neuroscience has made in helping those who have absolutely no way to communicate to do so with the help of computers, in his talk titled “Let your Brain talk to Computers.” While not Python-centric, it was very inspirational and detailed how far science has come with respect to helping those like Stephen Hawking in the ‘locked-in’ state.</p>
<p>Naturally, it wouldn’t be a Python conference without a state of the PyPy endeavor and a GIL-less future by <a href="https://ep2012.europython.eu/conference/p/armin-rigo" title="Armin Rigo">Armin Rigo</a>, <a href="https://ep2012.europython.eu/conference/p/antonio-cuni" title="Antonio Cuni">Antonio Cuni</a>, and <a href="https://ep2012.europython.eu/conference/p/maciej-fijalkowski" title="Maciej Fijalkowski">Maciej Fijalkowski</a>. I really enjoyed this talk mostly because these gentlemen had such a dorky, academic/scientist-like passion about the progression of PyPy. Their story came off as very inspiring, honest, and enthusiastic about the massive undertaking of a project. It was interesting that Armin even spent some time making his own Python interpreter (seriously, how passionate & dedicated is he?). It will be great to see how these gentlemen progress with Transactional Memory & Automatic Mutual Exclusion. Fascinating.</p>
<p>An unexpected highlight of the evening was the quality of some lightning talks. Specifically, <a href="https://twitter.com/#!/llanga" title="Lukasz Langa">Lukasz Langa</a>’s <a href="http://www.slideshare.net/lukaszlanga/i-regret-nothing" title="I Regret Nothing">I Regret Nothing, You can’t judge me</a>. Full of hilarious memes, fast-paced jokes, and honest confession of his horrible habits, Mr Langa showed EuroPython just how lazy of a Python programmer he is. From simplified and nearly unidentifiable variables, to multiple inheritance, to horrific monkey patching, he convinced the audience that a) he’s lazy, b) he doesn’t regret it and c) god bless your soul if you inherit his code.</p>
<p>To wrap up the evening, Guido van Rossum did a casual but well-attended moderated Q&A. He was quite frank in answering the questions, from “I just don’t know what this person is asking” to “I’m not going to waste my time answering this.” My next post will be the loosely transcribed conversation, after I refactor it a bit to actually be readable. But believe me when I say that while Guido is quite tired of the same question, “Looking back, what would you do differently”, he’s a fantastic sport that offers great dialogue for us in the Python community. So stay tuned!</p>
My un-talk at DjangoCon. Warning: rant.http://www.roguelynn.com/words/my-un-talk-at-djangocon/2012-06-06T12:30:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>I woke up the morning of <a href="http://2012.djangocon.eu/schedule/involving-women-in-the-community/" title="DjangoCon EU Talk">my talk</a> lightly hung over after the awesome Divio after party for DjangoCon Europe/Zurich wanting to completely scrap my whole prepared talk this afternoon about what I’ve done to educate women, and how to make the <a href="http://prezi.com/kzr7laiqlc-9/revenge-of-the-n00bs/?auth_key=721e5863405e7286309384eda327c2acb0616d42" title="Prezi">Django tutorial better</a> for new folks coming into programming in general.</p>
<p>Not because I’m hung over, or a wimp, or that I didn’t prepare (which I half-ass did, enough to feel fine going up on stage). But because something else needs to be talked
about. </p>
<p>Women. Women in tech. Women at conferences. I want to make a few points about my experience here at DjangoCon: I woke up this morning a little offended. There are about 250 attendees, about 8-10 of them are women (or at least selected female-cut shirts), a ratio that is not unfamiliar, but quite low nonetheless. Those who reviewed the talk proposals for DjangoCon Europe selected 2 women (myself included) to give a talk, and invited 2 women to serve as keynote speakers. </p>
<p>I’m bothered not because one particular thing happened. I’m peeved because of many little things that added up, each independently too small an annoyance to address. For instance, I walked into the first day of the conference; as soon as it was my turn to check in, it was assumed that mine was the only female name badge visible on the table. No big deal; there’s a high probability here that it would be me. I don’t blame them. </p>
<p>When I passively mentioned a couple of times that I’m giving a talk at the conference, people assumed I’m one of the two female keynotes for the same reason, high probability that a female speaking is one of the keynotes. I’ll be honest: it perturbs me a little that people approach me asking how to get women involved, maybe having their female counterparts in mind, wives, girlfriends, friends, daughters, etc. In my opinion, if that question is asked, it’s compounding the issue. </p>
<p>What should happen is not conversations about <em>how</em> to get a minority in an industry to reflect the population distribution in a nation. On twitter, a friend from PyCon asked me and two other folks that are here at this conference whether he should use IPython or bpython. That was a personal win for me: it was pretty much the first time someone in the Python community solicited me for technical advice. </p>
<p>I passively get women involved by actively pursuing what I’m interested in: learning how to code. I host workshops & events for <a href="http://www.meetup.com/pyladiessf" title="PyLadies Meetup">PyLadies</a> and <a href="http://www.meetup.com/women-who-code-sf" title="Women Who Code meetup">Women Who Code</a>, bring women down to <a href="http://www.roguelynn.com/2012-03-22-o-hai-there" title="O HAI THERE!">developers' conferences</a> and local meetups with me, invite <a href="http://www.meetup.com/Women-Who-Code-SF/events/54634072/" title="Speaker Event">female speakers</a> for events, meet with friends individually to tutor in Python, lots of fun things. This indirectly creates an environment full of support and excitement, and it feels awesome.</p>
<p>So I’m telling you here: let’s get women involved by engaging them directly. It takes action, understanding, and doing.</p>
Gold Nuggets: My Bookmarks Collectionhttp://www.roguelynn.com/words/gold-nuggets-my-bookmarks-collection/2012-05-20T09:50:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>I’ve collected many bookmarks related to Python, programming, computer science, etc. </p>
<p>It’s time I cleaned it out and organized it better. In the midst of doing this, I found some really awesome sites, libraries, tutorials, etc. that need to be shared. I still have plenty of bookmarks that are great resources, but I didn’t want to inundate readers :). I’ve loosely divided them up by experience level. Here goes: </p>
<h4 id="beginner">Beginner:</h4>
<ol>
<li> <a href="http://sigpwned.com/content/learning-how-program" title="Learning How to Program">Advice</a> on learning how to program.</li>
<li> Model, View, Controller <a href="http://www.tomdalling.com/blog/software-design/model-view-controller-explained" title="MVC Explained">explained</a>.</li>
<li> Basic Python Syntax <a href="http://www.tutorialspoint.com/python/python_basic_syntax.htm" title="Basic Python Syntax">Tutorial</a>. (Helpful both for those new to programming in general, and for those new to Python with prior programming experience.)</li>
<li> Colorful Git <a href="http://rogerdudler.github.com/git-guide/" title="Git Guide">Tutorial</a>.</li>
<li> Git Tutorial based on <a href="http://openhatch.org/missions/git" title="Git Missions">missions to complete</a>.</li>
</ol>
<h4 id="advanced-beginnerintermediate">Advanced Beginner/Intermediate:</h4>
<ol>
<li> Git <a href="http://sitaramc.github.com/gcs/index.html" title="Git for Scientists">Tutorial</a> for Scientists (although you don’t need to be a scientist to benefit from this tutorial).</li>
<li> Bash Shell <a href="http://dl.dropbox.com/u/397277/bash_shell_cheat_sheetV2.pdf" title="Bash Cheat Sheet">Cheat Sheet</a> (Linux/Mac). I’m in love with Bash.</li>
<li> Django tutorial: <a href="http://lightbird.net/dbe/blog.html" title="Django Simple Blog">A simple Blog</a> (uses an old version of Django).</li>
<li> <a href="http://bpython-interpreter.org/about/" title="bpython Interpreter">bpython</a>: an overlay to the Python interpreter. Features in-line syntax highlighting and autocomplete suggestions, displays a list of parameters when calling a function, and displays what’s inside a library when you import it. This is a lightweight IDE for the terminal/Python shell. Awesome!</li>
</ol>
<h4 id="experience-agnostic">Experience Agnostic:</h4>
<ol>
<li> Awesome little <a href="http://pypi.python.org/pypi/pep8" title="PEP-8 tool">Python library</a>: checks to see if your Python file is PEP-8 compliant.</li>
<li> <a href="http://jeetworks.org/gitcommandswidget" title="Git Widget">Git Widget/Cheat sheet</a> for Mac (lives on your dashboard for easy access).</li>
<li> <a href="http://www.quora.com/Python-programming-language-1/Which-Internet-companies-use-Python" title="Python Companies">List</a> of internet companies that use Python.</li>
</ol>
PyLadies First Workshop: Build your own Bloghttp://www.roguelynn.com/words/pyladies-first-workshop-build-your-own-blog/2012-05-16T12:30:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>What a blast I had on Saturday! It was a great learning experience for both the attendees and for us PyLadies org board! </p>
<p>Good lessons were learned, such as: virtualenv is a beast to install, depending on one’s operating system. Next time, we’ll have a pre-workshop session for installing dependencies and such. We went through how to build a blog using Django and its class-based views. We also went over Model-View-Controller concepts and how they apply to Django. We wrapped up with a quick overview of how easy it is to deploy an application on Heroku (way too easy!). </p>
<p>I’ve made a <a href="https://github.com/econchick/PyLadiesBYOBlog" title="PyLadies Build your own Blog">GitHub repo</a> for people to fork & contribute. I didn’t find an easily accessible tutorial for building a blog, or for building a simple app using Django’s class views, so hopefully this will grow to something useful in the Django community. Til next time!</p>
Update: PyLadiesSF Debuthttp://www.roguelynn.com/words/update-pyladiessf-debut/2012-04-30T08:53:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>Wow, what an event it was! I’d say we had ~50 attendees for our <a href="http://www.meetup.com/pyladiessf" title="Event Page">first event</a>, which is awesome. There was so much excitement in the room that I couldn’t even get a chance to hack on my projects! </p>
<p><a href="http://klout.com" title="Klout">Klout</a> had a great space for us, and <a href="http://linkedin.com" title="LinkedIn">LinkedIn</a> provided [too much] food. And a huge thanks to <a href="http://www.robynnavarro.net/" title="Robyn Navarro">Robyn Navarro</a> for taking time out of her schedule to come take some <a href="http://www.meetup.com/PyLadiesSF/photos/7810542/#" title="Event Pictures">awesome pictures</a> of the event! </p>
<p>I’ve seen some very positive reviews on the meetup site, and I hope this group continues to grow with such energy. Many women really appreciated the space and time to either learn Python or to hack on projects with new friends. I even had the pleasure of meeting folks from the <a href="http://www.udacity.com" title="Udacity">Udacity</a> team; I think it’s such a great thing they’re doing: free online classes, specifically CS classes taught with Python. We scored some free t-shirts too! </p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/update-pyladiessf-debut/2012_04_newbie_corner.jpg" title="Newbie corner @ PyLadiesSF debut" alt="Newbie corner @ PyLadiesSF debut" /></p>
<p>About two-thirds of the attendees were women, along with their +1’s. I was really surprised by how a few folks stepped up and helped out those learning Python and Django. In one of the newbie corners, we worked through Learn Python the Hard Way, where people earned stickers for each exercise they completed (cute, huh?). Same theme with learning Django. Who doesn’t like earning gold stars?! PyLadiesSF is hoping to host an event or two in May, including an intro to Python & Django workshop, and perhaps a Speaker Series. Stay tuned! </p>
Debut of PyLadiesSFhttp://www.roguelynn.com/words/debut-of-pyladiessf/2012-04-25T08:54:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<h4 id="before-you-read-on">Before you read on: </h4>
<p>We have our first event this <a href="http://www.meetup.com/PyLadiesSF/events/59189602/" title="Debut Event">Saturday, April 28th</a> from 3pm-8pm at <a href="http://klout.com/home" title="Klout">Klout</a> with an awesome dinner provided by <a href="http://www.linkedin.com" title="LinkedIn">LinkedIn</a>. EXCITING!!! </p>
<hr>
<p>I am proud to present the founding of PyLadiesSF, the San Francisco/Bay Area chapter of <a href="http://pyladies.com" title="PyLadies">PyLadies</a>, a global mentorship group for women in the Python community. Our mission is to promote, educate, and advance women in the field of engineering who love (or loathe…) Python. </p>
<p>We will have monthly events, ranging from all-group hack nights to bootcamps for various experience levels to weekly study groups. We have folks who want to learn programming, who use Python as a hobby, or who develop in Python; ladies coming from a different language, from a scientific & academic background, or making a career transition. It’s so fantastic to see these women so excited about learning and about making PyLadiesSF and every PyLady succeed. </p>
<p>I’ve had many folks, women and men, approach me asking how they can help out. What surprises me the most is that no one has done this before; why no SF chapter until now? We’re Silicon Valley! But I am so excited to be a part of this, to bring the Python community together in a big wave of support for women. I’ve found that folks in the Python community, and in the CS/engineering community as a whole, recognize that there is a gender imbalance. </p>
<p>That is partly a great thing, because it makes it very easy to find resources to help women learn and grow. But it’s also unfortunate, because no one puts an active effort towards advancing women. The ladies I’ve met and worked with are extremely enthusiastic about learning. It’s certainly difficult to pursue such an endeavor; coding is hard! But it’s a lot easier to ask those “stupid”, naive questions in a non-competitive environment. </p>
<p>Kaitlyn Trigger of Lovestagram had a <a href="http://www.slideshare.net/katydint84/the-story-of-lovestagram" title="Kaitlyn Trigger Lovestagram Slides">great metaphor for programming</a>:</p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/pyladiessf-debut/building_furniture.png" title="Directions for Building Furniture" alt="Directions for Building Furniture"/></p>
<p>How in the h&ll do you get from step 1 to step 2? Either with a magic wand, or you’re a carpenter by nature. It’s similar with engineering: you talk to development folks, ask for a product, and it magically appears, fully constructed. I had a hard time grasping how a for loop, classes, and methods can turn into the architecture for a cloud fileshare system, a social networking site, or even a simple greedy change calculator. I found that having someone who also comes from this “what is this wizardry?!” point of view helps break down the steps to actually create a meaningful program. Enter: ME! I invite anyone who has an interest in Python to <a href="http://www.meetup.com/pyladiessf" title="PyLadiesSF Meetup">come check us out</a>.</p>
O HAI THERE!http://www.roguelynn.com/words/o-hai-there/2012-03-22T12:30:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>Who knew I’d get wrapped into Silicon Valley?</p>
<p>I should post more so I don’t have to do such a digest of what’s been going on! :) It all started when I took my first CS class. Well, it sucked. Learning first in C, I thought, “Well, I know some of these words.” I [re]assure you, C does not come naturally to a (former) financial analyst. I felt like I was drowning: I failed both exams.</p>
<p>I had the pleasure of attending Science Hack Day, where I was first exposed to Python. It was all downhill from there… My final project was <a href="http://inflatr.com" title="InflatR">inflatr.com</a>, which combined the awesomeness of Python with my undying interest in economics. Python is actually an interesting language: it poses as an easy-to-learn language for beginner coders, but more importantly it lets the coder build programs and scripts efficiently, taking away a lot of the tedium that other languages are known for and where beginners drown.</p>
<p>I then had the audacity to start a study group for women wanting to learn Python, standing on the shoulders of the newly founded <a href="http://www.meetup.com/women-who-code-sf" title="Women Who Code meetup">Women Who Code</a> group (growing at a linear rate of 100 members a month without skipping a beat!). Never had I organized a group like this before, but let me tell you: it was so easy, it begged to be done. Within a few days of conceiving the idea, I had gotten the ‘okay’ from Women Who Code, found space at Dropbox, which generously agreed to host the group for 8 weeks <em>and</em> have us join them for dinner every meetup, and secured Guido van Rossum, the Benevolent Dictator For Life at Google, aka the creator of Python, to kick off the first event. The first meetup guest list filled up within hours.</p>
<p>By the time the study series got into its groove, I received sponsorship from PyLadies to attend the mecca of all Python users - PyCon, down in Santa Clara. On top of that, I received a grant that included two free PyCon registrations (normally ~$300) as well as $100 off registration, and $300 to rent a car to bring these women from the SF area to PyCon. Lastly, I scored a FOSS Sponsorship booth that allowed Women Who Code to represent and promote the group at the conference. Seeing as we didn’t have any of the typical ‘marketing materials’ one would need at a conference, I asked for and instantly received another grant for $200 to get a banner, a poster, some stickers, and 5000 business cards, all with Women Who Code’s name and mission. It was one of the best experiences ever, both attending PyCon and working with the women who attended my study group. My name and actions are now known in this community before I even arrive somewhere. There is so much energy and craving to learn Python in a safe environment that I just can’t stop now.</p>
<p>PyLadies approached me a couple months ago to start a San Francisco chapter, and now I’m going full force! So now my current projects are to get PyLadies SF going, to clean up and implement new applications to my inflation site, to continue coding in Python and increase my algorithms education, and to continue being awesome. :-) Stay tuned: this weekend I’ll produce a postmortem on my recently passed laptop.</p>
It's for SCIENCEhttp://www.roguelynn.com/words/its-for-science/2011-11-14T12:30:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>Guten tag, all. I just had an awesome weekend. I attended <a href="http://sciencehackday.pbworks.com/w/page/45740104/SFideas" title="SF Science Hack Day">San Francisco Science Hack Day</a> and it was <em>awesome</em>! </p>
<p>It was two <strong>full days</strong>, so let me lay it out for you. <strong>About me:</strong> programming & science n00b. <strong>Mission:</strong> to make & hack things with SCIENCE. <strong>Vision:</strong> to hack data from Switzerland’s Large Hadron Collider for better accessibility and understanding, especially for enthusiasts-but-n00bs like me. <strong>Results:</strong></p>
<ul>
<li> my knowledge of particle physics went from 0 to … some? Probably 1, if it’s on a scale of …100.</li>
<li> crash course in python and javascript programming</li>
<li> +5lbs from bountiful free food and alcohol</li>
<li> Increased tolerance to caffeine</li>
<li> I made <a href="http://www.mattbellis.com/index.php?title=LHC_Hack_Day/Mass_distributions" title="Mass Distributions Viz">THIS</a>!</li>
<li> The group collectively put <a href="http://www.mattbellis.com/index.php?title=LHC_Data_Hack" title="LHC Data Hack">THIS</a> website together!</li>
<li> 10 GBs of LHC data, out of the gazillion terabytes of data both available & unavailable to the “public” (I say that with quotes because I’m not exactly sure where I could even get the publicly released data…)</li>
</ul>
<p>Note: If you’re interested in the source code, it can be found on GitHub under mattbellis. So, what those two graphs I put together mean is [I THINK, IF I REMEMBER CORRECTLY] a reflection of two different ways to calculate the mass of muons. Muons are produced when two protons collide, along with other particles, including mesons. In the second graph (using Einstein’s equations/special relativity), you can see a second “hump”, signaling that there actually is a second meson that the first graph didn’t show. All in all, thanks to my intro course to computer science, I was able to have somewhat of an understanding of what was going on, as well as build upon my skills (especially for my final project coming up!). All I have to say to n00bs of any new skill: dive right in. That’s the best, most fun way to learn! </p>
Studying BitCoinhttp://www.roguelynn.com/words/studying-bitcoin/2011-07-07T12:30:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>Has anyone else heard of BitCoin, and the incredible inflation it has seen? </p>
<p>This <a href="http://falkvinge.net/2011/05/29/why-im-putting-all-my-savings-into-bitcoin/" title="Why I'm Putting All my Savings into Bitcoin">dude</a> has put <strong>all</strong> his money into BitCoin - not smart, I would say, especially when <a href="http://www.dailytech.com/article.aspx?newsid=21877" title="Run on Bitcoin">news</a> like a run on BitCoin hits. (If you have no clue what BitCoin is, <a href="http://en.wikipedia.org/wiki/Bitcoin" title="Wiki: Bitcoin">read</a>.) So while I laugh at the guy who invested all his money, now gone, I reflect back on a conversation I had with a friend a few years ago: can we use this virtual currency to study monetary economics? BitCoin has the potential to provide us with an interesting avenue for understanding another type of currency. But what I imagine more is looking at virtual worlds and video games (disclaimer: I have no direct experience with video games) to build scenarios to “test” on live people interacting with the world/game. Currently, there are trading simulators that follow the market, where one can see how his/her virtual portfolio would fare in real life. Add that aspect to a virtual economy and see how quantitative easing programs could affect it. Just an idea.</p>
Seattle Tech Talkhttp://www.roguelynn.com/words/seattle-tech-talk/2011-06-30T08:58:00ZLynn Rootlynn[at]lynnroot[dot]comhttp://www.roguelynn.com/<p>I had the interesting experience of attending a Seattle tech talk with Facebook’s Mark Zuckerberg. Unfortunately I was too scared to ask him if he felt $70 billion was a fair value for the FB IPO estimate, or if he needed any analysts to help him justify such a number.</p>
<p>But what I thought was interesting, beyond all that geeky coding and infrastructure talk, was that he believes the rate at which people “share” stuff via Facebook will double each year, so within 5 years people will be sharing 32x as many things as today.</p>
<p>Is that even possible? ~1024 things/day being shared/liked/commented on. We’d have to be FBing nearly every minute.</p>
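<p>The quick arithmetic behind that claim can be sketched like this (the ~32 shares/day starting point is my own assumption, chosen only so the numbers line up with the ~1024/day figure above):</p>

```python
# Sharing that doubles every year: after n years the rate is base * 2**n.
def projected_shares_per_day(base_per_day, years):
    return base_per_day * 2 ** years

# Five doublings is a 32x multiplier, matching the "32x in 5 years" claim.
print(projected_shares_per_day(1, 5))   # 32

# Assuming a hypothetical ~32 shares/day today, five years out:
print(projected_shares_per_day(32, 5))  # 1024
```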
<p>What I think is more accurate is exponential growth in a person’s sharing; then, as one gets older, the activity tapers, slows down, and maybe even falls to zero/very rare. What would be important is to look at the age where the exponential growth happens and see how that is growing.</p>
<p>Just quick thoughts post-talk. I enjoyed it, plus the free pizza & booze, though I wish I could have gotten some free swag… </p>
<p><img class="displayed" src="http://www.roguelynn.com/assets/images/seattle-tech-talk/mark_zuckerberg.jpg" title="Mark Zuckerberg @ Seattle Tech Talk" alt="Mark Zuckerberg @ Seattle Tech Talk"/></p>