Getting Bookmark Data from del.icio.us

Why am I doing this? Well, I've been bookmarking links using del.icio.us since March 2009 and have, very slowly, built up a steady collection of over 900 bookmarks, most of which are probably still quite useful.

The ups and downs of the social bookmarking service del.icio.us are very well documented here, so I won't go into all the details, but suffice it to say that reading the aforementioned article gave me all the impetus I needed to go ahead and homestead my list of bookmarks. Simple, I thought.

Then came the snag...

We're sorry, but due to heavy load on our database we are no longer able to offer an export function. Our engineers are working on this and we will restore it as soon as possible.

Great... what now? Time to write a script, of course. I originally wrote this article in a Jupyter Notebook, which may, I hope, explain the recipe feel to the writing.

The Plan

Well, initially the plan was just to grab the bookmarks by scraping my del.icio.us pages from beginning to end and turn that data straight into some form of bookmark HTML. Thinking about it more, I decided to opt for a JSON data format; that way I could easily turn the data into any format I like at my leisure.
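To illustrate why JSON makes a handy intermediate format, here is a minimal sketch. The record fields (url, title, tags), the sample data and the helper names to_json and to_html are all assumptions made for illustration; they are not del.icio.us's own schema, nor necessarily what the full article uses.

```python
import json
from html import escape

# Hypothetical records standing in for scraped bookmarks; the field
# names (url, title, tags) are assumptions, not del.icio.us's schema.
BOOKMARKS = [
    {"url": "https://www.python.org/", "title": "Python", "tags": ["python", "docs"]},
    {"url": "https://example.com/?a=1&b=2", "title": "An <example> page", "tags": []},
]


def to_json(bookmarks):
    """Serialise the bookmark records as an indented JSON string."""
    return json.dumps(bookmarks, indent=2)


def to_html(bookmarks):
    """Render the records as a plain HTML list of links."""
    items = [
        '<li><a href="{}">{}</a></li>'.format(escape(b["url"]), escape(b["title"]))
        for b in bookmarks
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Store once as JSON, then convert to another format whenever needed.
STORED = to_json(BOOKMARKS)
RESTORED = json.loads(STORED)
print(to_html(RESTORED))
```

Once the data is held as JSON, producing bookmark HTML (or any other format) becomes a separate, repeatable step that never needs to touch the original pages again.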

Show Source Plugin Update

I received some good news this week: my Show Source plugin, the one inspired by Sphinx, has had its pull request accepted into the getpelican/pelican-plugins repo. This means that the plugin becomes a part of the standard Pelican plugin canon.

My next task will be to make a small addition, and hence a pull request, to the developers of the theme I use with this site (pelican-bootstrap3) to accommodate a couple of small template changes to support Show Source automatically.

The officially accepted version of show source is available now right here.

Enjoy!

Creating AWS Data Pipelines with Boto3 and JSON

I have recently been doing a little work with AWS Data Pipeline for ETL tasks at work. AWS Data Pipeline handles data-driven workflows called pipelines. The pipelines take care of scheduling, data dependencies, data sources and destinations in a nicely managed workflow. My tasks take batch datasets from SQL databases, process and load those datasets into S3 buckets, then import them into a Redshift reporting database.

Because production database structures are frequently updated, those changes need to be reflected in the reporting backend service. For a couple of years I have struggled on with Amazon's web-based data pipeline architect to manage those changes to the pipeline. This has been an onerous task, as the architect does not lend itself well to managing a large set of pipelines. Here begins a little tale of delving into the AWS data pipeline API to find another way.
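As a rough idea of what working through the API with JSON can look like, the sketch below converts a pipeline definition in the JSON file format into the list-of-fields shape that boto3's put_pipeline_definition call takes as pipelineObjects. The helper names to_fields and to_pipeline_objects and the sample definition are hypothetical, and this is not necessarily the approach taken in the full article.

```python
import json


def to_fields(obj):
    """Turn one JSON pipeline object into a list of API-style fields."""
    fields = []
    for key, value in obj.items():
        if key in ("id", "name"):
            continue  # id and name are top-level keys, not fields
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict) and "ref" in item:
                # {"ref": "X"} points at another pipeline object
                fields.append({"key": key, "refValue": item["ref"]})
            else:
                fields.append({"key": key, "stringValue": str(item)})
    return fields


def to_pipeline_objects(definition):
    """Convert the 'objects' list of a JSON definition (hypothetical helper)."""
    return [
        {"id": obj["id"], "name": obj.get("name", obj["id"]), "fields": to_fields(obj)}
        for obj in definition["objects"]
    ]


# Made-up sample definition, for illustration only.
DEFINITION = json.loads("""
{
  "objects": [
    {"id": "Default", "name": "Default", "scheduleType": "cron",
     "schedule": {"ref": "DefaultSchedule"}},
    {"id": "DefaultSchedule", "name": "Every day", "type": "Schedule",
     "period": "1 days", "startAt": "FIRST_ACTIVATION_DATE_TIME"}
  ]
}
""")

PIPELINE_OBJECTS = to_pipeline_objects(DEFINITION)
print(json.dumps(PIPELINE_OBJECTS, indent=2))
```

With AWS credentials configured, the result could then be passed to boto3.client("datapipeline").put_pipeline_definition(pipelineId=..., pipelineObjects=PIPELINE_OBJECTS), where the pipeline ID would come from an earlier create_pipeline call. Keeping the definitions as JSON files means they can live in version control and be regenerated when the database structures change.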

The Show Source Pelican Plugin

I have always been a fan of the Sphinx Python documentation generator and it has, I think, a nice feature where you can check out the raw source of a generated piece of documentation - a Show Source link. I decided that developing a Pelican plugin to imitate that feature would be a great way of getting a little deeper into the Pelican code itself.

This second post of the series explains the use of the Show Source plugin that I developed from what I learned in the previous post. The following article has been, in some part, reproduced in the ReadMe.rst file included as part of the plugin.

Getting Data from a Garmin FR610

Since getting a Garmin Forerunner 610 I have been largely satisfied with the various aspects of getting the data from the watch on to the Internet. It is a multi-stage process, and each stage carries risks and issues that depend very much on non-functional qualities such as network and USB resilience, usability and so on. To give an example, I get the data from my watch using a USB ANT+ stick, a Mac computer, Garmin's Express software, the Garmin Connect online service and finally Strava, via Strava's linked account capability. That's a lot of steps, and a lot of potential failures lying in wait. I am not going to get into my rather low estimation of both Garmin's Connect service and Express software; suffice it to say, I think they are both rather substandard at the moment.

Moving from Map My Run to Strava

I have been running for a little while now and am in the habit of taking my phone with me to monitor my progress with the aid of GPS and so on. I have been using a really nice app called mapmyrun. It has given me months of good service and, what with the moderate amount of running I have done, I have committed a fair few hours' worth of data online.

Meanwhile I have decided that it would be great to get out running with a group of people rather than on my own all the time, so I have joined up with a very friendly local running group in Caistor, Lincolnshire. Quite a number of them use a different app called Strava, and I thought that I would give it a go based on some positive recommendations from members of the group.

You might wonder what this article is doing in the software section of this site... well, read on.

My Pelican Publishing Process

This brief article focuses on only one part of using Pelican as a static site generation tool - the day-to-day usage once the setup is established. There are loads of really good articles on how to get started with Pelican and I am not about to add to that growing pile; there is just no need.

I don't use the default make tool that comes with Pelican, because having a Python programming background made me feel much more comfortable with utilising the ever excellent Fabric tool.

Titlecase: A Great Find

As a rather inexperienced writer, it is the little things I struggle with, as well as the big things, you might add. I find that one of the most annoying is getting title case correct. There are a lot of style guides about, which may or may not help; I find some of them cryptic at times, and rather than being strict rules, they seem to be more matters of style. Until recently I had been taking style advice from yourdictionary.com's site, but I always found having to undertake this task manually a bit of a bore.

However, as with most programmers (or even ex-programmers like myself), I find that as soon as I am faced with a repetitive task I look for a way of automating it; and thus it happened with this. Five minutes later I had turned up Pat Pannuto's Python-based titlecase package.

Pylint and the Module hashlib has no md5 member Error

How to avoid PyLint's Module 'hashlib' has no 'md5' member error when using hashlib's md5() function in your code, and the consequences of doing so.

I do not use md5 much at all these days, as it is known to be fairly unsafe; having said that, md5 does still have its uses when it comes to comparing file likeness in non-critical applications. It just so happens that PyLint does not seem to like invoking hashlib with specific constructors and will therefore mark down code ratings as a result. See the example below:

>>> import hashlib
>>> HASH = hashlib.md5()
>>> HASH.update(b'Test my hash')
>>> HASH.hexdigest()
'f94b71b31c1d09d352db8b59d4f98892'
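One workaround that is often suggested, and which may or may not be the one this article goes on to describe, is to ask hashlib for the algorithm by name through hashlib.new(). The lookup then happens at run time via a string rather than through a module attribute, so there is no md5 member for static analysis to question. A minimal sketch, with DATA as an arbitrary sample input:

```python
import hashlib

# Arbitrary sample input; hash functions operate on bytes.
DATA = b"Test my hash"

# Direct constructor: the form that can trigger the no-member report.
direct = hashlib.md5(DATA).hexdigest()

# Named lookup: the algorithm is chosen by string at run time.
by_name = hashlib.new("md5", DATA).hexdigest()

# Both routes produce the same digest.
print(direct == by_name)
```

The trade-off is that a typo in the algorithm name is only caught when the code runs (hashlib.new() raises ValueError for an unknown name), rather than being flagged by the linter.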
