{"id":93,"date":"2009-06-26T12:00:57","date_gmt":"2009-06-26T10:00:57","guid":{"rendered":"http:\/\/www.angeredbrackets.com\/?p=93"},"modified":"2009-06-26T19:55:18","modified_gmt":"2009-06-26T17:55:18","slug":"xkcd-knockout-comic-downloader","status":"publish","type":"post","link":"https:\/\/www.angeredbrackets.com\/wordpress\/2009\/06\/xkcd-knockout-comic-downloader\/","title":{"rendered":"XKCD Knockout Comic Downloader"},"content":{"rendered":"<p><a href=\"http:\/\/xkcd.com\">XKCD<\/a>, for those of you who don&#8217;t know, is a webcomic about (as it testifies of itself) romance, sarcasm, math, and language. In my opinion, it&#8217;s the best webcomic out there. I wanted to download the complete comics archive, for the sake of a local backup, and as an idea for a printed, coffee-table style book. However, I didn&#8217;t want just the comic&#8217;s image file &#8211; the best part of XKCD is often the alt-text that shows up as a tooltip when you mouse-over the comic. And of course there&#8217;s the comic&#8217;s title, as well.<\/p>\n<p>There are several scripts out there that others have written before me that download the comic, and some even do a pretty good job of getting some of the extra data. However, I wanted more. I wanted a downloader that would get ALL the data available about the comics, and store it in an easily-retrievable, transformative manner.<!--more--><\/p>\n<p>This led me to write my own downloader, which I have dubbed the XKCD Knockout Comic Downloader (or XKCD, for short). 
I have relied to some degree on those who have come before me, and have modified their code.<br \/>\nOff the top of my head, I used some code by <a href=\"http:\/\/blog.johnlawrence.net\/2008\/10\/yet-another-xkcd-download-script\/\">John Lawrence<\/a> to find the latest comic number; and this discussion on <a href=\"http:\/\/ubuntuforums.org\/showthread.php?t=867649\">Ubuntu forums<\/a> got me started with getting the meta-data.<\/p>\n<p>My downloader is the most complete I&#8217;ve seen so far. Some of its notable features:<\/p>\n<ul>\n<li>Stores meta-data, including the path to the image, in an XML file.<\/li>\n<li>Choice of downloading the images or not.<\/li>\n<li>Can append to an existing XML file, updating it with the comics published since you last downloaded your personal batch of XKCD.<\/li>\n<li>Will store all data about the comic, including seldom or never-before-used attributes, such as href, src, etc. (more on that later).<\/li>\n<\/ul>\n<p>Sure, this script isn&#8217;t a nifty one-liner that does all the work, but instead, it does more work, and does it well:<\/p>\n<p><code>#!\/bin\/bash<br\/><br\/>#-----user configurable-----<br\/>append_to_file=true # continue from previous download<br\/>download_path=~\/xkcd\/<br\/>image_path=images<br\/>xmlfile=xkcd.xml<br\/>download_images=true<br\/>#---------------------------<br\/>#------configuration--------<br\/>i=1<br\/>latest=`wget -q -O - http:\/\/www.xkcd.com | grep 'link to this comic' | sed 's\/.*xkcd.com.\\([^\\\/]*\\).*\/\\1\/'` <br\/>#---------------------------<br\/><br\/>if [ ! -d $download_path ]<br\/>then<br\/>&nbsp;&nbsp;&nbsp;&nbsp;mkdir $download_path<br\/>fi<br\/>cd $download_path<br\/><br\/>if $download_images &#038;& [ ! 
-d $image_path ]<br\/>then<br\/>&nbsp;&nbsp;&nbsp;&nbsp;mkdir $image_path<br\/>fi<br\/><br\/>if $append_to_file &#038;& [ -f $xmlfile ]<br\/>then<br\/>&nbsp;&nbsp;&nbsp;&nbsp;sed -i '\/\\\/xkcd\/ d' $xmlfile<br\/>&nbsp;&nbsp;&nbsp;&nbsp;i=$(tail -8 $xmlfile | grep '&lt;id>' | sed 's\/^.*>\\([0-9]\\+\\).*\/\\1\/')<br\/>&nbsp;&nbsp;&nbsp;&nbsp;i=`expr $i + 1`<br\/>else<br\/>&nbsp;&nbsp;&nbsp;&nbsp;echo \"&lt;?xml version=\\\"1.0\\\" encoding=\\\"ISO-8859-1\\\"?>\">>$xmlfile<br\/>&nbsp;&nbsp;&nbsp;&nbsp;echo \"&lt;?xml-stylesheet type=\\\"text\/xsl\\\" href=\\\"xkcd.xsl\\\"?>\">>$xmlfile<br\/>&nbsp;&nbsp;&nbsp;&nbsp;echo \"&lt;xkcd>\">>$xmlfile<br\/>fi<br\/><br\/>while [ $i -le $latest ]<br\/>do<br\/>&nbsp;&nbsp;&nbsp;&nbsp;echo \"&nbsp;&nbsp;&nbsp;&nbsp;&lt;comic>\">>$xmlfile<br\/>&nbsp;&nbsp;&nbsp;&nbsp;echo \"&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;id>$i&lt;\/id>\">>$xmlfile<br\/>&nbsp;&nbsp;&nbsp;&nbsp;wget http:\/\/xkcd.com\/$i\/<br\/>&nbsp;&nbsp;&nbsp;&nbsp;img=$(grep http:\/\/imgs.xkcd.com\/comics\/ index.html | head -1)<br\/>&nbsp;&nbsp;&nbsp;&nbsp;params=$(($(echo $img | tr -dc '\"' | wc -c)\/2))<br\/>&nbsp;&nbsp;&nbsp;&nbsp;for ((j = 1; j &lt;= $params; j++))<br\/>&nbsp;&nbsp;&nbsp;&nbsp;do<br\/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;param=$(echo $img | cut -d\\\" -f$(($j*2-1)) | sed 's\/>*&lt;*[a-z]*\\ \\([a-z]*\\)\\=\/\\1\/')<br\/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;val=$(echo $img | cut -d\\\" -f$(($j*2)))<br\/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;echo \"&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;$param>$val&lt;\/$param>\">>$xmlfile<br\/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if [ $param = src ]<br\/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;then<br\/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;filename=$(echo $val | cut -d\\\/ -f5)<br\/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if 
$download_images<br\/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;then<br\/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;wget $val<br\/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mv $filename \"$image_path\"\/\"$i\"_\"$filename\"<br\/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;fi<br\/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;echo \"&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;filename>$i\"_\"$filename&lt;\/filename>\">>$xmlfile<br\/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;fi<br\/>&nbsp;&nbsp;&nbsp;&nbsp;done<br\/>&nbsp;&nbsp;&nbsp;&nbsp;echo \"&nbsp;&nbsp;&nbsp;&nbsp;&lt;\/comic>\">>$xmlfile<br\/>&nbsp;&nbsp;&nbsp;&nbsp;rm index.html<br\/>&nbsp;&nbsp;&nbsp;&nbsp;i=`expr $i + 1`<br\/>done<br\/>echo \"&lt;\/xkcd>\">>$xmlfile<\/code><\/p>\n<p>I won&#8217;t explain every regex and all the logic in the script here; if you have any specific questions, please ask.<br \/>\nWhat distinguishes this script from others (of the rare few that download the meta-data) is that I&#8217;m not assuming any attributes exist, but am downloading all of them. This is useful for the irregular comics such as <a href=\"http:\/\/xkcd.com\/472\">House of Pancakes<\/a> or <a href=\"http:\/\/xkcd.com\/191\">Lojban<\/a>. In fact, I just ran a search for href in the XML file I created today, and found a few nuggets I&#8217;ve missed in the past.<\/p>\n<p>Because the data is stored in an XML file, it can easily be transformed. You could write a script to tweet the alt-text (why you would do that is beyond me), or create a tag cloud of frequently used words. My intention is to create an XSLT file that will display all the comics in a pleasing manner and bring it to print. 
(Dealing with Randall&#8217;s irregular image sizes is something I&#8217;m still working on, and I&#8217;m open to suggestions.)<\/p>\n<p>Let me know if you use this script and what creative ideas you have in mind for your stash of XKCD.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>XKCD, for those of you who don&#8217;t know, is a webcomic about (as it testifies of itself) romance, sarcasm, math, and language. In my opinion, it&#8217;s the best webcomic out there. I wanted to download&#8230;<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[12,22],"tags":[29,30,6,28,27],"_links":{"self":[{"href":"https:\/\/www.angeredbrackets.com\/wordpress\/wp-json\/wp\/v2\/posts\/93"}],"collection":[{"href":"https:\/\/www.angeredbrackets.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.angeredbrackets.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.angeredbrackets.com\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.angeredbrackets.com\/wordpress\/wp-json\/wp\/v2\/comments?post=93"}],"version-history":[{"count":26,"href":"https:\/\/www.angeredbrackets.com\/wordpress\/wp-json\/wp\/v2\/posts\/93\/revisions"}],"predecessor-version":[{"id":120,"href":"https:\/\/www.angeredbrackets.com\/wordpress\/wp-json\/wp\/v2\/posts\/93\/revisions\/120"}],"wp:attachment":[{"href":"https:\/\/www.angeredbrackets.com\/wordpress\/wp-json\/wp\/v2\/media?parent=93"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.angeredbrackets.com\/wordpress\/wp-json\/wp\/v2\/categories?post=93"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.angeredbrackets.com\/wordpress\/wp-json\/wp\/v2\/tags?post=93"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}