The Economist, and, The Kindle: Take 2

January 3rd, 2010

A while ago I wrote about how you can get The Economist on your Kindle (and other e-readers) by running a simple PHP script that crawls economist.com and generates a .mobi file, which it emails to your Kindle weekly. Unfortunately (though understandably), around July 2009 they restricted their This Week’s Print Edition page to subscribers of their online and print editions.

With a little bit of work, I’ve updated the economist_to_kindle.php script to log in to economist.com with your username and password so it can generate a Kindle version again:

economist_to_kindle.phps

With this update, and if you’re a print edition subscriber, you should be able to get this week’s edition on your Kindle again.

Updated 2010/01/25: Several bugfixes, see comments for details.

Updated 2010/07/22: slifox and crosscode have made some great additions to the code and got it working with the Economist.com’s latest site structure. Check out crosscode’s latest version or read the comments for details.

  1. Maarten
    January 12th, 2010 at 00:06 | #1

    Hey Nic,
    Thanks for the nice work! I’ve been trying to add proxy authentication, but when I do, I lose the site authentication. Any ideas?

    I added
    //set proxy stuff
    curl_setopt($ch, CURLOPT_PROXY, "webproxy:8080");
    curl_setopt($ch, CURLOPT_PROXYUSERPWD, "userdomain\\username:password");

    twice (in function economistLogin() and function economistGetUrl($url))
    Thanks for your feedback!

  2. Peter
    January 14th, 2010 at 19:02 | #2

    You’re the man, Nic… Thanks.

  3. H Y Lee
    January 15th, 2010 at 17:17 | #3

    I am afraid I can’t get this to run properly. I get an error, “PHP Fatal error: Call to undefined function curl_init ….. on line 575”, and the script then stops and doesn’t do anything. I am using PHP 5.2.8 on Windows XP.

    Could you also tell me how to set the basedir to store the files on the directory C:\Documents and Settings\LHY\My Documents\Economist? Thank you.

  4. January 15th, 2010 at 21:30 | #4

    @Maarten
    I added the CURLOPT_PROXY right before curl_exec() and it seemed to work for my local proxy. I do not have it password protected, so maybe the credentials are specified wrong? I would try printing a line for curl_error($ch) right after curl_exec() so you can see what’s wrong.

    Hopefully that’ll work!
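
The debugging step suggested above can be sketched roughly as follows. This is only a minimal sketch, not the script's actual code: `proxyOptions()` and `fetchWithDiagnostics()` are hypothetical helper names, and the proxy address and credentials are the placeholders from Maarten's comment.

```php
<?php
// Hypothetical helper (not part of the script): builds cURL proxy options.
function proxyOptions($proxy, $userPwd = null)
{
    $opts = array(CURLOPT_PROXY => $proxy);
    if ($userPwd !== null) {
        // Credentials in "user:password" form, as CURLOPT_PROXYUSERPWD expects.
        $opts[CURLOPT_PROXYUSERPWD] = $userPwd;
    }
    return $opts;
}

// Hypothetical helper: fetches a URL and prints the cURL error on failure.
function fetchWithDiagnostics($url, array $extraOpts = array())
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt_array($ch, $extraOpts);             // apply proxy options right before curl_exec()
    $body = curl_exec($ch);
    if ($body === false) {
        // curl_error() describes why the last transfer on this handle failed.
        echo 'cURL error: ' . curl_error($ch) . "\n";
    }
    return $body;
}

// Placeholder proxy and credentials, as in the comment above.
$opts = proxyOptions('webproxy:8080', 'userdomain\\username:password');
```

Calling `fetchWithDiagnostics($url, $opts)` would then print the reason whenever the transfer fails, which should show whether the proxy is rejecting the credentials.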

  5. January 15th, 2010 at 21:32 | #5

    @H Y Lee
    H Y Lee — You’ll have to make sure the Curl libraries are installed on your Windows PHP install:
    http://curl.haxx.se/libcurl/php/install.html
    http://windows.php.net/

    You should be able to just update the basedir with the directory you’ve specified. That path should work on Windows. Since you’re using backslashes (\), you’ll have to escape them:
    C:\\Documents and Settings\\LHY\\My Documents\\Economist
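
As a small illustration of the escaping rule, here is a sketch of three ways to write that path in PHP. The variable names are placeholders chosen for this example, not necessarily the ones the script uses.

```php
<?php
// In a double-quoted string, each backslash must be doubled.
$escaped = "C:\\Documents and Settings\\LHY\\My Documents\\Economist";

// In a single-quoted string, a lone backslash before an ordinary letter is kept literally.
$single = 'C:\Documents and Settings\LHY\My Documents\Economist';

// PHP on Windows also accepts forward slashes in file paths.
$forward = 'C:/Documents and Settings/LHY/My Documents/Economist';

// The first two forms produce the same string.
var_dump($escaped === $single);
```

Any of the three should work as a directory setting; the forward-slash form avoids the escaping question altogether.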

  6. Peter
    January 20th, 2010 at 12:52 | #6

    Hi Nic, again thanks for spearheading this. A few quick things.

    Some of your users (especially on Linux/Mac) may want to know that kindlegen has more or less replaced mobigen_linux. A quick update of this line works (replace mobigen_linux with kindlegen; all other options seem to work the same):
    system("/usr/local/bin/mobigen_linux -c1 $opfFile");
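
A rough sketch of that swap, under some assumptions: `converterCommand()` is a hypothetical helper, the kindlegen install path and the `economist.opf` file name are placeholders, and the `-c1` flag is simply carried over from the line above.

```php
<?php
// Hypothetical helper (not part of the script): builds the converter command line.
function converterCommand($binary, $opfFile)
{
    // escapeshellarg() guards against spaces or shell metacharacters in either path.
    return escapeshellarg($binary) . ' -c1 ' . escapeshellarg($opfFile);
}

// Assumed install location and file name; adjust both for your system.
$command = converterCommand('/usr/local/bin/kindlegen', 'economist.opf');
echo $command . "\n";

// system($command); // uncomment to actually run the conversion
```

Keeping the binary path in one place makes it easy to switch between mobigen_linux and kindlegen without touching the rest of the call.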

    I’m not sure what it is exactly, but it seems like many links don’t work off of mbp_toc.html. I’ve traced it back to the correct formatting in the economist.html file. When there’s a good , then it works, but half the time it’s just that gets printed out. Is there something whacked here… maybe a substitution or other that’s wiping out the id? Without it, I can’t really use the table of contents.

    Also, the NYT allows you to click the ‘trackball’ to the right, and jump to the next article. I imagine that’s done by inserting a tag in the html. Would be great to have that too.

    Thanks!

  7. Peter
    January 20th, 2010 at 12:53 | #7

    Sorry Nic… meant to write ‘when there’s a good id=2234234… then it works, otherwise the link from TOC fails’.

  8. Another Peter
    January 24th, 2010 at 19:31 | #8

    I installed PHP 5.2.12 – and have not been able to get this to work. I have set my Economist login and password, as well as my kindle email address in the phps file (using Wordpad). However, when I run the phps file, the DOS box comes up for a fraction of a second, and disappears.
    Thoughts?

  9. Peter
    January 25th, 2010 at 16:58 | #9

    Nic – it’s not picking up the start and end of article for United States’ section. Would really appreciate if you could give this a look for a few minutes… Thanks,

    “Another Peter” … sounds like you’re not running it from the command line. My guess is that you have the working directory with insufficient slashes “/” or something, or maybe you’re missing the curl piece? Either way, you shouldn’t just “double click” on the php file… Try to run it from within the command line window to begin with, and post the output here.

  10. January 25th, 2010 at 22:53 | #10

    Hey Peters,

    Thanks for finding this issue. I’ve updated the source with 3 fixes:

    1) As you found, many of the articles’ links didn’t work and the entire article was missing. This seemed to hit the “Americas” section a lot. I found that if you go to the URL linked from the This Week’s Edition page, the server 301 redirects to a second URL. cURL, which downloads the articles, wasn’t set up to follow this redirect, so those articles were missing.

    2) Starting with this week, 01/23/2010, the HTML format of many articles changed, which broke the converter (there was an extra H1 in the HTML, which we thought was the title). This caused extraneous SCRIPT tags to be in the HTML, which caused mobigen to break completely. This should be fixed.

    3) I’ve updated the file to make sure all “See article” links work (for example, in the Politics This Week section). Before, you would just see the text “See article”. Now, it should link to the actual article.

    Let me know of any problems.

    http://nicj.net/files/economist_to_kindle.phps
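
For anyone curious about fix 1, following a redirect with PHP's cURL bindings looks roughly like this. It is a minimal sketch rather than the exact code in the script: `redirectOptions()` and `fetchFollowingRedirects()` are hypothetical names, and the redirect limit of 5 is an arbitrary choice.

```php
<?php
// Hypothetical helper (not part of the script): cURL options for following redirects.
function redirectOptions($maxRedirects = 5)
{
    return array(
        CURLOPT_RETURNTRANSFER => true,          // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,          // follow 301/302 Location headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // give up after this many hops
    );
}

// Hypothetical helper: downloads a page, following any redirects on the way.
function fetchFollowingRedirects($url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, redirectOptions());
    $body = curl_exec($ch);
    if ($body === false) {
        return false;
    }
    // CURLINFO_EFFECTIVE_URL reports the URL that was finally fetched.
    return array(
        'url'  => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'body' => $body,
    );
}
```

Without `CURLOPT_FOLLOWLOCATION`, `curl_exec()` returns the redirect response itself rather than the article it points to.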

  11. Peter
    February 6th, 2010 at 17:36 | #11

    Hi Nic, It’s working great now, thanks! BTW, do you know how many papers get the “View Sections List/ Next Article/Previous Article” to work on the kindle? I think it might be an embedded script tag of some sort, though it’s interesting that that part of the screen (always the very bottom) doesn’t “flash” on page-changes. Maybe it’s more of a kindle thing???

  12. February 6th, 2010 at 19:48 | #12

    Hi Peter — I don’t know how they do that, but I will look into it! Could be a kindle-specific thing, I wonder if the MOBI format natively supports that.

  13. Josh
    March 27th, 2010 at 10:46 | #13

    Just letting others know, since I had to get libcurl working on Debian (should work for Ubuntu as well):
    sudo apt-get install php5-curl
    Then it started working for me.

    Unfortunately, I’m having trouble with back articles. The parser seems thrown by the article subscription links (I’m an online-only subscriber). Seems like it would be easy to fix, but the PHP is throwing me. Halfway tempted to rewrite in Ruby, but I’m sure that it’s not worth it, since the script works great for current articles.

  14. Josh
    April 4th, 2010 at 06:56 | #14

    Got it working. Style must be different for online only subscribers for back issues. I just had to update where it found the end of the body. Patch contents are:

    397c397
    $endPos = getArticleEnd($article);
    414c414
    $endPos = getArticleEnd($article);
    520c520
    function getArticleEnd($article)
    539,543d538
    < $end4 = strpos($article, '<script', $startPos);
    < if ($end4) {
    < $ends[] = $end4;
    < }
    <

  15. April 7th, 2010 at 20:54 | #15

    Thanks Josh! If you don’t mind, I’ve added your update to the script on this site so others will have the fix.

  16. Peter
    June 14th, 2010 at 13:42 | #16

    Hi Nic,
    Looks like the Economist has changed the URL structure, now using ./node/SID where SID is the ‘story ID’. Is something up with curl that’s not following through correctly? I’m on a new system, so maybe this whole thing is a config issue on my end, but it seems the script has suddenly stopped working in the last week or so.
    Thanks, -Peter

  17. slifox
    June 27th, 2010 at 16:32 | #17

    Hi all,

    It looks like The Economist website has changed again. I’ve made some modifications to the script so it works with the latest print issue from the website.

    This is the first time I’m using this script, so I didn’t have the old HTML files for comparison. I also don’t have access to the online-only subscription, and haven’t tested any back-issues. Therefore you might have problems using my modified script for these things — however it works fine for the latest print issue available from the website.

    I changed the login to use HTTPS instead of HTTP, so your password is sent encrypted (hopefully; I didn’t verify this). I also added a few more options, including an option to disable email sending, and an option to enable creating a link to the latest issue. The link creation allows you to have a single URL on your webserver which will always provide you the latest issue (which you can then download using your Kindle web browser).

    Here is the script:
    https://www.revlogic.net/public/economist_to_kindle.phps
    (hit “Accept” / “Allow” / “Add Exception” / etc.. if you get a warning about SSL certificates while downloading the script)

    You’ll also need mobigen_linux, which can be downloaded from here:
    http://nicj.net/files/mobigen_linux.tar.gz
    Or alternatively you can use kindlegen instead of mobigen_linux (though I haven’t tried):
    http://www.amazon.com/gp/feature.html?ie=UTF8&docId=1000234621

    Feel free to send me any comments or provide fixes for any broken features that I didn’t test.

    -slifox

  18. crosscode
    July 4th, 2010 at 10:02 | #18

    Here are my modifications of the script, based on the changes slifox made.

    Changes
    * better handling of images (afaik it doesn’t miss any)
    * it’ll optionally read its configuration from economist_to_kindle.config.php
    * print the name of the articles as it downloads them (mostly for debugging)
    * optionally tell the kindle to treat the article as a magazine (since not all the kindle magazine features are supported thanks to a complete lack of documentation, some people may not like this option much)
    * downloads the issue cover (although it can be quite difficult to get the kindle to show it to you, and it’s VERY low resolution)
    * verified to work with back issues, although issues before June of 2010 use special compatibility code :( (not a problem for using the script, just a problem when editing it)

    Known Bugs
    * any kindle menu with “section” in the name (fex. “Go to Section…”) will cause the kindle to close the book with an error. There’s no permanent damage from this.
    * I’ve only tested this with kindlegen 1.1 on linux and kindle software v2.5.2. YMMV.

    Download:
    http://www.crosscode.org/public/economist_to_kindle.phps

    Sample economist_to_kindle.config.php:

  19. July 16th, 2010 at 05:42 | #19

    Thank you so much for this script!! But does anyone know how to get this working with Windows 7/Kindlegen?

  20. July 16th, 2010 at 05:57 | #20

    Rather, I changed the information to use kindlegen in a random Win directory, but even though my Economist login information is correct, the program still returns the login failed error message — did they update something?

  21. July 22nd, 2010 at 19:21 | #21

    Great changes slifox and crosscode! I’ll link to crosscode’s latest version from the main article so everyone can get the latest version.

  22. Gerardo
    July 23rd, 2010 at 13:01 | #22

    Does the newest script work with Windows and/or Kindlegen? If so, how?
