The Economist, and, The Kindle: Take 2

January 3rd, 2010

A while ago I wrote about how you can get The Economist on your Kindle (and other e-readers) by running a simple PHP script that crawls economist.com, generates a .mobi file, and emails it to your Kindle weekly. Unfortunately (though understandably), around July 2009 they restricted their This Week’s Print Edition page to subscribers of their online and print editions.

With a little bit of work, I’ve updated the economist_to_kindle.php script to log in to economist.com with your username and password so it can generate a Kindle version again:

economist_to_kindle.phps

With this update, if you’re a print edition subscriber, you should be able to get this week’s edition on your Kindle again.
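For readers curious how a script can stay logged in between requests, here is a minimal sketch of the usual cURL pattern: post the credentials once, store the session cookie in a file, and reuse that file for later fetches. The helper names, the login URL you pass in, and the form field names (`name`, `pass`) are assumptions for illustration; they are not taken from economist_to_kindle.php or from economist.com.

```php
<?php
// Sketch only: helper names and form field names are hypothetical placeholders.

/** Build cURL options for a form login that saves the session cookie to disk. */
function buildLoginOptions(string $loginUrl, string $user, string $pass, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $loginUrl,
        CURLOPT_POST           => true,
        // URL-encode the form body; the field names are assumptions.
        CURLOPT_POSTFIELDS     => http_build_query(['name' => $user, 'pass' => $pass]),
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are written here when the handle is closed
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are read from here when a request starts
        CURLOPT_RETURNTRANSFER => true,        // return the response instead of printing it
    ];
}

/** Build cURL options for fetching a page with the previously saved cookie. */
function buildFetchOptions(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_COOKIEJAR      => $cookieFile,
        CURLOPT_COOKIEFILE     => $cookieFile,
        CURLOPT_RETURNTRANSFER => true,
    ];
}
```

Each option array can be applied with `curl_setopt_array($ch, ...)` before calling `curl_exec($ch)`. Using the same cookie file for both requests is what carries the login over to the article downloads.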

Updated 2010/01/25: Several bugfixes, see comments for details.

  1. Maarten
    January 12th, 2010 at 00:06 | #1

    Hey Nic,
    Thanks for the nice work! I’ve been trying to add proxy authentication, but when I do, I lose the site authentication. Any ideas?

    I added
    //set proxy stuff
    curl_setopt($ch, CURLOPT_PROXY, "webproxy:8080");
    curl_setopt($ch, CURLOPT_PROXYUSERPWD, "userdomain\\username:password");

    twice (in function economistLogin() and function economistGetUrl($url))
    Thanks for your feedback!

  2. Peter
    January 14th, 2010 at 19:02 | #2

    You’re the man, Nic… Thanks.

  3. H Y Lee
    January 15th, 2010 at 17:17 | #3

    I am afraid I can’t get this to run properly. I get an error, “PHP Fatal error: Call to undefined function curl_init ….. on line 575”, and the script then stops and doesn’t do anything. I am using PHP 5.2.8 on Windows XP.

    Could you also tell me how to set the basedir to store the files on the directory C:\Documents and Settings\LHY\My Documents\Economist? Thank you.

  4. January 15th, 2010 at 21:30 | #4

    @Maarten
    I added the CURLOPT_PROXY right before curl_exec() and it seemed to work for my local proxy. I do not have it password protected, so maybe the credentials are specified wrong? I would try printing a line for curl_error($ch) right after curl_exec() so you can see what’s wrong.

    Hopefully that’ll work!
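The debugging tip above (set the proxy options, then print `curl_error($ch)` right after `curl_exec()`) might look roughly like this. `fetchViaProxy()` is a hypothetical helper written for this illustration, not a function from the script, and the proxy values shown in the usage note are the placeholders from Maarten's comment.

```php
<?php
// Sketch only: fetchViaProxy() is a hypothetical helper, not part of economist_to_kindle.php.

function fetchViaProxy(string $url, ?string $proxy = null, ?string $proxyUserPwd = null): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);    // fail fast if the proxy is unreachable

    if ($proxy !== null) {
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
        if ($proxyUserPwd !== null) {
            // Expected format is "user:password".
            curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxyUserPwd);
        }
    }

    $body  = curl_exec($ch);   // false on failure
    $error = curl_error($ch);  // empty string when the transfer succeeded

    return ['body' => $body, 'error' => $error];
}
```

Calling `fetchViaProxy($url, "webproxy:8080", "userdomain\\username:password")` and printing the `error` entry whenever `body` is `false` should show whether the proxy credentials or something else is being rejected.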

  5. January 15th, 2010 at 21:32 | #5

    @H Y Lee
    H Y Lee — You’ll have to make sure the Curl libraries are installed on your Windows PHP install:
    http://curl.haxx.se/libcurl/php/install.html
    http://windows.php.net/

    You should be able to just update the basedir with the directory you’ve specified. That path should work on Windows. Since you’re using backslashes (\), you’ll have to escape them:
    C:\\Documents and Settings\\LHY\\My Documents\\Economist
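To make the escaping rule concrete, here is a small sketch of how PHP treats backslashes in string literals. The variable names are illustrative only; the actual name of the base directory setting in the script may differ.

```php
<?php
// Sketch only: variable names are illustrative, not taken from the script.

// Double-quoted string: each backslash is escaped as \\.
$doubleQuoted = "C:\\Documents and Settings\\LHY\\My Documents\\Economist";

// Single-quoted string: a backslash is literal unless it precedes ' or another backslash.
$singleQuoted = 'C:\Documents and Settings\LHY\My Documents\Economist';

// Forward slashes, which PHP's file functions generally accept on Windows too.
$forwardSlashes = 'C:/Documents and Settings/LHY/My Documents/Economist';

// Why escaping matters: in double quotes, \t is a tab character, not "\" followed by "t".
$broken = "C:\temp";

var_dump($doubleQuoted === $singleQuoted); // the two spellings are the same string
var_dump(strlen($broken));                 // shorter than the 7 characters typed
```

The first two spellings produce identical strings, so either works. Unescaped backslashes in double quotes only cause trouble when the next character forms an escape sequence such as `\t` or `\n`, which is why escaping every backslash is the safer habit.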

  6. Peter
    January 20th, 2010 at 12:52 | #6

    Hi Nic, again thanks for spearheading this. A few quick things.

    Some of your users (especially Linux/Mac) may want to know that kindlegen has more or less replaced mobigen_linux. A quick update of this line works (replace mobigen_linux with kindlegen; all other options work the same, it seems):
    system("/usr/local/bin/mobigen_linux -c1 $opfFile");
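A sketch of that swap, with one extra precaution: quoting the file path so that a directory name containing spaces does not split the command. `buildKindlegenCommand()` is a hypothetical helper, and the install path `/usr/local/bin/kindlegen` is an assumption; adjust it to wherever kindlegen lives on your system.

```php
<?php
// Sketch only: buildKindlegenCommand() is hypothetical and the binary path is an assumption.

function buildKindlegenCommand(string $opfFile, string $binary = '/usr/local/bin/kindlegen'): string
{
    // escapeshellarg() quotes each argument so spaces or shell metacharacters
    // in a path cannot break the command line.
    return escapeshellarg($binary) . ' -c1 ' . escapeshellarg($opfFile);
}
```

The result can be passed to `system()` in place of the hard-coded string, for example `system(buildKindlegenCommand($opfFile), $exitCode);`, which also captures the exit status so a failed conversion can be reported.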

    I’m not sure what it is exactly, but it seems like many links don’t work off of mbp_toc.html. I’ve traced it back to the correct formatting in the economist.html file. When there’s a good , then it works, but half the time it’s just that gets printed out. Is there something whacked here… maybe a substitution or other that’s wiping out the id? Without it, I can’t really use the table of contents.

    Also, the NYT allows you to click the ‘trackball’ to the right, and jump to the next article. I imagine that’s done by inserting a tag in the html. Would be great to have that too.

    Thanks!

  7. Peter
    January 20th, 2010 at 12:53 | #7

    Sorry Nic… meant to write ‘when there’s a good id=2234234… then it works, otherwise the link from the TOC fails’.

  8. Another Peter
    January 24th, 2010 at 19:31 | #8

    I installed PHP 5.2.12 – and have not been able to get this to work. I have set my Economist login and password, as well as my kindle email address in the phps file (using Wordpad). However, when I run the phps file, the DOS box comes up for a fraction of a second, and disappears.
    Thoughts?

  9. Peter
    January 25th, 2010 at 16:58 | #9

    Nic – it’s not picking up the start and end of article for United States’ section. Would really appreciate if you could give this a look for a few minutes… Thanks,

    “Another Peter” … sounds like you’re not running it from the command line. My guess is that you have the working directory with insufficient slashes “/” or something, or maybe you’re missing the curl piece? Either way, you shouldn’t just “double click” on the php file… Try to run it from within the command line window to begin with, and post the output here.

  10. January 25th, 2010 at 22:53 | #10

    Hey Peters,

    Thanks for finding this issue. I’ve updated the source with 3 fixes:

    1) As you found, many of the articles’ links didn’t work and the entire article was missing. This seemed to hit the “Americas” section a lot. I found that if you go to the URL linked from the This Week’s Edition page, the server 301 redirects to a second URL. cURL, which downloads the articles, wasn’t set up to follow this redirect, so those articles were missing.

    2) Starting with this week, 01/23/2010, the HTML format of many articles changed, which broke the converter (there was an extra H1 in the HTML, which we thought was the title). This caused extraneous SCRIPT tags to be in the HTML, which caused mobigen to break completely. This should be fixed.

    3) I’ve updated the file to make sure all “See article” links work (for example, in the Politics This Week section). Before, you would just see the text “See article”. Now, it should link to the actual article.
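For the redirect problem in fix 1, the relevant cURL setting is `CURLOPT_FOLLOWLOCATION`. The sketch below shows one way to switch it on; `withRedirects()` is a hypothetical helper, the cap of 5 hops is an arbitrary choice, and the updated script may well set these options differently.

```php
<?php
// Sketch only: withRedirects() is hypothetical and the hop limit is an arbitrary choice.

function withRedirects(array $options, int $maxRedirects = 5): array
{
    $options[CURLOPT_FOLLOWLOCATION] = true;          // follow Location: headers such as a 301
    $options[CURLOPT_MAXREDIRS]      = $maxRedirects; // stop after this many hops to avoid loops
    return $options;
}
```

Apply the result with `curl_setopt_array($ch, withRedirects([CURLOPT_RETURNTRANSFER => true]))`. After `curl_exec()`, `curl_getinfo($ch, CURLINFO_EFFECTIVE_URL)` reports the URL the request finally landed on, which is handy for checking that a redirect was followed.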

    Let me know of any problems.

    http://nicj.net/files/economist_to_kindle.phps
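Fix 2 mentions stray SCRIPT tags breaking mobigen. As a defensive measure, article HTML can be scrubbed of script elements before conversion. The following is a rough sketch using a regular expression; `stripScriptTags()` is a hypothetical helper and is not necessarily how the updated script handles it. A regex is a blunt tool for HTML, so treat this as a filter for well-formed script blocks only.

```php
<?php
// Sketch only: stripScriptTags() is hypothetical, not the script's own approach.

function stripScriptTags(string $html): string
{
    // i = case-insensitive (matches <SCRIPT> too), s = let . match newlines,
    // .*? = stop at the first closing </script> rather than the last one.
    return (string) preg_replace('#<script\b[^>]*>.*?</script>#is', '', $html);
}

$sample = "<h1>Title</h1><SCRIPT type=\"text/javascript\">\nvar a = 1;\n</SCRIPT><p>Body</p>";
echo stripScriptTags($sample), "\n";
```

Running the cleaned HTML through the converter instead of the raw page should keep a leftover script block from stopping the build.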

  11. Peter
    February 6th, 2010 at 17:36 | #11

    Hi Nic, It’s working great now, thanks! BTW, do you know how many papers get the “View Sections List/ Next Article/Previous Article” to work on the kindle? I think it might be an embedded script tag of some sort, though it’s interesting that that part of the screen (always the very bottom) doesn’t “flash” on page-changes. Maybe it’s more of a kindle thing???

  12. February 6th, 2010 at 19:48 | #12

    Hi Peter, I don’t know how they do that, but I will look into it! Could be a Kindle-specific thing; I wonder if the MOBI format natively supports that.
