The Economist and The Kindle: Take 2

January 3rd, 2010

A while ago I wrote about how you can get The Economist on your Kindle (and other e-readers) by running a simple PHP script that crawls economist.com, generates a .mobi file, and emails it to your Kindle weekly. Unfortunately (though understandably), around July 2009 they restricted their This Week’s Print Edition page to subscribers of their online and print editions.

With a little bit of work, I’ve updated the economist-to-kindle.php script to log in to economist.com with your username and password so it can generate a Kindle version again:

https://github.com/nicjansma/economist-to-kindle

With this update, and if you’re a print edition subscriber, you should be able to get this week’s edition on your Kindle again.

Updated 2010/01/25: Several bugfixes, see comments for details.

Updated 2010/07/22: slifox and crosscode have made some great additions to the code and got it working with the Economist.com’s latest site structure. Check out crosscode’s latest version or read the comments for details.

Updated 2011/05/02: Based on crosscode’s latest version, I’ve updated the script on this site (https://github.com/nicjansma/economist-to-kindle) to work with recent Economist.com articles.

Updated 2011/07/26: Small update to work with the economist.com’s latest updates: https://github.com/nicjansma/economist-to-kindle

Updated 2012/01/04: I’ve moved this project to GitHub: https://github.com/nicjansma/economist-to-kindle. If you have any suggestions, find bugs, or want to contribute, please head there.

  1. Maarten
    January 12th, 2010 at 00:06 | #1

    Hey Nic,
    Thanks for the nice work! I’ve been trying to add proxy authentication, but when I do, I lose the site authentication. Any ideas?

    I added
    //set proxy stuff
    curl_setopt($ch, CURLOPT_PROXY, "webproxy:8080");
    curl_setopt($ch, CURLOPT_PROXYUSERPWD, "userdomain\\username:password");

    twice (in function economistLogin() and function economistGetUrl($url))
    Thanks for your feedback!

  2. Peter
    January 14th, 2010 at 19:02 | #2

    You’re the man, Nic… Thanks.

  3. H Y Lee
    January 15th, 2010 at 17:17 | #3

    I am afraid I can’t get this to run properly. I get an error, “PHP Fatal error: Call to undefined function curl_init ….. on line 575”, and the script then stops and doesn’t do anything. I am using PHP 5.2.8 on Windows XP.

    Could you also tell me how to set the basedir to store the files on the directory C:\Documents and Settings\LHY\My Documents\Economist? Thank you.

  4. January 15th, 2010 at 21:30 | #4

    @Maarten
    I added the CURLOPT_PROXY right before curl_exec() and it seemed to work for my local proxy. I do not have it password protected, so maybe the credentials are specified wrong? I would try printing a line for curl_error($ch) right after curl_exec() so you can see what’s wrong.

    Hopefully that’ll work!
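
The debugging advice above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the helper names `buildProxyOptions` and `fetchThroughProxy` are hypothetical, and the proxy host and credentials are the placeholders from comment #1.

```php
<?php
// Hypothetical helper: collect the proxy-related cURL options in one place so
// the login request and the article requests can share the same settings.
function buildProxyOptions(string $proxy, string $proxyUserPwd): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,          // return the body as a string
        CURLOPT_PROXY          => $proxy,        // e.g. "webproxy:8080" (placeholder)
        CURLOPT_PROXYUSERPWD   => $proxyUserPwd, // "user:password" for the proxy only
    ];
}

// Hypothetical helper: run the request and surface curl_error() on failure,
// so a proxy authentication problem is reported instead of silently ignored.
function fetchThroughProxy(string $url, array $options): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}
```

Applying one shared options array to every handle is one way to keep the proxy settings and any site cookies consistent between the login call and the later downloads.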

  5. January 15th, 2010 at 21:32 | #5

    @H Y Lee
    H Y Lee — You’ll have to make sure the Curl libraries are installed on your Windows PHP install:
    http://curl.haxx.se/libcurl/php/install.html
    http://windows.php.net/

    You should be able to just update the basedir with the directory you’ve specified. That path should work on Windows. Since you’re using backslashes (\), you’ll have to escape them:
    C:\\Documents and Settings\\LHY\\My Documents\\Economist
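
As a small illustration of the escaping described above (the variable name `$altBaseDir` is only for this example): in a PHP string literal, `\\` produces a single backslash, and PHP on Windows generally accepts forward slashes in file paths as well.

```php
<?php
// Each "\\" in the literal becomes one backslash in the resulting string.
$baseDir = 'C:\\Documents and Settings\\LHY\\My Documents\\Economist';

// Alternative for illustration: forward slashes avoid the escaping entirely.
$altBaseDir = 'C:/Documents and Settings/LHY/My Documents/Economist';

echo $baseDir, PHP_EOL; // prints C:\Documents and Settings\LHY\My Documents\Economist
```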

  6. Peter
    January 20th, 2010 at 12:52 | #6

    Hi Nic, again thanks for spearheading this. A few quick things.

    Some of your users (especially on Linux/Mac) may want to know that kindlegen has more or less replaced mobigen_linux. Updating this line to call kindlegen instead works; all other options seem to work the same:
    system("/usr/local/bin/mobigen_linux -c1 $opfFile");

    I’m not sure what it is exactly, but it seems like many links don’t work off of mbp_toc.html. I’ve traced it back to the correct formatting in the economist.html file. When there’s a good , then it works, but half the time it’s just that gets printed out. Is there something whacked here… maybe a substitution or other that’s wiping out the id? Without it, I can’t really use the table of contents.

    Also, the NYT allows you to click the ‘trackball’ to the right, and jump to the next article. I imagine that’s done by inserting a tag in the html. Would be great to have that too.

    Thanks!
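
The converter swap mentioned in comment #6 could look something like the sketch below. The kindlegen install path and the .opf path are assumptions for illustration, and `buildConverterCommand` is a hypothetical helper; only the `-c1` option comes from the original line.

```php
<?php
// Hypothetical helper: build the converter command line. escapeshellarg()
// protects against spaces or shell metacharacters in either path.
function buildConverterCommand(string $converterPath, string $opfFile): string
{
    return escapeshellarg($converterPath) . ' -c1 ' . escapeshellarg($opfFile);
}

// Assumed locations; adjust to wherever kindlegen and the .opf file live.
$command = buildConverterCommand('/usr/local/bin/kindlegen', '/tmp/economist/economist.opf');

// Uncomment to actually run the conversion:
// system($command);
```

Keeping the binary path in one variable means switching between mobigen_linux and kindlegen is a one-line change.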

  7. Peter
    January 20th, 2010 at 12:53 | #7

    Sorry Nic… meant to write ‘when there’s a good id=2234234… then it works, otherwise the link from TOC fails.

  8. Another Peter
    January 24th, 2010 at 19:31 | #8

    I installed PHP 5.2.12 – and have not been able to get this to work. I have set my Economist login and password, as well as my kindle email address in the phps file (using Wordpad). However, when I run the phps file, the DOS box comes up for a fraction of a second, and disappears.
    Thoughts?

  9. Peter
    January 25th, 2010 at 16:58 | #9

    Nic – it’s not picking up the start and end of article for United States’ section. Would really appreciate if you could give this a look for a few minutes… Thanks,

    “Another Peter” … sounds like you’re not running it from the command line. My guess is that you have the working directory with insufficient slashes “/” or something, or maybe you’re missing the curl piece? Either way, you shouldn’t just “double click” on the php file… Try to run it from within the command line window to begin with, and post the output here.

  10. January 25th, 2010 at 22:53 | #10

    Hey Peters,

    Thanks for finding this issue. I’ve updated the source with 3 fixes:

    1) As you found, many of the articles’ links didn’t work and the entire article was missing. This seemed to hit the “Americas” section a lot. I found that if you go to the URL linked from the This Week’s Edition page, the server 301 redirects to a second URL. cURL, which downloads the articles, wasn’t set up to follow this redirect, so those articles were missing.

    2) Starting with this week, 01/23/2010, the HTML format of many articles changed, which broke the converter (there was an extra H1 in the HTML, which we thought was the title). This caused extraneous SCRIPT tags to be in the HTML, which caused mobigen to break completely. This should be fixed.

    3) I’ve updated the file to make sure all “See article” links work (for example, in the Politics This Week section). Before, you would just see the text “See article”. Now, it should link to the actual article.

    Let me know of any problems.

    http://nicj.net/files/economist_to_kindle.phps
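
For readers curious about fix 1, following a 301 redirect with PHP's cURL bindings can be sketched as below. This is not the script's actual code: the helper names and the redirect limit of 5 are assumptions chosen for the example.

```php
<?php
// Hypothetical helper: options that make cURL follow HTTP redirects.
function buildRedirectOptions(int $maxRedirects = 5): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,          // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,          // follow Location headers (301/302)
        CURLOPT_MAXREDIRS      => $maxRedirects, // give up after this many hops
    ];
}

// Hypothetical helper: download one article and report where it ended up.
function fetchArticle(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, buildRedirectOptions());
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('Download failed: ' . curl_error($ch));
    }
    // CURLINFO_EFFECTIVE_URL is the final URL after any redirects.
    return ['url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL), 'html' => $html];
}
```

Capping the number of redirects guards against redirect loops if the site structure changes again.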

  11. Peter
    February 6th, 2010 at 17:36 | #11

    Hi Nic, It’s working great now, thanks! BTW, do you know how many papers get the “View Sections List/ Next Article/Previous Article” to work on the kindle? I think it might be an embedded script tag of some sort, though it’s interesting that that part of the screen (always the very bottom) doesn’t “flash” on page-changes. Maybe it’s more of a kindle thing???

  12. February 6th, 2010 at 19:48 | #12

    Hi Peter — I don’t know how they do that, but I will look into it! Could be a kindle-specific thing, I wonder if the MOBI format natively supports that.

  13. Josh
    March 27th, 2010 at 10:46 | #13

    Just letting others know, since I had to get libcurl working on Debian (should work for Ubuntu as well):
    sudo apt-get install php5-curl
    Then it started working for me.

    Unfortunately, I’m having trouble with back articles. The parser seems thrown by the article subscription links (I’m an online-only subscriber). Seems like it would be easy to fix, but the PHP is throwing me. Halfway tempted to rewrite in Ruby, but I’m sure that it’s not worth it, since the script works great for current articles.
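
The “Call to undefined function curl_init” error from comment #3 and the missing php5-curl package here have the same cause: PHP's cURL extension is not loaded. A quick check, sketched with a hypothetical helper name, might look like this:

```php
<?php
// Hypothetical helper: report whether PHP's cURL extension is usable.
function describeCurlSupport(): string
{
    if (extension_loaded('curl') && function_exists('curl_init')) {
        return 'cURL extension is available.';
    }
    return 'cURL extension is missing: install or enable it before running the script.';
}

echo describeCurlSupport(), PHP_EOL;
```

Running this from the command line with the same PHP binary that runs the script shows whether the extension needs to be installed or enabled first.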

  14. Josh
    April 4th, 2010 at 06:56 | #14

    Got it working. Style must be different for online only subscribers for back issues. I just had to update where it found the end of the body. Patch contents are:

    397c397
    $endPos = getArticleEnd($article);
    414c414
    $endPos = getArticleEnd($article);
    520c520
    function getArticleEnd($article)
    539,543d538
    < $end4 = strpos($article, '<script', $startPos);
    < if ($end4) {
    < $ends[] = $end4;
    < }
    <
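
The lines touched by this patch collect candidate end positions with `strpos()`. A generic version of that pattern is sketched below; it is not the script's `getArticleEnd()` function, the marker strings are made up for the example, and choosing the earliest match is an assumption. Note that the sketch compares against `false` explicitly, because a plain `if ($pos)` would also reject a match at position 0.

```php
<?php
// Hypothetical helper: return the earliest position at which any of the given
// markers occurs at or after $startPos, or false if none is found.
function findEarliestMarker(string $article, array $markers, int $startPos = 0)
{
    $ends = [];
    foreach ($markers as $marker) {
        $pos = strpos($article, $marker, $startPos);
        if ($pos !== false) { // strict check: position 0 is a valid match
            $ends[] = $pos;
        }
    }
    return $ends ? min($ends) : false;
}

// Illustrative input only; real article HTML will differ.
$article = '<h1>Title</h1><p>Body</p><script>x()</script><div class="footer"></div>';
$end = findEarliestMarker($article, ['<div class="footer"', '<script']);
$body = ($end === false) ? $article : substr($article, 0, $end);
```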

  15. April 7th, 2010 at 20:54 | #15

    Thanks Josh! If you don’t mind, I’ve added your update to the script on this site so others will have the fix.

  16. Peter
    June 14th, 2010 at 13:42 | #16

    Hi Nic,
    Looks like the Economist has changed the URL structure, now using ./node/SID where SID is the ‘story ID’. Is something up with curl, that’s not following through correctly? I’m on a new system, so maybe this whole thing is a config issue on my end, but it seems the script has suddenly stopped working in the last week or so.
    Thanks, -Peter

  17. slifox
    June 27th, 2010 at 16:32 | #17

    Hi all,

    It looks like The Economist website has changed again. I’ve made some modifications to the script so it works with the latest print issue from the website.

    This is the first time I’m using this script, so I didn’t have the old HTML files for comparison. I also don’t have access to the online-only subscription, and haven’t tested any back-issues. Therefore you might have problems using my modified script for these things — however it works fine for the latest print issue available from the website.

    I changed the login to use HTTPS instead of HTTP, so your password is sent encrypted (hopefully; I didn’t verify this). I also added a few more options, including an option to disable email sending and an option to create a link to the latest issue. The link creation gives you a single URL on your web server that always points to the latest issue (which you can then download using your Kindle web browser).

    Here is the script:
    https://www.revlogic.net/public/economist_to_kindle.phps
    (hit “Accept” / “Allow” / “Add Exception” / etc.. if you get a warning about SSL certificates while downloading the script)

    You’ll also need mobigen_linux, which can be downloaded from here:
    http://nicj.net/files/mobigen_linux.tar.gz
    Or alternatively you can use kindlegen instead of mobigen_linux (though I haven’t tried):
    http://www.amazon.com/gp/feature.html?ie=UTF8&docId=1000234621

    Feel free to send me any comments or provide fixes for any broken features that I didn’t test.

    -slifox

  18. crosscode
    July 4th, 2010 at 10:02 | #18

    Here’s my modifications of the script, based on the changes slifox made.

    Changes
    * better handling of images (afaik it doesn’t miss any)
    * it’ll optionally read its configuration from economist_to_kindle.config.php
    * print the name of the articles as it downloads them (mostly for debugging)
    * optionally tell the kindle to treat the article as a magazine (since not all the kindle magazine features are supported thanks to a complete lack of documentation, some people may not like this option much)
    * downloads the issue cover (although it can be quite difficult to get the kindle to show it to you, and it’s VERY low resolution)
    * verified to work with back issues, although issues before June of 2010 use special compatibility code 🙁 (not a problem for using the script, just a problem when editing it)

    Known Bugs
    * any kindle menu with “section” in the name (fex. “Go to Section…”) will cause the kindle to close the book with an error. There’s no permanent damage from this.
    * I’ve only tested this with kindlegen 1.1 on linux and kindle software v2.5.2. YMMV.

    Download:
    http://www.crosscode.org/public/economist_to_kindle.phps

    Sample economist_to_kindle.config.php:

  19. July 16th, 2010 at 05:42 | #19

    Thank you so much for this script!! But does anyone know how to get this working with Windows 7/Kindlegen?

  20. July 16th, 2010 at 05:57 | #20

    Rather, I changed the information to use kindlegen in a random Win directory, but even though my Economist login information is correct, the program still returns the login failed error message — did they update something?

  21. July 22nd, 2010 at 19:21 | #21

    Great changes slifox and crosscode! I’ll link to crosscode’s latest version from the main article so everyone can get the latest version.

  22. Gerardo
    July 23rd, 2010 at 13:01 | #22

    Does the newest script work with Windows and/or Kindlegen? If so, how?

  23. Peter
    August 18th, 2010 at 14:55 | #23

    Hello everybody,

    thanks for the great script!

    But unfortunately it does not work, I encounter the same issue as Gerardo with “Could not log in to economist.com: Username or password mismatch!”.

    I reckon it has something to do with cookie handling, but setting all rights to 755 and changing line 90

    from

    $GLOBALS['cookieJarFile'] = $baseDir . '/economist-cookies.txt';

    to

    $GLOBALS['cookieJarFile'] = $baseDir . 'economist-cookies.txt';

    didn’t work either.

    Any hints?

    Thanks,
    Peter

  24. Peter
    August 19th, 2010 at 13:00 | #24

    Okay, I fixed the issue I think. The fix above is not correct, as the original source code WAS correct 😉

    The reason why it did not work for me was simple: I provided the wrong baseDir path. Make sure that you provide the WHOLE file path, not only the part that is accessible from the web directory!

    Peter
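
One way to catch this kind of baseDir mistake early is to resolve the directory to an absolute path and check that it is writable before building the cookie file name. The sketch below is an illustration under those assumptions; `resolveCookieJar` is a hypothetical helper, not part of the script, and the temp directory is used only so the example runs anywhere.

```php
<?php
// Hypothetical helper: turn $baseDir into an absolute, writable directory and
// return the full path of the cookie jar file inside it.
function resolveCookieJar(string $baseDir): string
{
    $absolute = realpath($baseDir); // false if the path does not exist
    if ($absolute === false || !is_dir($absolute)) {
        throw new InvalidArgumentException("baseDir does not exist: $baseDir");
    }
    if (!is_writable($absolute)) {
        throw new RuntimeException("baseDir is not writable: $absolute");
    }
    return $absolute . DIRECTORY_SEPARATOR . 'economist-cookies.txt';
}

// Example only: use the system temp directory as a stand-in for baseDir.
$cookieJarFile = resolveCookieJar(sys_get_temp_dir());
```

If the cookie file cannot be written, the session cookie from the login request is never saved, which would be consistent with the “Username or password mismatch” symptom described above.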

  25. Peter
    August 24th, 2010 at 12:49 | #25

    Back again – anyone else encountering the problem that kindlegen simply dies with a bus error without producing any output and the same for mobigen_linux except that there is a library statically linked?

  26. sitjn
    January 16th, 2011 at 08:44 | #26

    Hello everybody. This is EXACTLY what I was looking for: Awesome.
    I’m a real noob here, but I did the following cool things
    – I installed php
    – I went to php in program files, made kindle.bat with content “php the economist_to_kindle.php”
    – Downloaded the economist_to_kindle.phps, renamed it to the economist_to_kindle.php
    – Edited the economist_to_kindle.php to include my username and password and $baseDir = 'D:\\boeken\\script';
    – But I seem to be stuck with the mobigen_linux thing. I downloaded it, but where do I put it. Where is /usr/local/bin/?

  27. Konstantine
    January 20th, 2011 at 03:18 | #27

    @ sitjn

    You use Windows, /usr/local/bin is Linux 😉

  28. Konstantine
    January 20th, 2011 at 04:23 | #28

    Can anyone help me? When I execute the script, it tries to download different articles but give an error ‘ERROR in processing!’ on each one of them. Why?

  29. January 21st, 2011 at 10:58 | #29

    @ sitjn
    I would recommend looking into KindleGen: http://www.amazon.com/gp/feature.html?ie=UTF8&docId=1000234621

    I haven’t personally tried this on Windows, but others that have commented here have. Good luck!

  30. Jman
    March 8th, 2011 at 13:13 | #30

    Thanks to all the contributors for this great script. Could you be so kind as to provide an update that enables the magazine to be browsed using kindle 3.1 firmware?

  31. Henao
    March 20th, 2011 at 15:15 | #31

    This script is great and beats the bloat of calibre. I am able to run it on my nslu2 (arm) with debian, but no luck with mobigen. “unable to execute binary file”. Any idea or alternatives that work with arm?

  32. March 20th, 2011 at 19:18 | #32

    @ Henao
    I don’t know of anything that works on arm currently (mobigen or KindleGen). Might be a good opportunity to open a suggestion with Kindle’s support.

  33. March 20th, 2011 at 19:19 | #33

    @ Jman
    Are you talking about the new Folder features to group the Economist editions? I haven’t had much time to work on it lately to add features. If anyone else is so inclined, that would be great!

  34. May 6th, 2011 at 22:57 | #34

    Hello,

    First off, thanks for this script! I’m looking forward to getting it to work, I have set up everything as instructed but only seem to get the index page on my kindle

    am I missing something?

    Thanks again
    Ian

  35. June 2nd, 2011 at 16:25 | #35

    Hi Ian,

    Today I updated the version of the script I’m hosting here:
    http://nicj.net/files/economist_to_kindle.phps

    It’s based off of CrossCode’s updates, but disables part of the script (NCX TOC generation) that I can’t get working with today’s economist.com and mobigen_linux.

    Let me know if it works for you or not.

  36. G
    July 23rd, 2011 at 06:13 | #36

    Have they changed the URL format again? Not getting any content in my downloads since last week.

  37. koko
    July 24th, 2011 at 10:02 | #37

    @ Nic

    Looks like the design was changed. economist.html only got mbp:pagebreaks and not article contents

  38. July 26th, 2011 at 20:25 | #38

    Thanks for letting me know. There was a small change.
    Update at http://nicj.net/files/economist_to_kindle.phps

  39. Stephen
    August 20th, 2011 at 18:12 | #39

    Nic wrote: “@ Henao I don’t know of anything that works on arm currently (mobigen or KindleGen). Might be a good opportunity to open a suggestion with Kindle’s support.”

    Unfortunately Calibre is the only program I’ve found that generates .mobi files on ARM, which is a shame because Calibre really stresses it. I’ve been using Calibre on a Pogoplug running Debian and as long as you give it a fair amount of time to work it’ll absolutely feed a Kindle.

Comments are closed.