The Economist, and, The Kindle: Take 2

A while ago I wrote about how you can get The Economist on your Kindle (and other e-readers) by running a simple PHP script that crawls economist.com, generates a .mobi file, and emails it to your Kindle weekly. Unfortunately (though understandably), around July 2009 they restricted their This Week’s Print Edition page to subscribers of their online and print editions.
With a little bit of work, I’ve updated the economist_to_kindle.php script to log in to economist.com with your username and password so it can generate a Kindle version again.
With this update, if you’re a print edition subscriber, you should be able to get this week’s edition on your Kindle again.
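For readers curious what a login step like this can look like with PHP's cURL extension, here is a minimal sketch. It is not the script's actual code: the function name, the login URL you pass in, and the form field names ('name' and 'pass') are all placeholders. The idea is to POST the credentials once and store the session cookie in a file so later requests can reuse it.

```php
<?php
// Hypothetical sketch, not the script's real implementation.
// The form field names below are placeholders; check the site's actual login form.
function sketchLogin($loginUrl, $username, $password, $cookieFile)
{
    $ch = curl_init($loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array(
        'name' => $username,
        'pass' => $password,
    )));
    // Save cookies received from the server and send them back on later requests.
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
    // Return the response body instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $body  = curl_exec($ch);
    $error = ($body === false) ? curl_error($ch) : null;

    // The handle is released when $ch goes out of scope, which is when
    // cURL writes the cookie jar to disk.
    return array($body, $error);
}
```

Subsequent page fetches would then set CURLOPT_COOKIEFILE to the same file so the session carries over.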
Updated 2010/01/25: Several bugfixes, see comments for details.
Hey Nic,
Thanks for the nice work! I’ve been trying to add proxy authentication, but when I do, I lose the site authentication. Any ideas?
I added
//set proxy stuff
curl_setopt($ch, CURLOPT_PROXY, "webproxy:8080");
curl_setopt($ch, CURLOPT_PROXYUSERPWD, "userdomain\\username:password");
twice (in function economistLogin() and function economistGetUrl($url))
Thanks for your feedback!
You’re the man, Nic… Thanks.
I am afraid I can’t get this to run properly. I get an error, “PHP Fatal error: Call to undefined function curl_init ….. on line 575”, and the script then stops and doesn’t do anything. I am using PHP 5.2.8 on Windows XP.
Could you also tell me how to set the basedir to store the files on the directory C:\Documents and Settings\LHY\My Documents\Economist? Thank you.
@Maarten
I added the CURLOPT_PROXY right before curl_exec() and it seemed to work for my local proxy. I do not have it password protected, so maybe the credentials are specified wrong? I would try printing a line for curl_error($ch) right after curl_exec() so you can see what’s wrong.
Hopefully that’ll work!
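A rough sketch of that suggestion, in case it helps: set the proxy options on the handle, run the request, and log curl_error() when curl_exec() fails. The function name fetchViaProxy is made up for this example, and the proxy host and credentials are whatever you pass in (for instance the "webproxy:8080" placeholder from the comment above).

```php
<?php
// Sketch only: fetch a URL through a proxy and log the cURL error on failure.
// fetchViaProxy is a hypothetical helper, not a function from the script.
function fetchViaProxy($url, $proxy, $proxyUserPwd = null)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    if ($proxyUserPwd !== null) {
        // Expected format is "user:password".
        curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxyUserPwd);
    }

    $body = curl_exec($ch);
    if ($body === false) {
        // curl_exec() returns false on failure; curl_error() says why.
        error_log('cURL error: ' . curl_error($ch));
    }
    return $body;
}
```

If the error points at proxy authentication, the credentials string is the first thing to check.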
@H Y Lee
H Y Lee — You’ll have to make sure the Curl libraries are installed on your Windows PHP install:
http://curl.haxx.se/libcurl/php/install.html
http://windows.php.net/
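A quick way to confirm whether the extension is loaded is a check like the one below. This is only a sketch: the helper name is made up, and the php.ini line in the message is the usual one for Windows builds of that era, so treat it as an assumption for your install.

```php
<?php
// Sketch: report early, with a readable message, when cURL support is missing.
// curlIsAvailable is a hypothetical helper name.
function curlIsAvailable()
{
    return extension_loaded('curl') && function_exists('curl_init');
}

if (!curlIsAvailable()) {
    echo "The cURL extension is not loaded. On Windows this usually means "
       . "enabling extension=php_curl.dll in php.ini.\n";
}
```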
You should be able to just update the basedir with the directory you’ve specified. That path should work on Windows. Since you’re using backslashes (\), you’ll have to escape them:
C:\\Documents and Settings\\LHY\\My Documents\\Economist
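To illustrate, here are three ways of writing that directory as a PHP string. The variable names are just for this example. Single-quoted strings leave these backslashes alone, and PHP's file functions on Windows generally accept forward slashes as well.

```php
<?php
// Double quotes with escaped backslashes.
$escaped = "C:\\Documents and Settings\\LHY\\My Documents\\Economist";
// Single quotes: these backslashes are taken literally.
$single  = 'C:\Documents and Settings\LHY\My Documents\Economist';
// Forward slashes: no escaping needed.
$forward = 'C:/Documents and Settings/LHY/My Documents/Economist';

// The first two are the same string; the third differs only in the separator.
var_dump($escaped === $single);
var_dump(str_replace('/', '\\', $forward) === $escaped);
```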
Hi Nic, again thanks for spearheading this. A few quick things.
Some of your users (especially Linux/Mac) may want to know that kindlegen has more or less replaced mobigen_linux. A quick update of this line works (replace mobigen_linux with kindlegen; all other options work the same, it seems):
system("/usr/local/bin/mobigen_linux -c1 $opfFile");
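As a sketch, the swap might look like the following. The kindlegen install path is an assumption (it mirrors the mobigen_linux location above), and the helper name is made up. Escaping the file name with escapeshellarg() is an extra precaution, not something the original line does.

```php
<?php
// Sketch: build the converter command with the file name shell-escaped.
// The default binary path is an assumption; adjust it to your install.
function buildConverterCommand($opfFile, $binary = '/usr/local/bin/kindlegen')
{
    return $binary . ' -c1 ' . escapeshellarg($opfFile);
}

// Usage (commented out so the sketch has no side effects):
// system(buildConverterCommand($opfFile));
```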
I’m not sure what it is exactly, but it seems like many links don’t work off of mbp_toc.html. I’ve traced it back to the correct formatting in the economist.html file. When there’s a good , then it works, but half the time it’s just that gets printed out. Is there something whacked here… maybe a substitution or other that’s wiping out the id? Without it, I can’t really use the table of contents.
Also, the NYT allows you to click the ‘trackball’ to the right, and jump to the next article. I imagine that’s done by inserting a tag in the html. Would be great to have that too.
Thanks!
Sorry Nic… meant to write ‘when there’s a good id=2234234… then it works’; otherwise the link from TOC fails.
I installed PHP 5.2.12 – and have not been able to get this to work. I have set my Economist login and password, as well as my kindle email address in the phps file (using Wordpad). However, when I run the phps file, the DOS box comes up for a fraction of a second, and disappears.
Thoughts?
Nic – it’s not picking up the start and end of article for United States’ section. Would really appreciate if you could give this a look for a few minutes… Thanks,
“Another Peter” … sounds like you’re not running it from the command line. My guess is that you have the working directory with insufficient slashes “/” or something, or maybe you’re missing the curl piece? Either way, you shouldn’t just “double click” on the php file… Try to run it from within the command line window to begin with, and post the output here.
Hey Peters,
Thanks for finding this issue. I’ve updated the source with 3 fixes:
1) As you found, many of the articles’ links didn’t work and the entire article was missing. This seemed to hit the “Americas” section a lot. I found that if you go to the URL linked from the This Week’s Edition page, the server 301 redirects to a second URL. CURL, which downloads the articles, wasn’t set up to follow this redirect, so those articles were missing.
2) Starting with this week, 01/23/2010, the HTML format of many articles changed, which broke the converter (there was an extra H1 in the HTML, which we thought was the title). This caused extraneous SCRIPT tags to be in the HTML, which caused mobigen to break completely. This should be fixed.
3) I’ve updated the file to make sure all “See article” links work (for example, in the Politics This Week section). Before, you would just see the text “See article”. Now, it should link to the actual article.
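For anyone hitting the redirect problem from fix 1 in their own code, the relevant cURL option is CURLOPT_FOLLOWLOCATION. The sketch below shows the general shape; the function name and the redirect limit of 5 are illustrative choices, not taken from the script.

```php
<?php
// Sketch: fetch a page and follow HTTP redirects such as a 301.
// fetchFollowingRedirects is a hypothetical helper name.
function fetchFollowingRedirects($url, $cookieFile = null)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Follow Location headers instead of returning the redirect response.
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Cap the number of hops so a redirect loop cannot run forever.
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
    if ($cookieFile !== null) {
        // Reuse the login session cookies, if any.
        curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
    }
    return curl_exec($ch);
}
```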
Let me know of any problems.
http://nicj.net/files/economist_to_kindle.phps
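On the stray SCRIPT tags from fix 2: one defensive option, independent of how the script itself handles it, is to strip script elements from the HTML before handing it to the converter. A minimal sketch with a made-up helper name follows. A regular expression like this is a blunt tool and will not cope with every malformed page.

```php
<?php
// Sketch: remove <script>...</script> blocks (any case, any attributes).
// stripScriptTags is a hypothetical helper, not a function from the script.
function stripScriptTags($html)
{
    // i = case-insensitive, s = let "." match newlines inside the script body.
    return preg_replace('#<script\b[^>]*>.*?</script\s*>#is', '', $html);
}

echo stripScriptTags('<h1>Title</h1><SCRIPT type="text/javascript">var a = 1;</SCRIPT><p>Body</p>');
```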
Hi Nic, It’s working great now, thanks! BTW, do you know how many papers get the “View Sections List/ Next Article/Previous Article” to work on the kindle? I think it might be an embedded script tag of some sort, though it’s interesting that that part of the screen (always the very bottom) doesn’t “flash” on page-changes. Maybe it’s more of a kindle thing???
Hi Peter — I don’t know how they do that, but I will look into it! Could be a kindle-specific thing, I wonder if the MOBI format natively supports that.