The Economist, and, The Kindle: Take 2
A while ago I wrote about how you can get The Economist on your Kindle (and other e-readers) by running a simple PHP script that crawls economist.com, generates a .mobi file, and emails it to your Kindle weekly. Unfortunately (though understandably), around July 2009 they restricted their This Week’s Print Edition page to subscribers of their online and print editions.
With a little bit of work, I’ve updated the economist-to-kindle.php PHP script to log in to economist.com with your username and password so it can generate a Kindle version again:
https://github.com/nicjansma/economist-to-kindle
With this update, if you’re a print edition subscriber, you should be able to get this week’s edition on your Kindle again.
Updated 2010/01/25: Several bugfixes, see comments for details.
Updated 2010/07/22: slifox and crosscode have made some great additions to the code and got it working with the Economist.com’s latest site structure. Check out crosscode’s latest version or read the comments for details.
Updated 2011/05/02: Based on crosscode’s latest version, I’ve updated the script on this site (https://github.com/nicjansma/economist-to-kindle) to work with recent Economist.com articles.
Updated 2011/07/26: Small update to work with the economist.com’s latest updates: https://github.com/nicjansma/economist-to-kindle
Updated 2012/01/04: I’ve moved this project to GitHub: https://github.com/nicjansma/economist-to-kindle. If you have any suggestions, find bugs, or want to contribute, please head there.
Hey Nic,
Thanks for the nice work! I’ve been trying to add proxy authentication, but when I do, I lose the site authentication. Any ideas?
I added
//set proxy stuff
curl_setopt($ch, CURLOPT_PROXY, "webproxy:8080");
curl_setopt($ch, CURLOPT_PROXYUSERPWD, "userdomain\\username:password");
twice (in function economistLogin() and function economistGetUrl($url))
Thanks for your feedback!
You’re the man, Nic… Thanks.
I am afraid I can’t get this to run properly. I get an error, “PHP Fatal error: Call to undefined function curl_init ….. on line 575”, and the script then stops and doesn’t do anything. I am using PHP 5.2.8. I am running on Windows XP.
Could you also tell me how to set the basedir to store the files on the directory C:\Documents and Settings\LHY\My Documents\Economist? Thank you.
@Maarten
I added the CURLOPT_PROXY right before curl_exec() and it seemed to work for my local proxy. I do not have it password protected, so maybe the credentials are specified wrong? I would try printing a line for curl_error($ch) right after curl_exec() so you can see what’s wrong.
Hopefully that’ll work!
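To make the suggestion above concrete, here is a minimal sketch of setting the proxy options and reporting curl_error() when a request fails. The helper names buildProxyOptions() and fetchViaProxy() are made up for illustration and are not part of the script; the proxy host and credentials are the placeholders from the comment above.

```php
<?php
// Sketch only: buildProxyOptions() and fetchViaProxy() are hypothetical
// helpers, and the proxy host/credentials are placeholders.

/** Build the cURL options for an authenticating proxy. */
function buildProxyOptions(string $proxy, string $user, string $password): array
{
    return [
        CURLOPT_PROXY          => $proxy,                  // e.g. "webproxy:8080"
        CURLOPT_PROXYUSERPWD   => $user . ':' . $password, // "user:password"
        CURLOPT_RETURNTRANSFER => true,                    // return the body as a string
    ];
}

/** Fetch a URL through the proxy and surface the cURL error on failure. */
function fetchViaProxy(string $url, array $proxyOptions): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, $proxyOptions);

    $body = curl_exec($ch);
    if ($body === false) {
        // curl_error() describes what went wrong (bad proxy, 407, timeout, ...).
        throw new RuntimeException('cURL failed: ' . curl_error($ch));
    }
    return $body;
}

// Usage (requires a reachable proxy, so it is left commented out):
// $options = buildProxyOptions('webproxy:8080', 'userdomain\\username', 'password');
// echo fetchViaProxy('http://www.economist.com/', $options);
```

If the site login is lost after adding the proxy, the error text from curl_error() is usually the quickest way to see whether the proxy credentials or the site credentials are being rejected.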
@H Y Lee
H Y Lee — You’ll have to make sure the Curl libraries are installed on your Windows PHP install:
http://curl.haxx.se/libcurl/php/install.html
http://windows.php.net/
You should be able to just update the basedir with the directory you’ve specified. That path should work on Windows. Since you’re using backslashes (\), you’ll have to escape them:
C:\\Documents and Settings\\LHY\\My Documents\\Economist
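As a small illustration of the escaping, the three PHP string literals below all name the same directory. The variable names are only for this example; the script itself uses whatever name it gives its base directory setting.

```php
<?php
// In a double-quoted PHP string, "\\" produces a single backslash.
$escaped = "C:\\Documents and Settings\\LHY\\My Documents\\Economist";

// In a single-quoted string only \\ and \' are escapes, so plain
// backslashes before letters are kept as-is.
$single = 'C:\Documents and Settings\LHY\My Documents\Economist';

// PHP's file functions on Windows also accept forward slashes.
$forward = 'C:/Documents and Settings/LHY/My Documents/Economist';

echo $escaped, PHP_EOL; // prints C:\Documents and Settings\LHY\My Documents\Economist
```

Escaping every backslash is the safest habit, because a folder name that starts with n or t would otherwise turn into a newline or tab inside a double-quoted string.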
Hi Nic, again thanks for spearheading this. A few quick things.
Some of your users (especially Linux/Mac) may want to know that kindlegen has more or less replaced mobigen_linux. A quick update of this line works (replace mobigen_linux with kindlegen; all other options work the same, it seems):
system("/usr/local/bin/mobigen_linux -c1 $opfFile");
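A sketch of what that swap could look like, with the binary path pulled out so it is easy to change. buildConverterCommand() is a hypothetical helper, the kindlegen install path is an assumption, and the -c1 option is simply carried over from the line above.

```php
<?php
/**
 * Build the shell command that converts the .opf file to .mobi.
 * Hypothetical helper; the script itself calls system() directly.
 */
function buildConverterCommand(string $binary, string $opfFile): string
{
    // escapeshellarg() keeps paths with spaces or shell characters intact.
    return escapeshellarg($binary) . ' -c1 ' . escapeshellarg($opfFile);
}

// Usage, assuming kindlegen was unpacked to /usr/local/bin:
// system(buildConverterCommand('/usr/local/bin/kindlegen', $opfFile));
```

Keeping the binary name in one place means switching back to mobigen_linux is a one-word change.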
I’m not sure what it is exactly, but it seems like many links don’t work off of mbp_toc.html. I’ve traced it back to the correct formatting in the economist.html file. When there’s a good , then it works, but half the time it’s just that gets printed out. Is there something whacked here… maybe a substitution or other that’s wiping out the id? Without it, I can’t really use the table of contents.
Also, the NYT allows you to click the ‘trackball’ to the right, and jump to the next article. I imagine that’s done by inserting a tag in the html. Would be great to have that too.
Thanks!
Sorry Nic… meant to write ‘when there’s a good id=2234234… then it works, otherwise the link from TOC fails’.
I installed PHP 5.2.12 – and have not been able to get this to work. I have set my Economist login and password, as well as my kindle email address in the phps file (using Wordpad). However, when I run the phps file, the DOS box comes up for a fraction of a second, and disappears.
Thoughts?
Nic – it’s not picking up the start and end of article for United States’ section. Would really appreciate if you could give this a look for a few minutes… Thanks,
“Another Peter” … sounds like you’re not running it from the command line. My guess is that you have the working directory with insufficient slashes “/” or something, or maybe you’re missing the curl piece? Either way, you shouldn’t just “double click” on the php file… Try to run it from within the command line window to begin with, and post the output here.
Hey Peters,
Thanks for finding this issue. I’ve updated the source with 3 fixes:
1) As you found, many of the articles’ links didn’t work and the entire article was missing. This seemed to hit the “Americas” section a lot. I found that if you go to the URL linked from the This Week’s Edition page, the server 301 redirects to a second URL. CURL, which downloads the articles, wasn’t set up to follow this redirect, so those articles were missing.
2) Starting with this week, 01/23/2010, the HTML format of many articles changed, which broke the converter (there was an extra H1 in the HTML, which we thought was the title). This caused extraneous SCRIPT tags to be in the HTML, which caused mobigen to break completely. This should be fixed.
3) I’ve updated the file to make sure all “See article” links work (for example, in the Politics This Week section). Before, you would just see the text “See article”. Now, it should link to the actual article.
Let me know of any problems.
http://nicj.net/files/economist_to_kindle.phps
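For anyone curious about the redirect fix in (1), this is roughly how cURL is told to follow a 301 in PHP. It is a sketch, not the exact code in the script: redirectOptions() and fetchFollowingRedirects() are hypothetical names, and the redirect limit of 5 is an arbitrary choice.

```php
<?php
/** cURL options that make a request follow HTTP redirects. */
function redirectOptions(int $maxRedirects = 5): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,          // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,          // follow Location: headers (301, 302, ...)
        CURLOPT_MAXREDIRS      => $maxRedirects, // stop after this many hops
    ];
}

/** Fetch a URL and report where the request finally ended up. */
function fetchFollowingRedirects(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, redirectOptions());

    $body = curl_exec($ch);
    // CURLINFO_EFFECTIVE_URL is the last URL requested, after any redirects.
    $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);

    return ['body' => $body, 'url' => $finalUrl];
}

// Usage (needs network access, so it is left commented out):
// $result = fetchFollowingRedirects('http://www.economist.com/');
// echo $result['url'], PHP_EOL;
```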
Hi Nic, It’s working great now, thanks! BTW, do you know how many papers get the “View Sections List/ Next Article/Previous Article” to work on the kindle? I think it might be an embedded script tag of some sort, though it’s interesting that that part of the screen (always the very bottom) doesn’t “flash” on page-changes. Maybe it’s more of a kindle thing???
Hi Peter — I don’t know how they do that, but I will look into it! Could be a kindle-specific thing, I wonder if the MOBI format natively supports that.
Just letting others know, since I had to get libcurl working on Debian (should work for Ubuntu as well):
sudo apt-get install php5-curl
Then it started working for me.
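If you are not sure whether the extension is there, a quick check like the one below gives a friendlier message than the “Call to undefined function curl_init” fatal error. This is a standalone sketch; curlMissingMessage() is a hypothetical helper and is not in the script.

```php
<?php
/**
 * Return null when the cURL extension is usable, otherwise a short hint.
 * Hypothetical helper for illustration.
 */
function curlMissingMessage(): ?string
{
    if (extension_loaded('curl') && function_exists('curl_init')) {
        return null;
    }
    return 'The PHP cURL extension is not loaded. Install or enable it, then run the script again.';
}

$problem = curlMissingMessage();
if ($problem !== null) {
    echo $problem, PHP_EOL;
}
```

Running php -m from the command line and looking for curl in the list tells you the same thing.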
Unfortunately, I’m having trouble with back articles. The parser seems thrown by the article subscription links (I’m an online-only subscriber). Seems like it would be easy to fix, but the PHP is throwing me. Halfway tempted to rewrite in Ruby, but I’m sure that it’s not worth it, since the script works great for current articles.
Got it working. Style must be different for online only subscribers for back issues. I just had to update where it found the end of the body. Patch contents are:
397c397
$endPos = getArticleEnd($article);
414c414
$endPos = getArticleEnd($article);
520c520
function getArticleEnd($article)
539,543d538
< $end4 = strpos($article, '<script', $startPos);
< if ($end4) {
< $ends[] = $end4;
< }
<
Thanks Josh! If you don’t mind, I’ve added your update to the script on this site so others will have the fix.
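The removed lines in the patch hint at the pattern involved: look for several possible end-of-article markers with strpos(), collect the hits in an array, and pick one. Below is a minimal sketch of that idea, assuming the earliest hit is the one wanted. findEarliestMarker() and the marker strings are invented for the example and are not the script’s actual markers.

```php
<?php
/**
 * Return the earliest position at or after $startPos where any marker
 * occurs, or false when none is found. Hypothetical helper.
 */
function findEarliestMarker(string $html, array $markers, int $startPos = 0)
{
    $positions = [];
    foreach ($markers as $marker) {
        $pos = strpos($html, $marker, $startPos);
        // strpos() returns false when nothing is found, and 0 is a valid
        // position, so compare against false instead of using if ($pos).
        if ($pos !== false) {
            $positions[] = $pos;
        }
    }
    return $positions ? min($positions) : false;
}

// Made-up article HTML: the body starts at offset 14, right after the title.
$article = '<h1>Title</h1><p>Body text</p><div class="related">x</div><div class="footer">';
$end = findEarliestMarker($article, ['<div class="footer"', '<div class="related"'], 14);
echo substr($article, 14, $end - 14), PHP_EOL; // prints <p>Body text</p>
```

When the site layout changes, a fix like Josh’s then comes down to adding or removing one marker from the list.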
Hi Nic,
Looks like the Economist has changed the URL structure, now using ./node/SID where SID is the ‘story ID’. Is something up with curl, that’s not following through correctly? I’m on a new system, so maybe this whole thing is a config issue on my end, but it seems the script has suddenly stopped working in the last week or so.
Thanks, -Peter
Hi all,
It looks like The Economist website has changed again. I’ve made some modifications to the script so it works with the latest print issue from the website.
This is the first time I’m using this script, so I didn’t have the old HTML files for comparison. I also don’t have access to the online-only subscription, and haven’t tested any back-issues. Therefore you might have problems using my modified script for these things — however it works fine for the latest print issue available from the website.
I changed the login to use HTTPS instead of HTTP, so your password is sent encrypted (hopefully; I didn’t verify this). I also added a few more options, including an option to disable email sending, and an option to enable creating a link to the latest issue. The link creation allows you to have a single URL on your webserver which will always provide you the latest issue (which you can then download using your Kindle web browser).
Here is the script:
https://www.revlogic.net/public/economist_to_kindle.phps
(hit “Accept” / “Allow” / “Add Exception” / etc.. if you get a warning about SSL certificates while downloading the script)
You’ll also need mobigen_linux, which can be downloaded from here:
http://nicj.net/files/mobigen_linux.tar.gz
Or alternatively you can use kindlegen instead of mobigen_linux (though I haven’t tried):
http://www.amazon.com/gp/feature.html?ie=UTF8&docId=1000234621
Feel free to send me any comments or provide fixes for any broken features that I didn’t test.
-slifox
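The “link to the latest issue” idea can be sketched with a symbolic link that is re-pointed each week. This is only an illustration of the concept, not slifox’s code: updateLatestLink() and the file names are hypothetical, and it assumes a system where PHP is allowed to create symlinks (typically Linux).

```php
<?php
/**
 * Point $linkPath at $issueFile, replacing any previous link.
 * Hypothetical helper; returns true on success.
 */
function updateLatestLink(string $issueFile, string $linkPath): bool
{
    // Remove the old link (or file) first, because symlink() fails
    // when the destination already exists.
    if (is_link($linkPath) || file_exists($linkPath)) {
        unlink($linkPath);
    }
    return symlink($issueFile, $linkPath);
}

// Usage, with made-up paths inside a web-served directory:
// updateLatestLink('/var/www/economist/economist-2010-07-17.mobi',
//                  '/var/www/economist/latest.mobi');
```

The Kindle browser can then always be pointed at the same latest.mobi URL.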
Here’s my modifications of the script, based on the changes slifox made.
Changes
* better handling of images (afaik it doesn’t miss any)
* it’ll optionally read its configuration from economist_to_kindle.config.php
* print the name of the articles as it downloads them (mostly for debugging)
* optionally tell the kindle to treat the article as a magazine (since not all the kindle magazine features are supported thanks to a complete lack of documentation, some people may not like this option much)
* downloads the issue cover (although it can be quite difficult to get the kindle to show it to you, and it’s VERY low resolution)
* verified to work with back issues, although issues before June of 2010 use special compatibility code 🙁 (not a problem for using the script, just a problem when editing it)
Known Bugs
* any kindle menu with “section” in the name (fex. “Go to Section…”) will cause the kindle to close the book with an error. There’s no permanent damage from this.
* I’ve only tested this with kindlegen 1.1 on linux and kindle software v2.5.2. YMMV.
Download:
http://www.crosscode.org/public/economist_to_kindle.phps
Sample economist_to_kindle.config.php:
Thank you so much for this script!! But does anyone know how to get this working with Windows 7/Kindlegen?
Rather, I changed the information to use kindlegen in a random Win directory, but even though my Economist login information is correct, the program still returns the login failed error message — did they update something?
Great changes slifox and crosscode! I’ll link to crosscode’s latest version from the main article so everyone can get the latest version.
Does the newest script work with Windows and/or Kindlegen? If so, how?
Hello everybody,
thanks for the great script!
But unfortunately it does not work, I encounter the same issue as Gerardo with “Could not log in to economist.com: Username or password mismatch!”.
I reckon it has something to do with cookie handling, but setting all rights to 755 and changing line 90
from
$GLOBALS['cookieJarFile'] = $baseDir . '/economist-cookies.txt';
to
$GLOBALS['cookieJarFile'] = $baseDir . 'economist-cookies.txt';
didn’t work either.
Any hints?
Thanks,
Peter
Okay, I fixed the issue I think. The fix above is not correct, as the original source code WAS correct 😉
The reason why it did not work for me was simple: I provided the wrong baseDir path. Make sure that you provide the WHOLE file path, not only the part that is accessible from the web directory!
Peter
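For anyone hitting the same login error, a quick sanity check of the base directory can save some time. The sketch below joins the cookie file name onto the base directory regardless of a trailing slash and checks that the directory is writable. cookieJarPath() and isUsableBaseDir() are hypothetical helpers and the example path is made up; only the economist-cookies.txt file name comes from the script.

```php
<?php
/** Join the base directory and the cookie file name, tolerating a trailing slash. */
function cookieJarPath(string $baseDir): string
{
    return rtrim($baseDir, '/\\') . '/economist-cookies.txt';
}

/** True when the directory exists and PHP may create files in it. */
function isUsableBaseDir(string $baseDir): bool
{
    return is_dir($baseDir) && is_writable($baseDir);
}

// Usage, with a made-up absolute path:
// $baseDir = '/home/peter/public_html/economist';
// if (!isUsableBaseDir($baseDir)) {
//     echo "Cannot write to $baseDir; check that it is the full filesystem path.", PHP_EOL;
// }
// curl_setopt($ch, CURLOPT_COOKIEJAR, cookieJarPath($baseDir));  // cookies are saved here
// curl_setopt($ch, CURLOPT_COOKIEFILE, cookieJarPath($baseDir)); // and read back from here
```

If the cookie file cannot be written, the login cookie is never stored, which would look just like a username or password mismatch on the next request.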
Back again. Is anyone else encountering the problem that kindlegen simply dies with a bus error without producing any output? The same happens with mobigen_linux, except that there is a library statically linked.
Hello everybody. This is EXACTLY what I was looking for: Awesome.
I’m a real noob here, but I did the following cool things
– I installed php
– I went to php in program files, made kindle.bat with content “php the economist_to_kindle.php”
– Downloaded the economist_to_kindle.phps, renamed it to the economist_to_kindle.php
– Edited the economist_to_kindle.php to include my username and password and $baseDir = 'D:\\boeken\\script';
– But I seem to be stuck with the mobigen_linux thing. I downloaded it, but where do I put it. Where is /usr/local/bin/?
@ sitjn
You use Windows, /usr/local/bin is Linux 😉
Can anyone help me? When I execute the script, it tries to download different articles but gives an error ‘ERROR in processing!’ on each one of them. Why?
@ sitjn
I would recommend looking into KindleGen: http://www.amazon.com/gp/feature.html?ie=UTF8&docId=1000234621
I haven’t personally tried this on Windows, but others that have commented here have. Good luck!
Thanks to all the contributors for this great script. Could you be so kind as to provide an update that enables the magazine to be browsed using kindle 3.1 firmware?
This script is great and beats the bloat of calibre. I am able to run it on my nslu2 (arm) with debian, but no luck with mobigen. “unable to execute binary file”. Any idea or alternatives that work with arm?
@ Henao
I don’t know of anything that works on arm currently (mobigen or KindleGen). Might be a good opportunity to open a suggestion with Kindle’s support.
@ Jman
Are you talking about the new Folder features to group the Economist editions? I haven’t had much time to work on it lately to add features. If anyone else is so inclined, that would be great!
Hello,
First off, thanks for this script! I’m looking forward to getting it to work. I have set up everything as instructed but only seem to get the index page on my Kindle.
Am I missing something?
Thanks again
Ian
Hi Ian,
Today I updated the version of the script I’m hosting here:
http://nicj.net/files/economist_to_kindle.phps
It’s based off of CrossCode’s updates, but disables part of the script (NCX TOC generation) that I can’t get working with today’s economist.com and mobigen_linux.
Let me know if it works for you or not.
Have they changed the URL format again? Not getting any content in my downloads since last week.
@ Nic
Looks like the design was changed. economist.html only got mbp:pagebreaks and not article contents
Thanks for letting me know. There was a small change.
Update at http://nicj.net/files/economist_to_kindle.phps
Unfortunately Calibre is the only program I’ve found that generates .mobi files on ARM, which is a shame because Calibre really stresses it. I’ve been using Calibre on a Pogoplug running Debian and as long as you give it a fair amount of time to work it’ll absolutely feed a Kindle.