Re: copyright
- From: "Stewart C. Russell" <scruss@[redacted]>
- Subject: Re: copyright
- Date: Thu, 03 May 2001 18:49:40 +0100
Jeff & Paulina Miner wrote:
>
> Yes, my pet peeve is out-of-copyright books that have been posted to the
> web using HTML coding that is utilitarian and then attempting to
> copyright that.
As Michael Hart previously noted, lynx does a good job of dumping web
pages to plain text. I've refined this technique further for a publishing
project at work*. Using Seth Golub's txt2html and Dave Raggett's tidy HTML
cleaner, I can generate valid HTML from a text stream. For paper
publishing, I pipe this HTML into James Clark's jade DSSSL processor to
get RTF, but that step is quite geeky.
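For anyone who wants to try the chain described above, here is a minimal
sketch in Python, not the exact setup used at work. It assumes lynx,
txt2html and tidy are installed and on the PATH, that txt2html reads
standard input when given no file names, and that tidy's -q and -asxhtml
options behave as documented. The names run_pipeline and example_stages
are made up for illustration. The jade step is left out because it needs
a DSSSL stylesheet that isn't shown here.

```python
import subprocess


def run_pipeline(stages, data=b"", ok_codes=(0,)):
    """Feed `data` through each command in turn, stdout of one to stdin of the next.

    `stages` is a list of argv lists. A stage whose exit status is not in
    `ok_codes` raises subprocess.CalledProcessError.
    """
    for argv in stages:
        proc = subprocess.run(argv, input=data, stdout=subprocess.PIPE)
        if proc.returncode not in ok_codes:
            raise subprocess.CalledProcessError(proc.returncode, argv)
        data = proc.stdout
    return data


def example_stages(url):
    """Hypothetical command lines for the web page -> text -> HTML chain."""
    return [
        ["lynx", "-dump", url],      # render the page as plain text on stdout
        ["txt2html"],                # assumption: reads stdin, writes HTML to stdout
        ["tidy", "-q", "-asxhtml"],  # clean up the generated markup, quietly
    ]


# Usage (needs the three tools installed, so it is left commented out):
# html = run_pipeline(example_stages("http://example.org/book.html"), ok_codes=(0, 1))
```

The ok_codes=(0, 1) in the usage line is there because tidy exits with
status 1 when it has only warnings to report, which is common for
generated markup. To start from a local text file rather than a web page,
drop the lynx stage and pass the file's bytes as data.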
Stewart
*: no, not for stealing stuff from the web, but for processing plain
text files which contain dictionary supplements.
[Moderator: Since I'm starting to get questions on "What is Lynx?" etc.,
here are some pointers to information about all the software above:
Lynx (a text-based Web browser, for Unix, Windows, and DOS):
http://lynx.browser.org/
txt2html, by Seth Golub (a text to HTML converter):
http://www.aigeek.com/txt2html/
HTML Tidy, by Dave Raggett (identifies and fixes bad HTML):
http://www.w3.org/People/Raggett/tidy/
Jade, by James Clark (too geeky to describe nontechnically in one line :-)
http://www.jclark.com/jade/
- JMO]