Book People Archive

Re: copyright



Jeff & Paulina Miner wrote:
> 
> Yes, my pet peeve is out-of-copyright books that have been posted to the
> web using utilitarian HTML coding, with the poster then attempting to
> copyright that.

As Michael Hart previously noted, lynx does a good job of dumping pages
to text. I've further refined this technique for a publishing project at
work*. Using Seth Golub's txt2html and Dave Raggett's tidy HTML cleaner,
I can generate valid HTML from a text stream. For paper publishing, I
stream this HTML into James Clark's jade DSSSL processor to get RTF, but
this is quite geeky.
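
For readers who want to try something similar, here is a minimal sketch
of how such a chain of tools might be wired together in Python. It is
not the script described above. The function names (build_stages,
run_pipeline) are hypothetical, and the sketch assumes that lynx,
txt2html and tidy are installed and on the PATH, that txt2html reads
text on standard input when given no file name, and that tidy writes
the cleaned HTML to standard output. The jade step is left out because
it needs a DSSSL stylesheet that the message does not describe.

```python
import subprocess


def build_stages(url):
    """Return one argv list per stage of the text-to-HTML chain."""
    return [
        # lynx -dump renders the page and prints it as plain text.
        ["lynx", "-dump", url],
        # Assumption: txt2html with no file argument filters stdin to stdout.
        ["txt2html"],
        # tidy -q cleans the HTML quietly and prints the result on stdout.
        ["tidy", "-q"],
    ]


def run_pipeline(stages, data=b""):
    """Feed each stage's stdout into the next stage's stdin.

    Exit codes are deliberately not checked here: tidy, for one, reports
    mere warnings through a non-zero status while still emitting output.
    """
    for argv in stages:
        result = subprocess.run(argv, input=data, stdout=subprocess.PIPE)
        data = result.stdout
    return data
```

Calling run_pipeline(build_stages(url)) with a page address of your
choice should then return the tidied HTML as bytes. Passing each stage
as an argument list, rather than as one shell command line, avoids
quoting problems with unusual URLs.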

 Stewart

*: no, not for stealing stuff from the web, but for processing plain
text files which contain dictionary supplements.

[Moderator: Since I'm starting to get questions on "What is Lynx?" etc.,
 here are some pointers to information about all the software above:

  Lynx (a text-based Web browser, for Unix, Windows, and DOS):
      http://lynx.browser.org/
  txt2html, by Seth Golub (a text to HTML converter):
      http://www.aigeek.com/txt2html/
  HTML Tidy, by Dave Raggett (identifies and fixes bad HTML):
      http://www.w3.org/People/Raggett/tidy/
  Jade, by James Clark (too geeky to describe nontechnically in one line :-)
      http://www.jclark.com/jade/
                                         - JMO]