Bib Scrape

Preparing a BibTeX file for publication is an error-prone and tedious process. The entries must be correct, consistent, and nicely formatted. I was never happy with the results of any of the BibTeX scraping systems I looked at, so I wrote my own. It collects BibTeX entries from the websites of computer-science academic publishers, and I use it to make preparing my own BibTeX files easier. Currently it supports:

  • ACM,
  • Springer,
  • ScienceDirect,
  • IEEE Computer Society,
  • IEEE Xplore,
  • Cambridge Journals (e.g. Journal of Functional Programming),
  • JSTOR,
  • IOS Press, and
  • Wiley.

In addition, this scraper fixes common problems with the BibTeX that these services provide. For example, it fixes:

  • the handling of Unicode and other formatting (e.g. subscripts) in titles;
  • the incorrect use of the ‘issue’ field instead of the ‘number’ field;
  • the format of the ‘doi’ and ‘pages’ fields;
  • the use of macros for the ‘month’ field; and
  • numerous miscellaneous problems with specific publishers.
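
As a rough illustration of the field-level fixes listed above, here is a minimal sketch in Python (Bib Scrape itself runs on Perl, so this is not its actual code). The function name normalize_entry, the dict-of-strings representation of an entry, and the target formats (a bare DOI without a resolver prefix, page ranges joined by "--", three-letter month macros) are all assumptions made for the example.

```python
import re

# Everything here is illustrative: the names and the target formats are
# assumptions, not taken from Bib Scrape's source.
MONTH_NAMES = ["january", "february", "march", "april", "may", "june",
               "july", "august", "september", "october", "november",
               "december"]

# Resolver or "doi:" prefixes that sometimes precede the bare DOI.
DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)",
                        re.IGNORECASE)

# Two page tokens separated by hyphens, an en dash, or an em dash.
PAGE_RANGE = re.compile(r"^\s*(\w+)\s*(?:-+|\u2013|\u2014)\s*(\w+)\s*$")


def normalize_entry(fields):
    """Return a cleaned copy of a dict mapping BibTeX field names to strings."""
    out = dict(fields)

    # Move 'issue' to 'number'; an existing 'number' wins and 'issue' is dropped.
    if "issue" in out:
        issue = out.pop("issue")
        out.setdefault("number", issue)

    # Keep only the bare DOI.
    if "doi" in out:
        out["doi"] = DOI_PREFIX.sub("", out["doi"].strip())

    # Join page ranges with a double hyphen; single pages are left alone.
    if "pages" in out:
        match = PAGE_RANGE.match(out["pages"])
        if match:
            out["pages"] = f"{match.group(1)}--{match.group(2)}"

    # Map month numbers, names, and abbreviations to three-letter macros.
    # Unrecognized values are left unchanged.
    if "month" in out:
        key = out["month"].strip().rstrip(".").lower()
        if key.isdigit() and 1 <= int(key) <= 12:
            out["month"] = MONTH_NAMES[int(key) - 1][:3]
        elif len(key) >= 3:
            for name in MONTH_NAMES:
                if name.startswith(key):
                    out["month"] = name[:3]
                    break

    return out


if __name__ == "__main__":
    raw = {
        "issue": "3",
        "doi": "http://dx.doi.org/10.1000/example",
        "pages": "123-145",
        "month": "March",
    }
    print(normalize_entry(raw))
```

When the entry is written back out, the month value would be emitted unquoted (month = mar) so that BibTeX treats it as a macro rather than a literal string. The fixes for Unicode and title formatting are not covered by this sketch.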

Download

This tool is currently available as a limited public release (basically a semi-closed beta). If you are interested in giving it a try, let me know and I’ll send you the download link.

Requirements

  • Perl and the following modules:
    • Algorithm::Diff
    • Class::Struct
    • HTML::Entities
    • HTML::HeadParser
    • Text::BibTeX
    • WWW::Mechanize
    • XML::Parser