GutenbergPy

Radu Angelescu

I have created GutenbergPy, a library for interfacing with Project Gutenberg from Python code. This is the first article about it.

Why use my library?

  • Only needs lxml (and pymongo, but only if you use MongoDB)
  • SQLite cache build time: about 2 minutes (instead of more than one day)
  • SQLite cache size: about 120 MB
  • MongoDB cache build time: about 3 minutes (this will probably drop in the future, since it is not optimized yet)
  • MongoDB cache size: about 300 MB (instead of 2 GB with the previous Berkeley DB solution)
  • Fast queries with both backends
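
To give a feel for why a SQLite-backed cache can be built and queried quickly, here is a minimal sketch using only Python's standard sqlite3 module. The table name, columns, helper functions, and sample rows are all hypothetical; this is not GutenbergPy's actual schema or API, just an illustration of the general idea (bulk insert in one transaction, then an index on the column you filter by).

```python
import sqlite3


# Hypothetical schema for illustration only; not GutenbergPy's actual cache layout.
def build_cache(conn, records):
    """Create a tiny metadata cache and bulk-load it in a single transaction."""
    conn.execute(
        "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, language TEXT)"
    )
    # One transaction plus executemany keeps bulk inserts fast.
    with conn:
        conn.executemany(
            "INSERT INTO books (id, title, language) VALUES (?, ?, ?)", records
        )
    # Index the column we expect to filter on so lookups avoid a full table scan.
    conn.execute("CREATE INDEX idx_books_language ON books (language)")


def titles_by_language(conn, language):
    """Return the titles stored for one language, ordered by book id."""
    rows = conn.execute(
        "SELECT title FROM books WHERE language = ? ORDER BY id", (language,)
    )
    return [title for (title,) in rows]


# Made-up sample records, not real catalogue data.
SAMPLE = [
    (1, "Example Book A", "en"),
    (2, "Example Book B", "fr"),
    (3, "Example Book C", "en"),
]

conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
build_cache(conn, SAMPLE)
print(titles_by_language(conn, "en"))
```

With a real catalogue the same pattern applies at a larger scale: the rows would come from parsed metadata rather than a hard-coded list, and the database would live in a file instead of memory.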