I have created GutenbergPy, a library for working with Project Gutenberg from Python code.
This is the first article about it.
Why use my library?
- Only needs lxml (plus pymongo if you use MongoDB)
- SQLite cache build time: about 2 minutes (instead of more than a day)
- SQLite cache size: about 120 MB
- MongoDB cache build time: about 3 minutes (will probably drop in the future, as it is not optimized yet)
- MongoDB cache size: about 300 MB (instead of 2 GB for the previous Berkeley DB solution)
- Fast queries on both solutions
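
To give a feel for why a local SQLite cache keeps lookups fast, here is a minimal sketch that uses only Python's standard `sqlite3` module. It does not use GutenbergPy's API or its real schema: the `books` table, its columns, the index name, the `ids_by_language` helper, and the sample rows are all assumptions made up for illustration.

```python
import sqlite3

# Hypothetical schema for illustration only; NOT GutenbergPy's actual tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books ("
    "gutenberg_id INTEGER PRIMARY KEY, title TEXT, language TEXT)"
)
# An index on the filtered column lets SQLite avoid scanning the whole table.
conn.execute("CREATE INDEX idx_books_language ON books (language)")

# A few sample rows standing in for a real metadata cache.
sample_rows = [
    (1342, "Pride and Prejudice", "en"),
    (2701, "Moby Dick", "en"),
    (2000, "Don Quijote", "es"),
]
conn.executemany("INSERT INTO books VALUES (?, ?, ?)", sample_rows)
conn.commit()


def ids_by_language(connection, language):
    """Return the ids of all cached books in one language, sorted ascending."""
    cursor = connection.execute(
        "SELECT gutenberg_id FROM books WHERE language = ? ORDER BY gutenberg_id",
        (language,),
    )
    return [row[0] for row in cursor]


english_ids = ids_by_language(conn, "en")
print(english_ids)
```

Once the cache has been built, a query like this runs against a local indexed file, so there is no need to parse the catalog again on each lookup.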