Gutenberg Literary Clock

Radu Angelescu

2022-12-27

This article is about building a literary clock with gutenberg-rs in Rust.

What is a literary clock?

In 2011, Christian Marclay’s The Clock was shown at the British Art Show in London. It was one of the most spectacular pieces of video art: an entire 24-hour cycle presented in elegantly sutured clips from thousands of different feature films. Each narrative snippet in Marclay’s grand patchwork refers to a specific time of day and was precisely synced to the actual clock time in the room where you were watching. The exhibit launched the idea of “fictional time”, which The Guardian picked up. In response to the exhibition, Guardian writers asked readers to submit quotes from their favorite books. The quotes needed to mention specific times so that they could be assembled into a literary clock. The idea became somewhat popular on the internet, with many clock implementations using The Guardian’s collection of time quotes.

I love the idea, and it bugs me that I never saw Marclay’s exhibition, but I figured I could write an app that uses my gutenberg-rs library to generate new quotes without crowdsourcing. I did exactly that, so this tutorial is about how you can make your own original literary clock without relying on The Guardian’s data.

The code is on GitHub and you are free to get it here.

You can check the YouTube playlist if you want to see a video tutorial on creating this.

We have four main parts:

Getting the gutenberg cache
Generating a full text search database from particular gutenberg books
Generating the literary clock database
Displaying everything via webview

Getting the gutenberg cache

In the first part we call the gutenberg cache functions, as in the library tutorial, and wait for everything to be populated. This takes some time, but there is nothing more to it. I advise running this part with a release build so it is faster; the cache is only generated once:

    let settings = GutenbergCacheSettings::default();

    if !std::path::Path::new(&settings.cache_filename).exists() {
        setup_sqlite(&settings, false, true).await?;
    }

    let mut cache = SQLiteCache::get_cache(&settings)?;

Generating a full text search database from particular gutenberg books

This part is just about inserting a lot of text quickly into an FTS5 SQLite table. Note that FTS5 is an extension and is not compiled into every SQLite build, so many SQLite database viewers cannot open the table; the viewer needs to be built with the FTS5 extension.

More details about this here. The documentation is straightforward and the extension does not have many options, so it is worth a read. This extension is actually the main reason I did not add MongoDB support to gutenberg-rs: it solves full-text search for SQLite and it’s pretty fast.

I added more detailed explanations as comments in the code below.

    // we first create a connection to the database
    let fts_connection = Box::new(Connection::open(fts_filename)?);
    // this is the query used to create the fts5 table
    let create_table_query = "CREATE VIRTUAL TABLE book USING fts5(bookid, text);";
    // execute the query
    fts_connection.execute(create_table_query, ())?;
    // this configuration is set so everything will be "blazingly fast" :D
    // turning off journaling and synchronous writes and setting a huge in-memory cache size
    // is great for performance when you first populate a db programmatically
    // you should not set these if you use this database in a webservice, for example
    let make_inserts_faster_sql = "PRAGMA journal_mode = OFF;PRAGMA synchronous = 0; PRAGMA cache_size = 1000000; PRAGMA locking_mode = EXCLUSIVE; PRAGMA temp_store = MEMORY;";
    // we execute a batch here because the above statement has multiple parts
    fts_connection.execute_batch(make_inserts_faster_sql)?;
    // we prepare the insert statement
    let mut fts_insert_stmt =
        fts_connection.prepare("INSERT INTO book(bookid, text) VALUES(?1,?2)")?;
    // we get the gutenberg book ids that have these particular properties
    let book_gutenberg_ids = cache.query(&json!({
        "language": "\"en\"",
        "bookshelve": "'Romantic Fiction', 'Astounding Stories', 'Mystery Fiction', 'Erotic Fiction', 'Mythology', 'Adventure', 'Humor', 'Bestsellers, American, 1895-1923', 'Short Stories', 'Harvard Classics', 'Science Fiction', 'Gothic Fiction', 'Fantasy'",
    }))?;

    // go through each book id and download the content, cleaning it up and splitting it
    // into paragraphs
    for gutenberg_id in book_gutenberg_ids.iter() {
        let links = cache.get_download_links(vec![*gutenberg_id])?;

        if let Some(link) = links.first() {
            let text = get_text_from_link(&settings, link).await?;
            let stripped_text = strip_headers(text);
            let paragraphs: Split<&str> = stripped_text.split("\n\r");
            for paragraph in paragraphs {
                let paragraph_trimmed = paragraph.trim();
                if paragraph_trimmed.is_empty() {
                    continue;
                }
                if paragraph_trimmed.len() < 64 {
                    continue;
                }
                // insert the paragraph content with an id that also holds some information
                // we will need later (the gutenberg id and the download link)
                fts_insert_stmt
                    .execute((format!("${}${}$", gutenberg_id, link), paragraph_trimmed))?;
            }
        }
    }
    Ok(())
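
The two pure steps of the loop above, splitting a book into paragraphs and tagging each paragraph with a composite id, can be sketched with the standard library alone. This is only an illustration: split_paragraphs, make_book_key and parse_book_key are hypothetical names that are not part of gutenberg-rs or of the project. Unlike the split("\n\r") call above, which relies on CRLF line endings, this version normalizes the line endings first so that LF-only texts also work.

```rust
/// Split a book into paragraphs on blank lines, accepting both CRLF and LF
/// line endings, and keep only paragraphs of at least `min_len` bytes.
fn split_paragraphs(text: &str, min_len: usize) -> Vec<String> {
    // After normalization a single separator ("\n\n") covers both cases.
    let normalized = text.replace("\r\n", "\n");
    normalized
        .split("\n\n")
        .map(str::trim)
        .filter(|p| p.len() >= min_len)
        .map(str::to_string)
        .collect()
}

/// Build the composite id stored in the `bookid` column:
/// `$<gutenberg id>$<download link>$`.
fn make_book_key(gutenberg_id: usize, link: &str) -> String {
    format!("${}${}$", gutenberg_id, link)
}

/// Parse a composite id back into its parts.
/// Returns `None` when the key does not have the expected shape.
fn parse_book_key(key: &str) -> Option<(usize, &str)> {
    let inner = key.strip_prefix('$')?.strip_suffix('$')?;
    // Everything up to the first remaining `$` is the numeric id,
    // the rest is the link.
    let (id, link) = inner.split_once('$')?;
    Some((id.parse().ok()?, link))
}
```

Calling split_paragraphs(&stripped_text, 64) would keep the same minimum length as the loop above, and parse_book_key is one possible way to read the id back later without a scanning macro.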

Generating the literary clock database

To generate the literary clock database, the idea is simple: we go through all possible hour:minute combinations and search for the spoken form of each time in all the book contents, via the FTS database created earlier. Note that the whole example is structured this way so you can tailor it to other applications that need full-text search databases.
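
The helper that produces the spoken forms is not reproduced in this article, so here is a rough idea of what such a step might look like. It is a minimal sketch under stated assumptions: number_to_words, spoken_time_variants and build_match_query are hypothetical names, and the handful of variants generated here is far smaller than what a real clock would want. The query builder quotes each variant as an FTS5 phrase and joins the phrases with OR.

```rust
/// Spell out a number from 0 to 59 in English. Panics for larger values.
fn number_to_words(n: u32) -> String {
    const ONES: [&str; 20] = [
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen",
    ];
    const TENS: [&str; 6] = ["", "", "twenty", "thirty", "forty", "fifty"];
    match n {
        0..=19 => ONES[n as usize].to_string(),
        _ if n % 10 == 0 => TENS[(n / 10) as usize].to_string(),
        _ => format!("{}-{}", TENS[(n / 10) as usize], ONES[(n % 10) as usize]),
    }
}

/// A few spoken variants of a 12-hour clock time.
/// Assumes `hour` is in 1..=12 and `minute` is in 0..=59.
fn spoken_time_variants(hour: u32, minute: u32) -> Vec<String> {
    let h = number_to_words(hour);
    let next = number_to_words(hour % 12 + 1); // the hour after 12 is 1
    let m = number_to_words(minute);
    match minute {
        0 => vec![format!("{} o'clock", h)],
        15 => vec![format!("quarter past {}", h), format!("{} fifteen", h)],
        30 => vec![format!("half past {}", h), format!("{} thirty", h)],
        45 => vec![format!("quarter to {}", next), format!("{} forty-five", h)],
        1..=9 => vec![format!("{} past {}", m, h)],
        10..=29 => vec![format!("{} past {}", m, h), format!("{} {}", h, m)],
        _ => {
            let left = number_to_words(60 - minute);
            vec![format!("{} to {}", left, next), format!("{} {}", h, m)]
        }
    }
}

/// Join the variants into one FTS5 MATCH expression: every variant becomes
/// a double-quoted phrase and the phrases are combined with OR.
fn build_match_query(variants: &[String]) -> String {
    variants
        .iter()
        // The apostrophe is replaced by a space, mirroring the code below.
        .map(|v| format!("\"{}\"", v.replace('\'', " ")))
        .collect::<Vec<_>>()
        .join(" OR ")
}
```

The project’s actual generation code builds the same kind of query inside the loop over hours and minutes: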

    // open a connection to the fts database
    let fts_connection = Box::new(Connection::open(db_filename)?);
    // create the literary clock database only if it does not already exist
    if !std::path::Path::new(lit_clock_db).exists() {
        let lit_clock_db = Box::new(Connection::open(lit_clock_db)?);
        // create the sql table with the necessary data, optimized for inserts
        lit_clock_db.execute_batch("
        CREATE TABLE littime (
            id INTEGER PRIMARY KEY AUTOINCREMENT UNIQUE,
            time INTEGER,
            text TEXT,
            author TEXT,
            title TEXT,
            link TEXT
        );
        PRAGMA journal_mode = OFF;PRAGMA synchronous = 0;PRAGMA cache_size = 1000000;PRAGMA locking_mode = EXCLUSIVE;PRAGMA temp_store = MEMORY;
        ")?;
        // prepare the insert statements
        let mut insert = lit_clock_db.prepare(
            "INSERT INTO littime(time, text, author, title, link) VALUES(?1, ?2, ?3, ?4, ?5)"
        )?;

        // this metadata map is used so we don't query sqlite for each paragraph of a particular book to get its metadata (title and author)
        // it is faster to search the info in this map than in sqlite
        let mut book_metadata_map: HashMap<usize, BookMetadata> = HashMap::new();
        
        // here we go through all the time combinations
        for hour in 1..13 {
            for minute in 0..60 {
                // generate an easily readable index in the database
                let time_number = hour * 100 + minute;
                // generate spoken form of time
                let word_times = all_formats_to_text(hour, minute)?;
                // build a query from the spoken forms we generated
                let mut query_words = "".to_string();
                for (idx, time_variant) in word_times.iter().enumerate() {
                    let mut word_time = time_variant.replace("'", " ");
                    word_time = format!("\"{}\"", word_time);
                    query_words = match idx { 
                        0 => format!("{}", word_time),
                        _ => format!("{} OR {}", query_words, word_time),
                    };
                }
                // generate the find query
                let mut stmt = fts_connection.prepare("SELECT bookid, highlight(book, 1, '<b>', '</b>') FROM book WHERE text MATCH ?1 ")?;
                // get the data 
                let res_iter = stmt.query_map((&query_words,), |row|{
                    Ok(BookFind {
                        book_id: row.get(0)?,
                        text: row.get(1)?,
                    })
                })?;
                // we now iterate through all paragraphs, get their metadata and insert them in our database
                // we speed things up using the hashmap as a cache lookup
                for entry in res_iter {
                    let book_paragraph = entry?;
                    let book_id;
                    let link: String;
                    scan!(book_paragraph.book_id.bytes() => "${}${}$", book_id, link);

                    let mut metadata: Option<&BookMetadata> = None;
                    if let Some(data) = book_metadata_map.get(&book_id) {
                        metadata = Some(data);
                    }
                    else {
                        let query = "SELECT titles.name, authors.name FROM titles, books, authors, book_authors 
                        WHERE books.id = book_authors.bookid AND authors.id = book_authors.authorid AND titles.bookid = books.id AND books.gutenbergbookid = ?1";

                        let res_meta = cache.connection.query_row(query, (book_id,),|row| {
                            Ok( BookMetadata{
                                title: row.get(0)?,
                                author: row.get(1)?,
                            })
                        });

                        if let Ok(data) = res_meta {
                            book_metadata_map.insert(book_id, data);
                            metadata = book_metadata_map.get(&book_id);
                        }
                    }
                    if let Some(data) = metadata {
                        insert.execute((
                            time_number,
                            book_paragraph.text,
                            &data.author,
                            &data.title,
                            link,
                        ))?;
                    }
                }
            }
        }
        // in the end we create an index on littime so we get the result faster when trying to get the time
        // we avoided creating the index at first because that would have slowed down the whole process (sqlite would have needed to update the index on every insert)
        lit_clock_db.execute_batch("CREATE INDEX time_idx ON littime ('time' ASC);")?;
        lit_clock_db.flush_prepared_statement_cache();
    }
    Ok(())
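
The metadata lookup in the loop above is a get-or-load cache. As an illustration only, the same pattern can be isolated in a small helper; cached_metadata is a hypothetical name, the BookMetadata struct here is a stand-in for the project’s own type, and the load closure stands in for the SQL query against the gutenberg cache.

```rust
use std::collections::HashMap;

/// Stand-in for the project's metadata type (title and author of a book).
#[derive(Debug, Clone, PartialEq)]
struct BookMetadata {
    title: String,
    author: String,
}

/// Return the metadata for `book_id`, calling `load` only on a cache miss.
/// If `load` fails (returns `None`), nothing is cached and `None` is returned.
fn cached_metadata<F>(
    cache: &mut HashMap<usize, BookMetadata>,
    book_id: usize,
    load: F,
) -> Option<&BookMetadata>
where
    F: FnOnce(usize) -> Option<BookMetadata>,
{
    if !cache.contains_key(&book_id) {
        // Cache miss: run the expensive lookup once and remember the result.
        let loaded = load(book_id)?;
        cache.insert(book_id, loaded);
    }
    cache.get(&book_id)
}
```

With a helper like this, every paragraph of the same book after the first one is served from the map, which is the effect the loop above achieves by hand.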

And here is the function that gets the quote for the time you feed it. The only thing that keeps it from being a simple “SELECT” is that we also need to handle missing times gracefully: when we don’t have an entry for the exact time, we fall back to the closest previous one. To do that we select all distinct times from the db, order them, and then do an rfind. This could work better with binary search, but I don’t think performance is an issue in this case. There is probably also a way to do this in SQL with a single query; I just didn’t.

pub fn get_lit_clock_data(
    db_filename: &str,
    time_now: DateTime<Local>,
) -> Result<LitClockEntry, Error> {
    let lit_clock_db = Box::new(Connection::open(db_filename)?);
    let mut m = lit_clock_db.prepare("SELECT distinct(time) as t FROM littime order by t;")?;
    let available_times: Vec<u32> = m
        .query_map((), |row| Ok(row.get::<usize, u32>(0)?))?
        .map(|x| match x {
            Ok(_x) => _x,
            Err(_) => 0,
        })
        .collect();
    let mut number_search = time_now.hour12().1 * 100 + time_now.minute();
    let find_result = available_times.iter().rfind(|&&x| x <= number_search);
    
    if let Some(find) = find_result {
        number_search = *find;
    }

    let mut e = lit_clock_db.prepare("SELECT text, author, title, link, time FROM littime WHERE time = ?1")?;
    let clock_entries_results: Vec<rusqlite::Result<LitClockEntry>> = e.query_map((number_search,), |row| {
        Ok(
            LitClockEntry{
                paragraph: row.get(0)?,
                author: row.get(1)?,
                title: row.get(2)?,
                link: row.get(3)?,
            }
        )
    })?.collect();
    let pick = clock_entries_results.choose(&mut rand::thread_rng());
    if let Some(p) = pick {
        if let Ok(d) = p {
            return Ok(d.clone());
        }
    }
    return Err(Error::InvalidResult("no time".to_string()));
}
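
For reference, the binary search variant mentioned above could look roughly like the sketch below. closest_previous_time is a hypothetical helper, not part of the project. It also makes one assumption that the function above does not: when nothing earlier exists, it wraps around to the latest available time, since on a 12-hour dial the times encoded as 12xx come right before 1:00. The CLOSEST_QUOTE_SQL constant is an untested guess at the single-query version, written against the littime table created earlier.

```rust
/// Given the sorted list of times present in the database (encoded as
/// hour * 100 + minute), return the closest one that is not later than `now`.
/// If there is none, wrap around to the latest available time.
/// Returns `None` only when the list is empty.
fn closest_previous_time(available: &[u32], now: u32) -> Option<u32> {
    // partition_point does a binary search and returns the index of the
    // first element that is greater than `now`.
    let idx = available.partition_point(|&t| t <= now);
    if idx > 0 {
        Some(available[idx - 1])
    } else {
        available.last().copied()
    }
}

/// Assumed single-query alternative: pick one random quote at the closest
/// previous time directly in SQL (no wrap-around in this version).
const CLOSEST_QUOTE_SQL: &str = "SELECT text, author, title, link FROM littime \
    WHERE time = (SELECT MAX(time) FROM littime WHERE time <= ?1) \
    ORDER BY RANDOM() LIMIT 1";
```

With at most 720 distinct times the difference from rfind is negligible, so this is more a matter of taste than of performance.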

Displaying everything via webview

The webview part is easy: you instantiate the app with HTML content whose JavaScript calls the Rust code via external.invoke, and the Rust side answers by evaluating the JavaScript updateData function:

<html>
    <style>
         body {
            font-family: georgia, serif;
            background-color: #222222;
            text-align: center;}
        p    {
            color: #EDCFA9;
            text-align: center;}
        h2   {
            color: #EDCFA9;
        }
        #link,a {
            color: #434242;
            text-align: center;}
        #time {
            color: #D57149;
            text-align: center;}
        #titleauthor {
            color: #D57149}
        #paragraph {
            color: #EDCFA9; font-size:16px;}
    </style>
    <body>
        <h2> Gutenberg Clock</h2>
        <h1 id ="time"> 12: 43</h1>
        <p id = "paragraph">Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum</p>
        <div id = "titleauthor">
            <span id ="title"> Title </span> by <span id = "author">Author</span>
        </div>
        <div id = "link">
            <a href="{{link}}"> https://www.raduangelescu.com</a>
        </div>
        <script type ="text/javascript">
            external.invoke("refreshtime");
            setTimeout(function () {
                external.invoke("refreshtime");
            }, "30000")

            function updateData(time, paragraph, title, author, link) {
                document.getElementById('time').innerHTML = time;
                document.getElementById('paragraph').innerHTML = paragraph;
                document.getElementById('title').innerHTML = title;
                document.getElementById('link').innerHTML = link;
                document.getElementById('author').innerHTML = author;
                setTimeout(function() {
                    external.invoke("refreshtime");
                },  "30000")
            }
        </script>
    </body>

</html>

We also style our clock with CSS so it looks good.

Now the Rust part is as follows:

fn show_app(html_content: &str, lit_clock_db:&str) -> Result<(), Error> {
    web_view::builder()
    .title("Gutenberg clock")
    .content(Content::Html(html_content))
    .size(800, 600)
    .resizable(false)
    .debug(true)
    .user_data(())
    .invoke_handler(|_webview, _arg| match _arg {
        "refreshtime" => {
            // get the current local time
            let time_now = Local::now();
            // get the quote
            let rs = get_lit_clock_data(lit_clock_db, time_now);
            if let Ok(r) = rs {
                // zero-pad the minutes so 12:05 is not displayed as 12:5
                let time_string = format!("{}:{:02}", time_now.hour12().1, time_now.minute());
                let mut paragraph = r.paragraph.replace("\"", "'");
                paragraph = paragraph.lines().collect::<Vec<&str>>().join(" ");
                let title = r.title.replace("\"", "'");
                let author = r.author.replace("\"", "'");
                let eval_func = format!(
                    "updateData(\"{}\", \"{}\", \"{}\", \"{}\", \"{}\");",
                    time_string, &paragraph, &title, &author, &r.link
                );
                println!("{}", eval_func);
                // evaluate the javascript function with the parameters
                _webview.eval(eval_func.as_str())?;
            }
            Ok(())
        }
        _ => { 
            unimplemented!();
        }
    })
    .run()
    .unwrap();
    Ok(())
}

And this is about it. The app has some small issues with texts that are incompatible with JavaScript parameters. I think it could be improved via user data, but going further with webview for this seems pointless: taking the generated SQLite database and serving it from a web service to a real web page is probably a better idea than improving the webview desktop app.
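
One possible way to reduce the JavaScript parameter issue mentioned above is to encode every argument as a proper JavaScript string literal instead of replacing quotes by hand. The sketch below uses only the standard library; js_string_literal and build_update_call are hypothetical helpers, not part of web_view or of the project, and they only deal with JavaScript string escaping (the values are still assigned to innerHTML on the page, which the highlighted quotes rely on).

```rust
/// Encode a Rust string as a double-quoted JavaScript string literal.
fn js_string_literal(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters and the Unicode line and paragraph
            // separators are written as \uXXXX escapes.
            c if c.is_control() || c == '\u{2028}' || c == '\u{2029}' => {
                out.push_str(&format!("\\u{:04x}", c as u32));
            }
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Build the `updateData(...)` call that is passed to the webview's eval.
fn build_update_call(
    time: &str,
    paragraph: &str,
    title: &str,
    author: &str,
    link: &str,
) -> String {
    format!(
        "updateData({}, {}, {}, {}, {});",
        js_string_literal(time),
        js_string_literal(paragraph),
        js_string_literal(title),
        js_string_literal(author),
        js_string_literal(link)
    )
}
```

Because quotes, backslashes and line breaks are escaped rather than replaced, the paragraph would reach the page with its original punctuation intact.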

I hope you enjoyed this tutorial as I surely enjoyed writing it. The whole Rust experience was fun!