Midway through January, 2019, news broke of a large cache of emails and passwords, dubbed “Collection #1”, surfacing on the internet. Troy Hunt writes that it includes 772,904,991 unique email addresses and 21,222,975 unique passwords. Wow! Hunt cleaned and loaded the data into his service called HaveIBeenPwned, which allows (non-technical) users to enter either their email or password(s) to see if their data was included in the breach (or earlier breaches that he’s cataloged). Hunt’s total list of passwords now includes more than half a billion unique passwords.

While I’m reasonably OK entering my email address into the HaveIBeenPwned website, I was a bit more skeptical about entering any of my passwords. Hunt employs something called k-anonymity to make the password search safer. You can check your passwords this way either by manually entering them in a webpage, or via a lovely little API. If using the API, users only need to send the first 5 characters of the SHA-1 hash of their password over the internet. The API then returns all the hashes that have that 5-character prefix, and the user does the rest of the work.
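
To make the "user does the rest of the work" part a bit more concrete, here is a minimal sketch of the client side. It assumes the digest has already been computed and that the response body is a list of SUFFIX:COUNT lines, where each suffix is the 35 characters that follow the shared 5-character prefix. The function names and the sample response are made up for illustration; this is not Medic’s code.

```rust
/// Splits a 40-character SHA-1 hex digest into the 5-character prefix that
/// gets sent over the network and the 35-character suffix that stays local.
fn split_digest(digest: &str) -> (&str, &str) {
    digest.split_at(5)
}

/// Looks for `suffix` in a response body made of `SUFFIX:COUNT` lines and
/// returns the count if it is present. (Hypothetical helper.)
fn count_in_response(suffix: &str, body: &str) -> Option<u64> {
    body.lines().find_map(|line| {
        let (candidate, count) = line.trim().split_once(':')?;
        if candidate.eq_ignore_ascii_case(suffix) {
            count.parse().ok()
        } else {
            None
        }
    })
}

fn main() {
    // SHA-1 digest of the word "password"
    let digest = "5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8";
    let (prefix, suffix) = split_digest(digest);

    // Made-up two-line response for this prefix; a real one is much longer.
    let body = "0018A45C4D1DEF81644B54AB7F969B88D65:1\r\n\
                1E4C9B93F3F0682250B6CF8331B7EE68FD8:3645804\r\n";

    println!("only this leaves the machine: {}", prefix);
    println!("appearances found locally: {:?}", count_in_response(suffix, body));
}
```

The point of the design is that the comparison against the full digest happens on the user’s machine; the server only ever learns the 5-character prefix.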

Here’s a video that does a good job explaining k-anonymity:

Note: For my project, I copied most of this HIBP Password API code from David Hewitt’s Password Check.

But what if we’re too paranoid for this type of check?

An offline checking option

In addition to this Passwords API, Hunt makes the (very large) text file of the half-billion password list available to the general public for download, either as a torrent or hosted by Cloudflare. Here are the first 10 lines of the 550-million-line text file (the number after the colon is the number of times that password has appeared in the various breach lists):

7C4A8D09CA3762AF61E59520943DC26494F8941B:23174662
F7C3BC1D808E04732ADF679965CCC34CA7AE3441:7671364
B1B3773A05C0ED0176787A4F1574FF0075F7521E:3810555
5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8:3645804
3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220
7C222FB2927D828AF22F592134E8932480637C0D:2889079
6367C48DD193D56EA7B0BAAD25B19455E529F5EE:2834058
20EABE5D64B0E216796E834F52D61FD0B70332FC:2484157
E38AD214943DAAD1D64C102FAEC29DE4AFE9DA3D:2401761
8CB2237D0679CA88DB6464EAC60DA96345513964:2333232

You might be asking: But you said this file had passwords in it. In actuality the file contains hash digests of the passwords, followed by the number of times each password appears in all the collected breaches (the number after the colon).

What’s a hash and a hash digest? Here’s a good explainer video, another video, and here’s the Wikipedia page. But basically a hash is a way to represent a piece of data (in this case, a password) without actually revealing it (though it is “guessable”). An example: If we run the word “password” through the SHA-1 hash, we get a “digest” of 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8 (which was the 4th most common password in the breach with a whopping 3,645,804 appearances).

To generate this “hash digest”, we run a numeric representation of the word “password” through the Secure Hash Algorithm 1, aka SHA-1. SHA-1, which was designed by the NSA, is an example of a one-way mathematical function, which Wikipedia defines:

In computer science, a one-way function is a function that is easy to compute on every input, but hard to invert given the image of a random input. Here, “easy” and “hard” are to be understood in the sense of computational complexity theory, specifically the theory of polynomial time problems. Not being one-to-one is not considered sufficient of a function for it to be called one-way.

You can generate your own SHA-1 hash digests here, though I would not recommend typing your real passwords into that website.
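
If you’d rather not type anything into a website at all, SHA-1 is small enough to sketch by hand. The block below is a from-scratch, for-illustration-only implementation using just the standard library (the function name sha1_hex is made up); for anything real, a maintained crate is the better choice.

```rust
/// Computes the SHA-1 digest of `input` and returns it as 40 uppercase hex
/// characters. Illustration only; prefer a vetted crate in real code.
fn sha1_hex(input: &[u8]) -> String {
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];

    // Padding: a single 0x80 byte, zeros up to 56 mod 64, then the original
    // length in bits as a big-endian u64.
    let mut msg = input.to_vec();
    let bit_len = (input.len() as u64) * 8;
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    for block in msg.chunks(64) {
        // Expand the 16 words of the block into 80 words.
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([
                block[4 * i],
                block[4 * i + 1],
                block[4 * i + 2],
                block[4 * i + 3],
            ]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }

        let (mut a, mut b, mut c, mut d, mut e) = (h[0], h[1], h[2], h[3], h[4]);
        for i in 0..80 {
            // Each group of 20 rounds uses its own mixing function and constant.
            let (f, k) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999u32),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDC),
                _ => (b ^ c ^ d, 0xCA62C1D6),
            };
            let temp = a
                .rotate_left(5)
                .wrapping_add(f)
                .wrapping_add(e)
                .wrapping_add(k)
                .wrapping_add(w[i]);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = temp;
        }

        h[0] = h[0].wrapping_add(a);
        h[1] = h[1].wrapping_add(b);
        h[2] = h[2].wrapping_add(c);
        h[3] = h[3].wrapping_add(d);
        h[4] = h[4].wrapping_add(e);
    }

    h.iter().map(|word| format!("{:08X}", word)).collect()
}

fn main() {
    println!("{}", sha1_hex(b"password"));
}
```

Running it on the word “password” should give the same digest quoted above, which is also the fourth line of the file excerpt.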

If that went a bit over your head don’t worry too much about it for now – we’ll press on.

What I wanted to do

After the recent breach, I was curious to check my passwords against the list, but I’m a bit paranoid, so, rather than paste my passwords into the Have I Been Pwned website or use the API that uses k-anonymity, I wanted to choose a third option: download the big text file and check my passwords against it offline, nice and safely.

I use KeePassXC, a free and open-source password manager, so all of my passwords are stored in an encrypted file – a KeePass database (I wrote a beginner’s user guide to KeePassXC a while back if you’re interested!). So ideally, to check my passwords against the big list, I’d have a tool that checks all the passwords in a given KeePass database against the entire HaveIBeenPwned list of passwords, preferably against the downloaded file (i.e. “offline”), rather than the API. In other words something similar to 1Password’s Watchtower feature, but preferably offline.

After poking around a bit I decided to write it myself in Rust, with this script and this crate as useful references.

What I wrote

Medic is a Rust CLI that can perform a variety of “health” checks on a KeePass database. It works! But nevertheless I’m going to give a big ol’ “buyer beware” on it – I am a social media producer by trade, and have never written software that deals so directly with sensitive data before. (Of course, part of the promise of Rust is that it enables more people to write “safe” code, so this was a natural challenge for me.)

Medic can check the passwords of a given KeePass database in four ways:

  1. Check passwords against the HaveIBeenPwned password database, via the HaveIBeenPwned API
  2. Check passwords against a file of password hashes. This requires users to download a large list of SHA-1 hashes of breached or compromised passwords. I tailored it to work with the Pwned Passwords lists from HaveIBeenPwned, which anyone can download here. Medic will then display a list of any passwords from the given KeePass database that also appear in the list of breached passwords.
  3. Check for weak passwords, using zxcvbn
  4. Check for duplicate passwords

For more on usage and setup, it’s best to refer to the ReadMe. In this post I’m going to go over some of my Rust code and what it does.

But is this useful?

I have run my personal KeePass database through the program (actually I exported my database to a CSV file and ran that through the program – I outline the steps I actually took in the README). It found a few old passwords (that I was still using) among the HIBP list, though I should note that I did not find any of my passwords that I used KeePassXC (or 1Password, which I sometimes use for mobile) to randomly generate on the breach list.

There is an open question of the usefulness of this kind of check – the logic of the question is something like “Wouldn’t a KeePass user not have any common passwords, and thus a check against a breach list like this would be pointless?” My tentative answer is that even the strongest of passwords can be exposed. Sure, it may only appear on the list once, but it’s still there, and a tool like Medic is one of the safer yet moderately efficient ways I can think of to learn if that’s true or not for any of your passwords. Plus, as mentioned, 1Password has a similar tool called Watchtower.

Reading a KeePassXC database

With the keepass-rs crate loaded up, unlocking and reading the entries of a KeePass database was pretty easy. Below is an early version of a function that did just that:

fn get_entries_from_keepass_db(file_path: &str) -> Vec<Entry> {
    let mut entries: Vec<Entry> = vec![];

    let db_pass =
        rpassword::read_password_from_tty(Some("Enter the password to your KeePass database: "))
        .unwrap();
    // Open KeePass database
    println!("Attempting to unlock your KeePass database...");
    let db = match File::open(std::path::Path::new(file_path))
        .map_err(OpenDBError::Io)
        .and_then(|mut db_file| Database::open(&mut db_file, &db_pass))
    {
        Ok(db) => db,
        Err(e) => panic!("Error: {}", e),
    };

    println!("Reading your KeePass database...");
    // Iterate over all Nodes
    for node in &db.root {
        match node {
            Node::Entry(e) => {
                let this_entry = Entry {
                    title: e.get_title().unwrap().to_string(),
                    username: e.get_username().unwrap().to_string(),
                    pass: e.get_password().unwrap().to_string(),
                    digest: sha1::Sha1::from(e.get_password().unwrap().to_string())
                       .digest()
                       .to_string()
                       .to_uppercase(),
                };
                entries.push(this_entry);
            }
            // groups (and any other node types) are skipped in this simplified version
            _ => {}
        }
    }
    entries
}

Once the entry’s data is exposed (in this case, in e) I “build” a new Entry struct called this_entry – we use some methods laid out in the keepass-rs documentation to get at the title, username and password. Then we use the password again to create a SHA-1 hash digest that we’ll also need later. Once the entry is built I push the entry into a Vector simply called entries.

For reference, here’s the definition of the Entry struct:

#[derive(Debug, Clone)]
pub struct Entry {
    title: String,
    url: String,
    username: String,
    pass: String,
    digest: String,
}

Believe it or not, the above code represents an early, partially simplified version. Below I discuss how the current version of Medic handles this task (the Entry struct definition remained unchanged though).

How I read in a KeePass database in the current version of Medic

As I wrote more code and added more features to the program – including the ability to take a keyfile or a CSV export of the user’s database – I broke this “Read KeePass database entries into Vector of Structs” task into a couple of functions, all located in src/lib.rs:

pub fn get_entries(file_path: PathBuf, keyfile_path: Option<PathBuf>) -> Vec<Entry> {
    let file_extension = file_path.extension().unwrap().to_str().unwrap();

    let db_pass: Option<String> = if file_extension != "csv" {
        Some(
            rpassword::read_password_from_tty(Some(
                "Enter the password to your KeePass database: ",
            ))
            .unwrap(),
        )
    } else {
        None
    };

    if file_extension != "csv" && db_pass.is_some() {
        build_entries_from_keepass_db(file_path, db_pass.unwrap(), keyfile_path)
    } else {
        build_entries_from_csv(file_path)
    }
}

fn build_entries_from_keepass_db(
        file_path: PathBuf,
        db_pass: String,
        keyfile_path: Option<PathBuf>,
        ) -> Vec<Entry> {
    let mut entries: Vec<Entry> = vec![];

    println!("Attempting to unlock your KeePass database...");
    let db = unlock_keepass_database(file_path, db_pass, keyfile_path);
    // Iterate over all Groups and Nodes
    for node in &db.root {
        match node {
            Node::GroupNode(_g) => {
                // println!("Saw group '{}'", g.name);
            }
            Node::EntryNode(e) => {
                let this_entry = Entry {
                    title: e.get_title().unwrap().to_string(),
                    username: e.get_username().unwrap().to_string(),
                    url: e.get("URL").unwrap().to_string(),
                    pass: e.get_password().unwrap().to_string(),
                    digest: sha1::Sha1::from(e.get_password().unwrap().to_string())
                        .digest()
                        .to_string()
                        .to_uppercase(),
                };
                if this_entry.pass != "" {
                    entries.push(this_entry);
                }
            }
        }
    }
    println!("Successfully read KeePass database!");
    entries
}

// helper function that does the actual unlocking
fn unlock_keepass_database(
        path: PathBuf,
        db_pass: String,
        keyfile_path: Option<PathBuf>,
        ) -> keepass::Database {
    let mut keyfile = keyfile_path.map(|kfp| File::open(kfp).unwrap());

    match Database::open(
            &mut File::open(path).unwrap(),               // the database
            Some(&db_pass),                               // password
            keyfile.as_mut().map(|f| f as &mut dyn Read), // keyfile
            ) {
        Ok(db) => db,
        Err(e) => {
            panic!("\nError opening database: {}", e);
        }
    }
}

When I started this project the keepass-rs crate wasn’t able to unlock KeePass databases that were locked with keyfiles. Feeling lucky, I opened an issue on the crate’s GitHub repo and just hours later the maintainer responded. The next day he wrote the feature into the crate (basically, Database::open takes an optional third argument), and I bumped the version in my Cargo.toml to 0.3.1. Open source ftw!

At this point I could read the entries of a KeePass database into a Vector of custom-made Entry objects (even if the KeePass database was locked with a keyfile). Sweet! Onward!

Using the HaveIBeenPwned API to offer users an online check

As mentioned above, HIBP offers an API to check passwords. You’re welcome to check out the code for the API/“online” check, but again, I mostly copied it from this very clean project. Hope that’s cool, David!

The offline check: Splitting the work into chunks

Once I successfully downloaded the 11 GB file of passwords and extracted it to a usable txt file (22.6 GB), I was able to begin work on the offline check.

Here are the first 10 lines:

7C4A8D09CA3762AF61E59520943DC26494F8941B:23174662
F7C3BC1D808E04732ADF679965CCC34CA7AE3441:7671364
B1B3773A05C0ED0176787A4F1574FF0075F7521E:3810555
5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8:3645804
3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220
7C222FB2927D828AF22F592134E8932480637C0D:2889079
6367C48DD193D56EA7B0BAAD25B19455E529F5EE:2834058
20EABE5D64B0E216796E834F52D61FD0B70332FC:2484157
E38AD214943DAAD1D64C102FAEC29DE4AFE9DA3D:2401761
8CB2237D0679CA88DB6464EAC60DA96345513964:2333232

The format is <SHA-1 hash of the password> and then a colon, then the number of times that particular password appeared in the various breaches (a number I didn’t have much interest in). The last 10 lines are very similar to the first 10, except that all of the number-of-appearances are 1.
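
Since every line has that same shape, pulling it apart is a one-liner with split_once. Here is a small sketch of a parser for a single line; parse_line is a made-up helper name, not something from Medic (which, as shown further down, simply slices off the first 40 characters).

```rust
/// Splits one line of the hash file into (digest, number of appearances).
/// Returns None if the line doesn't look like `<40 hex chars>:<count>`.
fn parse_line(line: &str) -> Option<(&str, u64)> {
    // trim_end drops a trailing "\n" or "\r\n" if the caller left it on
    let (digest, count) = line.trim_end().split_once(':')?;
    if digest.len() != 40 || !digest.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    Some((digest, count.parse().ok()?))
}

fn main() {
    let line = "5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8:3645804";
    println!("{:?}", parse_line(line));
}
```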

Troy Hunt explains why he uses SHA-1 for this project:

Each of the… passwords is being provided as a SHA1 hash. What this means is that anyone using this data can take a plain text password from their end (for example during registration, password change or at login), hash it with SHA1 and see if it’s previously been leaked. It doesn’t matter that SHA1 is a fast algorithm unsuitable for storing your customers’ passwords with because that’s not what we’re doing here, it’s simply about ensuring the source passwords are not immediately visible.

Back to the code

So, to review, our task is to open up a KeePass database (see above), make SHA-1 hashes (sometimes referred to more specifically as “digests”) of all the passwords (I do this when I build the Entry objects), then see if they appear in this 550-million-line text document.

My first problem: figuring out how to read this massive amount of data into my Rust program in order to work with it.

In my first attempt, I brazenly tried to read all 550 million hashes into a single, massive Vector.

fn read_by_line(file_path: &str) -> io::Result<Vec<String>> {
    let mut vec = Vec::new();
    let f = match File::open(file_path.trim_matches(|c| c == '\'' || c == ' ')) {
        Ok(res) => res,
        Err(e) => return Err(e),
    };
    let file = BufReader::new(&f);
    let mut line_number = 0;
    for line in file.lines() {
        line_number = line_number + 1;
        println!("Reading line #{:?}", line_number);
        vec.push(line.unwrap());
    }
    Ok(vec)
}

As you might guess, this slowly but surely tried to fill up 22.6 GB of my RAM. (While I have 32 GB on this machine (rah!), this obviously wasn’t a viable method.)

I had a few ideas of how to solve this issue, but a Fediverse friend helped me decide to split the data into chunks. After some testing I did later, I found that 10 million lines per chunk was a pretty good size.

fn check_database_offline(passwords_file_path: &str, entries: Vec<Entry>) -> io::Result<Vec<Entry>> {
  let mut this_chunk = Vec::new();
  let mut breached_entries: Vec<Entry> = Vec::new();

  let f = match File::open(passwords_file_path) {
    Ok(res) => res,
    Err(e) => return Err(e),
  };

  // times via `cargo test --release can_check_offline --no-run && time cargo test --release can_check_offline -- --nocapture`
  // let chunk_size = 1_000_000; // real 1m24.709s
  // let chunk_size = 20_000_000; // real 1m13.159s
  let chunk_size = 10_000_000; // real 1m14.613s

  let file = BufReader::new(&f);
  for line in file.lines() {
    this_chunk.push(line.unwrap());
    if this_chunk.len() > chunk_size {
      match check_this_chunk(&entries, &this_chunk) {
        Ok(mut vec_of_breached_entries) => {
          breached_entries.append(&mut vec_of_breached_entries)
        }
        Err(_e) => eprintln!("found no breached entries in this chunk"),
      }
      this_chunk.clear();
    }
  }
  Ok(breached_entries)
}

The for line in file.lines() loop goes through each line of the file, pushing them into this_chunk. When the len of the chunk gets larger than the designated chunk_size, it dips into that match statement, which sends this_chunk, as well as a reference to the user’s entries, to another function called check_this_chunk. If check_this_chunk finds any passwords from your KeePass entries in that chunk, it returns them in a Result, and they get appended to a Vector called breached_entries (if there are no matches, nothing is appended). Next, crucially, it clears this_chunk – removing all the digests from memory – before continuing through the big file.

Then it moves on to building the next chunk.

The result is that we never have more than 10 million hash digests stored in RAM at one time.

This works pretty well! Compiled in release mode, it takes about 70 seconds to check a test KeePass database of 5 or so entries. To check my ~80-entry database, it was more like 2 minutes. Not terrible!
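
One thing worth double-checking in any loop shaped like this: whatever is still sitting in the buffer when the file ends never reaches the checker unless it is flushed after the loop. Below is a minimal, standalone sketch of the chunking idea with that final flush included. It works on any BufRead (an in-memory Cursor stands in for the big file here), and the names find_in_chunks and check_chunk are invented for the example rather than taken from Medic.

```rust
use std::io::{self, BufRead, Cursor};

/// Reads `reader` in chunks of at most `chunk_lines` lines and returns every
/// digest from `wanted` that shows up in it.
fn find_in_chunks<R: BufRead>(
    reader: R,
    wanted: &[&str],
    chunk_lines: usize,
) -> io::Result<Vec<String>> {
    let mut found = Vec::new();
    let mut chunk: Vec<String> = Vec::with_capacity(chunk_lines);

    for line in reader.lines() {
        chunk.push(line?);
        if chunk.len() >= chunk_lines {
            check_chunk(&chunk, wanted, &mut found);
            chunk.clear(); // drops the lines but keeps the Vec's allocation
        }
    }
    // The file rarely ends exactly on a chunk boundary, so check the leftovers.
    if !chunk.is_empty() {
        check_chunk(&chunk, wanted, &mut found);
    }
    Ok(found)
}

fn check_chunk(chunk: &[String], wanted: &[&str], found: &mut Vec<String>) {
    for line in chunk {
        // lines look like `<40-char digest>:<count>`
        if let Some(digest) = line.get(..40) {
            if wanted.contains(&digest) {
                found.push(digest.to_string());
            }
        }
    }
}

fn main() {
    let data = "7C4A8D09CA3762AF61E59520943DC26494F8941B:23174662\n\
                F7C3BC1D808E04732ADF679965CCC34CA7AE3441:7671364\n\
                5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8:3645804\n";
    let wanted = ["5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8"];
    // With a chunk size of 2, the third line only gets checked by the final flush.
    let found = find_in_chunks(Cursor::new(data), &wanted, 2).unwrap();
    println!("{:?}", found);
}
```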

Doing the checking

Once I got this chunk thing figured out, doing the actual hash digest comparisons was relatively easy – I used nested for loops.

fn check_this_chunk(entries: &[Entry], chunk: &[String]) -> io::Result<Vec<Entry>> {
    let mut breached_entries = Vec::new();

    for line in chunk {
        let this_hash = &line[..40];
        for entry in entries {
            if this_hash == entry.digest {
                breached_entries.push(entry.clone());
            }
        }
    }
    Ok(breached_entries)
}

I made it a little easier on myself here and used clone() so that I wouldn’t have to worry as much about ownership. I figured this was OK resource-wise, since (hopefully) no one’s going to have thousands or even hundreds of breached entries in a single KeePass database.

In an effort to speed up this check, I did experiment a little with Rust’s HashSet collection type, but – the way I implemented them at least – it wasn’t noticeably faster. Could be a good future project though, along with adding threading.
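
For anyone who wants to pick that up, here is one way the HashSet idea could look; it is a sketch under my own assumptions, not the version I tried. The set holds the user’s digests (a small collection), so each line of the big file costs one lookup instead of one comparison per entry, and nothing needs to be buffered into chunks. The function name find_with_set is made up, and reading the file may well dominate the running time either way.

```rust
use std::collections::HashSet;
use std::io::{self, BufRead, Cursor};

/// Streams the hash file once and returns the digests that are also in `wanted`.
fn find_with_set<R: BufRead>(reader: R, wanted: &HashSet<String>) -> io::Result<Vec<String>> {
    let mut found = Vec::new();
    for line in reader.lines() {
        let line = line?;
        // the first 40 characters of each line are the digest
        if let Some(digest) = line.get(..40) {
            if wanted.contains(digest) {
                found.push(digest.to_string());
            }
        }
    }
    Ok(found)
}

fn main() {
    let data = "7C4A8D09CA3762AF61E59520943DC26494F8941B:23174662\n\
                5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8:3645804\n";
    let mut wanted = HashSet::new();
    wanted.insert("5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8".to_string());
    println!("{:?}", find_with_set(Cursor::new(data), &wanted).unwrap());
}
```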

Adding a progress bar and learning that line counting is pretty slow?!

While this offline check worked reasonably well, it still took 1 to 3 minutes to complete. In my previous experience, if you work hard enough you can speed Rust up significantly, but I figured before banging my head on that it’d be fun to see if I could implement a progress bar.

I found two crates for just this: pb and indicatif. I found indicatif’s API a little more sensible, and it allowed for colors, so I went with that (though pb seems more lightweight).

GIF of an example progress bar

I hit a sticking point here though: both progress bar libraries require you to set some value for when the progress bar should be done. For example, if you’re cranking through 550M lines, you might want to set this value to 550_000_000, and you’d call pb.inc(chunk_size as u64) each time you finished checking a chunk. Obviously hardcoding the 550_000_000 is one option here, but it’s not ideal – what if the file HaveIBeenPwned offers gets larger over time?

So I tried having Rust count the lines of the inputted file – something like BufReader::new(&f).lines().count() – but this takes a pretty long time for a 550-million-line text file – like a minute at least. Since I was only doing this for the progress bar, that was unacceptable. But rather than ditch the progress bar or resort to hard-coding a value, I ended up using f.metadata().unwrap().len() to get the number of bytes of the file. This call is much quicker, which makes sense: the file system already keeps track of a file’s size, while counting lines means reading every byte of it.

Counting by bytes, rather than lines

My next problem was I had to estimate the size of each chunk in bytes rather than in numbers of lines. 500 MB felt like a good RAM requirement for the user – I felt OK hard-coding that value. But now I had to figure out how to check the byte size of a Rust Vector as I added to it.

I kind of forget how I did this – maybe std::mem::size_of – but by my first measure I found that one line of the text file was 24 bytes. But when I tried using this value, the progress bar would get to 50% and the process would end, so I just doubled it to 48 and went with that. Again, if you have a more scientific way of correctly reading the size of a Rust String stored in a Vector, let me know.
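
A possible explanation for that 24, offered as a guess rather than a diagnosis: std::mem::size_of::<String>() reports only the String’s fixed-size handle (a pointer, a length and a capacity, so three machine words, or 24 bytes on a typical 64-bit machine), not the text it points to. And for a progress bar that is measured against the file’s size, the more useful number is how many bytes of the file have been consumed, which BufRead::read_line returns for each line, line ending included. The sketch below shows both; count_lines_and_bytes is a hypothetical helper and the Cursor stands in for the real file.

```rust
use std::io::{self, BufRead, Cursor};
use std::mem::size_of;

/// Reads every line and returns (lines seen, bytes consumed from the reader).
fn count_lines_and_bytes<R: BufRead>(mut reader: R) -> io::Result<(u64, u64)> {
    let mut buf = String::new();
    let (mut lines, mut bytes) = (0u64, 0u64);
    loop {
        buf.clear();
        // read_line returns how many bytes it consumed, newline included
        let n = reader.read_line(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        lines += 1;
        bytes += n as u64;
    }
    Ok((lines, bytes))
}

fn main() {
    // The handle is the same size no matter how long the text is.
    println!("size_of::<String>() = {}", size_of::<String>());
    let line = String::from("5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8:3645804");
    println!("bytes of text in this line = {}", line.len());

    let data = "5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8:3645804\r\n\
                7C4A8D09CA3762AF61E59520943DC26494F8941B:23174662\r\n";
    let (lines, bytes) = count_lines_and_bytes(Cursor::new(data)).unwrap();
    println!("{} lines, {} of {} bytes consumed", lines, bytes, data.len());
}
```

Incrementing the progress bar by the bytes actually consumed would make it reach 100% exactly when the file runs out, without needing a per-line estimate.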

Anyway, here’s where I landed:

pub fn check_database_offline(
        passwords_file_path: &str,
        entries: Vec<Entry>,
        progress_bar: bool,
        ) -> io::Result<Vec<Entry>> {
    let mut this_chunk = Vec::new();
    let mut breached_entries: Vec<Entry> = Vec::new();

    let f = match File::open(passwords_file_path) {
        Ok(res) => res,
        Err(e) => return Err(e),
    };
    let passwords_file_size = f.metadata().unwrap().len() as usize;

    let chunk_size = 500_000_000; // real 1m7.686s

    let pb = ProgressBar::new(passwords_file_size as u64);
    if progress_bar {
        pb.set_style(
                ProgressStyle::default_bar()
                .template("{spinner} [{elapsed_precise}] [{bar:40}] ({eta})"),
                );
    }

    let file = BufReader::new(&f);
    for line in file.lines() {
        let this_line = line.unwrap()[..40].to_string();
        this_chunk.push(this_line);
        if this_chunk.len() * 48 > chunk_size {
            match check_this_chunk(&entries, &this_chunk) {
                Ok(mut vec_of_breached_entries) => {
                    breached_entries.append(&mut vec_of_breached_entries)
                }
                Err(_e) => eprintln!("found no breached entries in this chunk"),
            }
            if progress_bar {
                pb.inc(chunk_size as u64);
            }
            this_chunk.clear();
        }
    }
    if progress_bar {
        pb.finish_with_message("Done.");
    }
    Ok(breached_entries)
}

Checking for weak passwords with zxcvbn

Next, since I had already done all the work to read in a user’s passwords, I figured I might as well build out an option to check their database for weak passwords.

Troy Hunt actually suggests doing this in his original blog post about the password list in conjunction with a HaveIBeenPwned check – he recommends using zxcvbn, a “low-budget password strength checker” from Dropbox.

Luckily, a few months ago I used Rust to make this little password checker that uses a Rust port of zxcvbn. The Rust port is pretty straight-forward to use. Here are the two functions I needed:

pub fn check_for_and_display_weak_passwords(entries: &[Entry]) {
    for entry in entries {
        let estimate = zxcvbn(&entry.pass, &[&entry.title, &entry.username]).unwrap();
        // estimate.score is a 0-to-4 rating of the password's strength
        if estimate.score < 4 {
            println!("Your password for {} is weak.", entry);
            give_feedback(estimate.feedback);
            println!("\n--------------------------------");
        }
    }
}

fn give_feedback(feedback: Option<zxcvbn::feedback::Feedback>) {
    match feedback {
        Some(feedback) => {
            if let Some(warning) = feedback.warning {
                println!("Warning: {}\n", warning);
            }
            println!("Suggestions:");
            for suggestion in feedback.suggestions {
                println!("   - {}", suggestion)
            }
        }
        None => println!("No suggestions."),
    }
}

Two cool things here: First, that second argument to zxcvbn is a list of related words, like username or service title, to assist zxcvbn in estimating the password’s strength. For example, github_rocks123 might be an OK password for Tumblr, but it’s a worse choice for GitHub.

Second, the function returns “feedback”, which I wrote a helper function to display. A typical piece of feedback is something like “Don’t use dates”.

Using a HashMap to find and organize duplicate passwords

Next, I figured I’d offer the option of finding re-used or duplicate passwords. To do this, I used a Rust HashMap and this neat entry/and_modify/or_insert pattern I’ve used before that’s great for counting things.

To be a little safer, I decided to use the password hash digests for the keys of the HashMap (called digest_map below). The values are a Vector (group) of all the entries whose password digest is the same. So if none of your KeePass database’s entries share passwords, all of the keys of this HashMap will only have one Entry in their attached Vectors. But if, say, three entries share a password, the digest of that password will be a key and its value will be a Vector with 3 Entries in it.

pub fn make_digest_map(entries: &[Entry]) -> io::Result<HashMap<String, Vec<Entry>>> {
    let mut digest_map: HashMap<String, Vec<Entry>> = HashMap::new();
    for entry in entries {
        digest_map
            .entry(entry.clone().digest)
            .and_modify(|vec| vec.push(entry.clone()))
            .or_insert_with(|| vec![entry.clone()]);
    }

    Ok(digest_map)
}

// Clippy told me "warning: parameter of type `HashMap` should be generalized over different hashers"
pub fn present_duplicated_entries<S: ::std::hash::BuildHasher>(
        digest_map: HashMap<String, Vec<Entry>, S>,
        ) {
    let mut has_duplicated_entries = false;
    for group in digest_map.values() {
        // print if there is more than 1 element in the vector, since that represents a repeated password
        if group.len() > 1 {
            println!("The following entries have the same password:\n");
            for entry in group {
                println!("   - {}", entry);
            }
            has_duplicated_entries = true;
        }
    }

    if has_duplicated_entries {
        println!("\nPassword re-use is bad. Change passwords until you have no duplicates.");
    } else {
        println!("\nGood job -- no password reuse detected!");
    }
}

Gonna be straight with you: I’m not 100% sure about that .or_insert_with(|| vec![entry.clone()]); line – it was a Clippy suggestion. I had a simple or_insert call.
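
For what it’s worth, the difference Clippy is pointing at seems to be about when the fallback value gets built. or_insert takes an already-built value, so its argument is evaluated on every call, even when the key already exists; or_insert_with takes a closure that only runs when the key is missing. The toy functions below (hypothetical, not from Medic) count how many times the fallback Vec is constructed in each case.

```rust
use std::collections::HashMap;

/// Counts how often the fallback value is built when using `or_insert`.
fn builds_with_or_insert(keys: &[&str]) -> usize {
    let mut builds = 0;
    let mut map: HashMap<&str, Vec<u32>> = HashMap::new();
    for key in keys {
        // The argument block runs before or_insert is even called,
        // so it runs on every iteration.
        map.entry(*key).or_insert({
            builds += 1;
            Vec::new()
        });
    }
    builds
}

/// Counts how often the fallback value is built when using `or_insert_with`.
fn builds_with_or_insert_with(keys: &[&str]) -> usize {
    let mut builds = 0;
    let mut map: HashMap<&str, Vec<u32>> = HashMap::new();
    for key in keys {
        // The closure only runs when the key is not in the map yet.
        map.entry(*key).or_insert_with(|| {
            builds += 1;
            Vec::new()
        });
    }
    builds
}

fn main() {
    let keys = ["a", "a", "b"];
    println!("or_insert built the fallback {} times", builds_with_or_insert(&keys));
    println!("or_insert_with built it {} times", builds_with_or_insert_with(&keys));
}
```

In make_digest_map the fallback is vec![entry.clone()], so the eager version would clone an Entry and allocate a Vec for every entry, only to throw it away whenever the digest was already in the map.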

Making it a grown-up Rust CLI with structopt

In earlier versions, Medic presented users with a menu and asked for a numerical choice input.

To check your KeePass database's passwords, do you want to:

==> 1. Check for weak passwords
==> 2. Check for duplicate passwords
==> 3. Check OFFLINE for breached passwords: Give me a database of SHA-1 hashed passwords to check your KeePass database against
==> 4. Check ONLINE for breached passwords: I will hash your passwords and send the first 5 characters of each hash over the internet to HaveIBeenPwned, in order to check if they've been breached.

But as I learned more about Rust command line interfaces, I saw there are a handful of command line argument parsers that people use to create their CLIs, which gives the ecosystem of Rust CLIs a nice uniformity. These parsers include clap and structopt, among others. I decided to go with structopt.

structopt has you define a struct (I called mine Opt, in src/main.rs), in which we outline the options, flags, and arguments that our CLI will take. We can also provide documentation notes (denoted by the triple ///) that will populate the --help output.

// excerpt from src/main.rs
extern crate structopt;
// ...
use std::path::PathBuf;
use structopt::StructOpt;

/// Medic
#[derive(StructOpt, Debug)]
#[structopt(name = "medic")]
struct Opt {
    /// Give verbose output
    #[structopt(short = "v", long = "verbose")]
    verbose: bool,

    /// Provide key file, if unlocking the KeePass databases requires one
    #[structopt(short = "k", long = "keyfile", parse(from_os_str))]
    keyfile: Option<PathBuf>,

    /// Check passwords against breached passwords online via the HaveIBeenPwned API. More info
    /// here:
    /// https://www.troyhunt.com/ive-just-launched-pwned-passwords-version-2/#cloudflareprivacyandkanonymity
    #[structopt(long = "online")]
    online: bool,

    /// Provide password hash file to check database against. To download a copy of very large list of
    /// password hashes from HaveIBeenPwned, go to: https://haveibeenpwned.com/Passwords
    #[structopt(short = "h", long = "hashfile", parse(from_os_str))]
    hash_file: Option<PathBuf>,

    /// Check database for duplicate passwords
    #[structopt(short = "d", long = "duplicate")]
    check_duplicate: bool,

    /// Check database for weak passwords
    #[structopt(short = "w", long = "weak")]
    check_weak: bool,

    /// KeePass database to check. Can either be a kdbx file or an exported CSV version of a
    /// KeePass database.
    #[structopt(name = "KEEPASS DATABASE FILE", parse(from_os_str))]
    keepass_db: PathBuf,
}

We can access the values of these arguments by using Opt::from_args() like so:

// still in src/main.rs
fn main() {
    let opt = Opt::from_args();
    println!("Detected option(s):\n{:?}", opt);

    // and if we like, we can stash some of them in a fresh set of variables
    let keepass_db_file_path = opt.keepass_db;
    let hash_file: Option<PathBuf> = opt.hash_file;
    let keyfile: Option<PathBuf> = opt.keyfile;
    let check_online = opt.online;
    // ...
}

This solved two other problems for me: (1) It gave me a way to give the user an optional method of providing a keyfile. (Not every KeePass database requires a keyfile to open it.)

And (2), the file paths we ask the user for are already of type PathBuf rather than Strings or string slices.

Replacing a bool with baby’s first enum

In previous versions of Medic, the function check_database_offline had a bool value for whether to display a progress bar or not. The primary reason for this was so I could turn it off for testing.

This worked fine, but it didn’t look great. For example, the actual, non-test call for the function looked like this:

println!("Checking KeePass database against provided hash file");
let breached_entries = check_database_offline(file_path, &entries, true).unwrap();

What’s that true doing hanging out there!? What’s set to true?

Luckily, I happened to stumble upon this wonderfully short blog post called “Rust Patterns: Enums Instead of Booleans”. As the title implies, the author advises that we can often be more descriptive if we use enums rather than booleans.

With this lesson fresh in mind, I set about to refactor the “progress bar” bool into an enum (and rename it to progress_bar_visibility):

#[derive(Debug, PartialEq)]
pub enum VisibilityPreference {
    Show,
    Hide,
}

pub fn check_database_offline(
    passwords_file_path: PathBuf,
    entries: &[Entry],
    progress_bar_visibility: VisibilityPreference,
) -> io::Result<Vec<Entry>> {
    // ...

    let pb = ProgressBar::new(passwords_file_size as u64);
    if progress_bar_visibility == VisibilityPreference::Show {
        pb.set_style(
            ProgressStyle::default_bar()
                .template("{spinner} [{elapsed_precise}] [{bar:40}] ({eta})"),
        );
    }

    // ...
}

Now, calling the function looks much more descriptive:

// show progress bar in production
let breached_entries = check_database_offline(file_path, &entries, VisibilityPreference::Show).unwrap();

// hide progress bar for test functions
let breached_entries = check_database_offline(passwords_file_path, &entries, VisibilityPreference::Hide).unwrap();
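
A side benefit of the enum is that it pairs naturally with match, which the compiler checks for exhaustiveness. Here is a minimal, self-contained sketch of that idea. The progress_template helper is hypothetical and not part of Medic; it only illustrates how a function might branch on the enum.

```rust
/// Whether a progress bar should be drawn.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum VisibilityPreference {
    Show,
    Hide,
}

/// Hypothetical helper (not part of Medic): returns a progress bar
/// template string, or None when the bar should stay hidden.
pub fn progress_template(visibility: VisibilityPreference) -> Option<&'static str> {
    // `match` must cover every variant, so adding a third variant later
    // becomes a compile error here until it is handled.
    match visibility {
        VisibilityPreference::Show => {
            Some("{spinner} [{elapsed_precise}] [{bar:40}] ({eta})")
        }
        VisibilityPreference::Hide => None,
    }
}
```

With a bool, a new third state would mean yet another positional argument; with the enum, it is one more variant and the compiler points at every place that needs updating.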

Attempting to detect similar passwords using zxcvbn

While checking for repeated passwords is nice and potentially useful, I thought it’d be cool if the tool could also detect passwords that are merely similar, for example Spot34 and Spot43. I hoped to accomplish this with the zxcvbn feature that optionally takes a list of related words: I’d just shove all the other passwords into that Vec and see if the password still got a high-enough score.

This worked OK if the differences between the passwords were capitalization: for example, Spot34 gets a lower score if you submit spot34 as a related word. But sadly it seems not to do anything with numbers: Spot34 doesn’t get a lower score if you submit Spot43 along with it.

So, for now, I’ve scrapped this feature. If you have any ideas on how to better implement it, let me know!
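
One direction that might be worth exploring is plain edit distance between every pair of passwords. The sketch below is an assumption about how that could look, not something Medic implements: levenshtein is a standard dynamic-programming implementation, and are_similar with its max_distance threshold is a hypothetical helper.

```rust
/// Classic Levenshtein edit distance between two strings, counted in
/// `char`s (insertions, deletions, and substitutions each cost 1).
pub fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // prev[j] is the distance between the first i-1 chars of `a`
    // and the first j chars of `b` (the previous row of the table).
    let mut prev: Vec<usize> = (0..=b.len()).collect();

    for i in 1..=a.len() {
        // curr[0] = i: turning i chars into an empty string takes i deletions.
        let mut curr = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let substitution_cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + substitution_cost); // substitution or match
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Hypothetical threshold check: treat two passwords as "similar" when
/// they are at most `max_distance` edits apart.
pub fn are_similar(a: &str, b: &str, max_distance: usize) -> bool {
    levenshtein(a, b) <= max_distance
}
```

Under this measure Spot34 and Spot43 are two substitutions apart, so a threshold of 2 would flag them. Comparing every pair is quadratic in the number of entries, which is probably fine for a personal KeePass database but worth keeping in mind.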

Learning about is_some()

A small Rust thing I learned: When I implemented the keyfile functionality mentioned above, I made sure keyfile_path was an Option, specifically Option<&str>, since not all KeePass databases require a keyfile to be unlocked.

At some point I needed some control flow based on whether this keyfile_path variable was present (a Some) or absent (a None). My Ruby instincts told me I’d be able to just write if keyfile_path or if Some(keyfile_path) and, in that context, Rust would evaluate the variable as a Boolean. Of course Rust is too strict to allow this. But I did learn about the is_some() method. Here’s how that would look with keyfile_path:

if keyfile_path.is_some() {
    match Database::open(
            &mut File::open(path).unwrap(), // the database
            Some(&db_pass),                 // password
            Some(&mut File::open(std::path::Path::new(keyfile_path.unwrap())).unwrap()), // keyfile
            ) {
        Ok(db) => db,
        Err(e) => panic!("Error opening database: {}", e),
    }
} else {
    match Database::open(
            &mut File::open(path).unwrap(), // the database
            Some(&db_pass),                 // password
            None,                           // keyfile
            ) {
        Ok(db) => db,
        Err(_e) => {
            println!("\nError opening database. Maybe you have a keyfile? If so, enter its file path:");
            let keyfile_path = get_file_path().unwrap();
            unlock_keepass_database(file_path, db_pass, Some(&keyfile_path))
        }
    }
}

The related, more elegant solution here is Rust’s if let. This allows us to safely “unwrap” the key file path variable without calling unwrap() (yay).

fn unlock_keepass_database(
        file_path: &str,
        db_pass: String,
        keyfile_path: Option<&str>,
        ) -> keepass::Database {
    let path = std::path::Path::new(file_path);

    if let Some(keyf_path) = keyfile_path {
        match Database::open(
                &mut File::open(path).unwrap(), // the database
                Some(&db_pass),                 // password
                Some(&mut File::open(std::path::Path::new(keyf_path)).unwrap()), // keyfile
                ) {
            Ok(db) => db,
            Err(e) => panic!("Error opening database: {}", e),
        }
    } else {
        match Database::open(
                &mut File::open(path).unwrap(), // the database
                Some(&db_pass),                 // password
                None,                           // keyfile
                ) {
            Ok(db) => db,
            Err(_e) => {
                println!("\nError opening database. Maybe you have a keyfile? If so, enter its file path:");
                let keyfile_path = get_file_path().unwrap();
                unlock_keepass_database(file_path, db_pass, Some(&keyfile_path))
            }
        }
    }
}
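
Stripped of the KeePass details, the difference between the two styles can be seen in a small self-contained sketch. Both functions below are hypothetical stand-ins that only build a description string, so no real files are opened.

```rust
/// Hypothetical stand-in for the keyfile logic, written with `if let`.
pub fn describe_unlock(keyfile_path: Option<&str>) -> String {
    // `if let` tests for `Some` and binds the inner value in one step,
    // so there is no separate `is_some()` check and no `unwrap()`.
    if let Some(path) = keyfile_path {
        format!("password + keyfile at {}", path)
    } else {
        String::from("password only")
    }
}

/// The same logic written with `is_some()`, for comparison.
pub fn describe_unlock_is_some(keyfile_path: Option<&str>) -> String {
    if keyfile_path.is_some() {
        // This `unwrap()` cannot panic only because of the check above;
        // the compiler does not connect the two, so a later refactor
        // could silently break that guarantee.
        format!("password + keyfile at {}", keyfile_path.unwrap())
    } else {
        String::from("password only")
    }
}
```

Both versions behave identically; the if let form simply removes the gap between checking and using the value.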

Calling map on an Option

Thanks to a comment on this post by Frederic Dumont, I learned that I could refactor unlock_keepass_database a bit further. To be honest, I’m not super sure how this works, but clearly the two calls to map on Options do some solid work, handling keyfile whether it’s a None or a Some. We’re also introducing the dyn keyword, which is some sort of way to use a trait object.

fn unlock_keepass_database(
        path: PathBuf,
        db_pass: String,
        keyfile_path: Option<PathBuf>,
        ) -> keepass::Database {
    let mut keyfile = keyfile_path.map(|kfp| File::open(kfp).unwrap());

    match Database::open(
            &mut File::open(path).unwrap(),               // the database
            Some(&db_pass),                               // password
            keyfile.as_mut().map(|f| f as &mut dyn Read), // keyfile
            ) {
        Ok(db) => db,
        Err(e) => {
            panic!("\nError opening database: {}", e);
        }
    }
}

So I don’t love introducing multiple lines that I don’t understand well, but this version is so much more concise I decided to implement it.
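
For what it’s worth, the two map calls can be unpacked with a small self-contained sketch. It swaps File for an in-memory std::io::Cursor so it needs no files on disk, and read_optional is a hypothetical stand-in for the keyfile parameter of Database::open, assumed here to accept an Option<&mut dyn Read>.

```rust
use std::io::{Cursor, Read};

/// Hypothetical stand-in for the keyfile parameter: accepts any optional
/// reader and returns how many bytes it contained (0 for None).
pub fn read_optional(source: Option<&mut dyn Read>) -> usize {
    match source {
        Some(reader) => {
            let mut buf = Vec::new();
            reader.read_to_end(&mut buf).unwrap();
            buf.len()
        }
        None => 0,
    }
}

/// Mirrors the two `map` calls, with `Cursor` standing in for `File`.
pub fn keyfile_len(keyfile_bytes: Option<Vec<u8>>) -> usize {
    // First map: Option<Vec<u8>> -> Option<Cursor<Vec<u8>>>.
    // The closure runs only for Some; a None passes straight through.
    let mut keyfile = keyfile_bytes.map(|bytes| Cursor::new(bytes));

    // as_mut: &mut Option<Cursor<..>> -> Option<&mut Cursor<..>>,
    // borrowing the reader instead of moving it out of the Option.
    // Second map: converts the concrete `&mut Cursor<..>` into the
    // trait object `&mut dyn Read`, again only when there is a Some.
    read_optional(keyfile.as_mut().map(|f| f as &mut dyn Read))
}
```

The dyn Read part means "some type that implements Read, decided at runtime", which is why the same parameter can take a File, a Cursor, or anything else readable.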

Epilogue: Collections #2 through #5

On January 31, 2019, Collections #2 through #5 dropped. Wired reports:

The new Collection leak, which was first reported by Heise, contains 2.2 billion unique usernames and passwords. In total it contains 845GB of data and more than 25bn records.

I’m not sure how to get my hands on this dump, but I’d be curious to see how Medic performs with such a long list. Maybe a few hours? Could be good inspiration to squeeze more efficiency out of the offline check function, and/or implement threading…