CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Commands

# Build
cargo build

# Run (compress a file)
cargo run -- <input_file> [-m c]

# Run (extract a file)
cargo run -- <input_file> -m x

# Run tests
cargo test

# Run a single test
cargo test <test_name>

# Run tests in a specific module
cargo test --lib node::test
cargo test --lib hufftree::base::test
cargo test --lib hufftree::canonical_tests
cargo test --lib storage::test

Architecture

This is a CLI tool for compressing and decompressing UTF-8 text files using canonical Huffman encoding. The compressed format uses .z as the file extension.

Compression pipeline (-m c):

  1. hufftree::base::get_char_frequencies — counts character frequencies in input text
  2. hufftree::base::Hufftree::new — builds a Huffman tree using a min-heap (BinaryHeap<Reverse<Node>>)
  3. hufftree::canonical::CanonicalHufftree::from_tree — converts the base tree into canonical form (codes reassigned by length, then frequency order)
  4. storage::store_tree_and_text — writes the compressed file
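Steps 1 and 2 can be sketched with the standard library alone. This is a minimal illustration, not the repository's code: the function names `char_frequencies` and `code_lengths` are hypothetical, and instead of building `Node` values the sketch tracks only the list of characters under each subtree, which is enough to derive code lengths.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// Count how often each character occurs in the input (hypothetical helper).
fn char_frequencies(text: &str) -> HashMap<char, usize> {
    let mut freqs = HashMap::new();
    for c in text.chars() {
        *freqs.entry(c).or_insert(0) += 1;
    }
    freqs
}

/// Derive Huffman code lengths by repeatedly merging the two lightest subtrees.
/// Heap entries are (weight, tie-breaker, characters in the subtree); wrapping
/// them in `Reverse` turns the max-heap `BinaryHeap` into a min-heap.
fn code_lengths(freqs: &HashMap<char, usize>) -> HashMap<char, u32> {
    let mut lengths: HashMap<char, u32> = freqs.keys().map(|&c| (c, 0)).collect();
    let mut heap: BinaryHeap<Reverse<(usize, u32, Vec<char>)>> = BinaryHeap::new();
    for (&c, &f) in freqs {
        // The code point is an assumed tie-breaker so the result is deterministic.
        heap.push(Reverse((f, c as u32, vec![c])));
    }
    while heap.len() > 1 {
        let Reverse((w1, t1, mut a)) = heap.pop().unwrap();
        let Reverse((w2, t2, b)) = heap.pop().unwrap();
        // Every character under the merged node moves one level deeper.
        for c in a.iter().chain(b.iter()) {
            *lengths.get_mut(c).unwrap() += 1;
        }
        a.extend(b);
        heap.push(Reverse((w1 + w2, t1.min(t2), a)));
    }
    // A single distinct character still needs a one-bit code.
    if freqs.len() == 1 {
        for len in lengths.values_mut() {
            *len = 1;
        }
    }
    lengths
}
```

For the input "abracadabra" this gives 'a' a one-bit code and a total encoded size of 23 bits. Carrying character lists in the heap keeps the sketch short but costs more than a real tree for large alphabets.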

Decompression pipeline (-m x):

  1. storage::read_tree_and_text — reads the file, rebuilds the tree with CanonicalHufftree::from_vec, and decodes the text
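The decoding half of that step relies on Huffman codes being prefix-free: bits can be consumed one at a time until the accumulated prefix matches a code. The sketch below shows the idea with a plain `HashMap` in place of the `BiMap<char, BitVec>` the project uses; `decode_bits` is a hypothetical name, and the input is assumed to be already truncated to the valid bit length so that byte-boundary padding is excluded.

```rust
use std::collections::HashMap;

/// Decode a bit sequence with a prefix-free code table (hypothetical helper).
/// Returns `None` if the input ends in the middle of a code.
fn decode_bits(bits: &[bool], table: &HashMap<Vec<bool>, char>) -> Option<String> {
    let mut out = String::new();
    let mut current: Vec<bool> = Vec::new();
    for &bit in bits {
        current.push(bit);
        // Because no code is a prefix of another, the first match is the only match.
        if let Some(&c) = table.get(&current) {
            out.push(c);
            current.clear();
        }
    }
    if current.is_empty() {
        Some(out)
    } else {
        None
    }
}
```

With the table a = 0, b = 10, c = 11, the bits 0 10 11 0 decode to "abca".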

Binary file format (defined in src/storage.rs):

4 bytes  — total bit length of the remaining data
n×8 bytes — tree entries: (4 bytes code_length BE) + (4 bytes UTF-8 char)
4 bytes  — delimiter (0xFFFFFFFF)
m bytes  — Huffman-encoded text (padded to byte boundary)
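As an illustration of the tree section of this layout, the sketch below serializes and parses the 8-byte entries and the delimiter. It is not taken from src/storage.rs: the function names are hypothetical, and it assumes each character is stored as its UTF-8 encoding zero-padded on the right to 4 bytes, which the layout above does not spell out. The leading length field and the encoded text are left out.

```rust
const DELIMITER: u32 = 0xFFFF_FFFF;

/// Serialize (char, code_length) entries followed by the delimiter.
fn write_tree_entries(entries: &[(char, u32)]) -> Vec<u8> {
    let mut out = Vec::with_capacity(entries.len() * 8 + 4);
    for &(c, len) in entries {
        out.extend_from_slice(&len.to_be_bytes());
        let mut buf = [0u8; 4];
        c.encode_utf8(&mut buf); // unused trailing bytes stay zero (assumed padding)
        out.extend_from_slice(&buf);
    }
    out.extend_from_slice(&DELIMITER.to_be_bytes());
    out
}

/// Number of bytes in a UTF-8 sequence, judged from its first byte.
fn utf8_width(first: u8) -> usize {
    match first {
        0x00..=0x7F => 1,
        0xC0..=0xDF => 2,
        0xE0..=0xEF => 3,
        _ => 4, // invalid lead bytes are rejected later by from_utf8
    }
}

/// Parse entries up to the delimiter. Returns the entries and the number of
/// bytes consumed, or `None` if the data is truncated or not valid UTF-8.
fn read_tree_entries(bytes: &[u8]) -> Option<(Vec<(char, u32)>, usize)> {
    let mut entries = Vec::new();
    let mut pos = 0;
    loop {
        let w = bytes.get(pos..pos + 4)?;
        let len = u32::from_be_bytes([w[0], w[1], w[2], w[3]]);
        pos += 4;
        if len == DELIMITER {
            return Some((entries, pos));
        }
        let raw = bytes.get(pos..pos + 4)?;
        pos += 4;
        let width = utf8_width(raw[0]);
        let c = std::str::from_utf8(&raw[..width]).ok()?.chars().next()?;
        entries.push((c, len));
    }
}
```

Because every entry starts with a 4-byte code length, the reader can detect the delimiter by checking that field before reading the character bytes. This works as long as no real code length equals 0xFFFFFFFF.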

Key types:

  • node::Node — binary tree node with optional char and frequency; ordered by frequency
  • hufftree::base::Hufftree — wraps the root Node and the character list; used only during compression
  • hufftree::canonical::CanonicalHufftree — bidirectional map (BiMap<char, BitVec>) for encode/decode; also stores storage_char_codes: Vec<(char, u32)> for serialization
  • cli::Args / cli::Mode — Clap-derived CLI args; mode defaults to C (compress)
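Storing only (char, code length) pairs is sufficient because canonical codes can be regenerated from the lengths. The sketch below shows the usual assignment rule under stated assumptions: `canonical_codes` is a hypothetical name, the entries are assumed to arrive already sorted by ascending code length (in whatever order the project uses within a length), and codes are returned as '0'/'1' strings for readability rather than as `BitVec` values.

```rust
/// Assign canonical Huffman codes to entries sorted by ascending code length.
/// Each code is the previous code plus one, shifted left by the length increase.
fn canonical_codes(entries: &[(char, u32)]) -> Vec<(char, String)> {
    let mut out = Vec::with_capacity(entries.len());
    let mut code: u64 = 0;
    let mut prev_len: u32 = 0;
    for (i, &(c, len)) in entries.iter().enumerate() {
        if i > 0 {
            code += 1;
        }
        // Panics on underflow in debug builds if the input is not sorted by length.
        code <<= len - prev_len;
        prev_len = len;
        // Render the code as a zero-padded binary string of exactly `len` digits.
        out.push((c, format!("{:0width$b}", code, width = len as usize)));
    }
    out
}
```

For lengths a=1, r=2, b=3, c=4, d=4 this produces 0, 10, 110, 1110, and 1111. Encoder and decoder only need to agree on the entry order to reproduce identical codes.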

Dependencies: bit-vec for BitVec, bimap for bidirectional char↔code lookup, clap for CLI parsing.