# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Commands

```bash
# Build
cargo build

# Run (compress a file)
cargo run -- <input_file> [-m c]

# Run (extract a file)
cargo run -- <input_file> -m x

# Run tests
cargo test

# Run a single test
cargo test <test_name>

# Run tests in a specific module
cargo test --lib node::test
cargo test --lib hufftree::base::test
cargo test --lib hufftree::canonical_tests
cargo test --lib storage::test
```

## Architecture

This is a CLI tool for compressing and decompressing UTF-8 text files using canonical Huffman encoding. Compressed files use the `.z` extension.

**Compression pipeline** (`-m c`):
1. `hufftree::base::get_char_frequencies` — counts character frequencies in the input text
2. `hufftree::base::Hufftree::new` — builds a Huffman tree using a min-heap (`BinaryHeap<Reverse<Node>>`)
3. `hufftree::canonical::CanonicalHufftree::from_tree` — converts the base tree into canonical form (codes are reassigned in order of code length, then frequency)
4. `storage::store_tree_and_text` — writes the compressed file
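
Steps 1 and 2 can be illustrated with a minimal, self-contained sketch. It is not the crate's code. The `Node` fields and the helpers `char_frequencies`, `build_tree`, and `code_lengths` are hypothetical stand-ins for `node::Node`, `get_char_frequencies`, and `Hufftree::new`. `BinaryHeap` is a max-heap, so the sketch orders `Node` by frequency and wraps it in `Reverse`, which makes `pop` return the least frequent node. It then merges the two smallest nodes until one root remains, which yields a Huffman tree. Each leaf's depth is its code length.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::{BinaryHeap, HashMap};

// Hypothetical stand-in for `node::Node`; field names are assumptions.
// Leaves carry a char, internal nodes do not.
#[derive(Debug)]
struct Node {
    ch: Option<char>,
    freq: u64,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

// Compare nodes by frequency only, so `Reverse<Node>` turns the max-heap into a min-heap.
impl PartialEq for Node {
    fn eq(&self, other: &Self) -> bool {
        self.freq == other.freq
    }
}
impl Eq for Node {}
impl PartialOrd for Node {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Node {
    fn cmp(&self, other: &Self) -> Ordering {
        self.freq.cmp(&other.freq)
    }
}

/// Step 1: count how often each character occurs.
fn char_frequencies(text: &str) -> HashMap<char, u64> {
    let mut freqs = HashMap::new();
    for c in text.chars() {
        *freqs.entry(c).or_insert(0) += 1;
    }
    freqs
}

/// Step 2: repeatedly merge the two least frequent nodes until one root remains.
fn build_tree(freqs: &HashMap<char, u64>) -> Option<Node> {
    let mut heap: BinaryHeap<Reverse<Node>> = freqs
        .iter()
        .map(|(&c, &f)| Reverse(Node { ch: Some(c), freq: f, left: None, right: None }))
        .collect();
    while heap.len() > 1 {
        let Reverse(a) = heap.pop()?;
        let Reverse(b) = heap.pop()?;
        heap.push(Reverse(Node {
            ch: None,
            freq: a.freq + b.freq,
            left: Some(Box::new(a)),
            right: Some(Box::new(b)),
        }));
    }
    heap.pop().map(|Reverse(root)| root)
}

/// Code length of each char is the depth of its leaf (a lone symbol still gets 1 bit).
fn code_lengths(node: &Node, depth: u32, out: &mut HashMap<char, u32>) {
    if let Some(c) = node.ch {
        out.insert(c, depth.max(1));
    }
    if let Some(l) = &node.left {
        code_lengths(l, depth + 1, out);
    }
    if let Some(r) = &node.right {
        code_lengths(r, depth + 1, out);
    }
}

fn main() {
    let freqs = char_frequencies("abracadabra");
    let root = build_tree(&freqs).expect("non-empty input");
    let mut lengths = HashMap::new();
    code_lengths(&root, 0, &mut lengths);
    println!("{:?}", lengths);
}
```

`HashMap` iteration order varies between runs, so ties between equal frequencies may be broken differently. Individual code lengths can therefore change while the total encoded size stays optimal (23 bits for `abracadabra`). Storing only code lengths, as the file format does, means the decoder never has to reproduce the exact tree shape.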

**Decompression pipeline** (`-m x`):
1. `storage::read_tree_and_text` — reads the file, rebuilds the code table with `CanonicalHufftree::from_vec`, and decodes the text
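
Decompression has only `(char, code_length)` pairs to work from. That is the point of canonical codes: the codes can be regenerated from the lengths alone. The sketch below is an assumption about how this works, not the crate's code. It uses the standard canonical scheme: shorter codes come first, and each next code is the previous one plus one, shifted left when the length grows. It also assumes that entries of equal length keep their stored order and that bits are read most-significant first. `canonical_codes` and `decode` are hypothetical names, and a `HashMap` keyed by bit slices stands in for the crate's `BiMap<char, BitVec>`. Decoding stops after the stored bit count, so padding in the final byte is ignored.

```rust
use std::collections::HashMap;

/// Assign canonical codes from (char, code_length) pairs.
/// Assumption: ties in length keep their stored order (the sort below is stable).
fn canonical_codes(entries: &[(char, u32)]) -> Vec<(char, Vec<bool>)> {
    let mut sorted = entries.to_vec();
    sorted.sort_by_key(|&(_, len)| len);
    let mut codes: Vec<(char, Vec<bool>)> = Vec::with_capacity(sorted.len());
    let mut code: u64 = 0; // sketch limit: code lengths up to 63 bits
    let mut prev_len = 0;
    for (i, &(c, len)) in sorted.iter().enumerate() {
        if i > 0 {
            // Next code: previous + 1, shifted left to the new length.
            code = (code + 1) << (len - prev_len);
        }
        prev_len = len;
        // Emit `len` bits of `code`, most significant bit first.
        let bits: Vec<bool> = (0..len).rev().map(|shift| (code >> shift) & 1 == 1).collect();
        codes.push((c, bits));
    }
    codes
}

/// Decode `bit_len` bits from MSB-first packed `bytes`, ignoring trailing padding.
fn decode(bytes: &[u8], bit_len: usize, codes: &[(char, Vec<bool>)]) -> Option<String> {
    let lookup: HashMap<&[bool], char> =
        codes.iter().map(|(c, b)| (b.as_slice(), *c)).collect();
    let mut out = String::new();
    let mut current = Vec::new();
    for i in 0..bit_len {
        let byte = *bytes.get(i / 8)?;
        let bit = (byte >> (7 - i % 8)) & 1 == 1;
        current.push(bit);
        // Prefix-free codes: the first match is the right one.
        if let Some(&c) = lookup.get(current.as_slice()) {
            out.push(c);
            current.clear();
        }
    }
    // Leftover bits mean the stream ended in the middle of a code.
    if current.is_empty() { Some(out) } else { None }
}

fn main() {
    let entries = [('a', 1), ('b', 3), ('r', 3), ('c', 3), ('d', 3)];
    let codes = canonical_codes(&entries);
    // Prints Some("abracadabra")
    println!("{:?}", decode(&[0x4A, 0xCE, 0x94], 23, &codes));
}
```

Because canonical codes are prefix-free, the decoder can emit a character as soon as the accumulated bits match a code. Leftover bits at the end indicate a truncated stream or a wrong bit count.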

**Binary file format** (defined in `src/storage.rs`):
```
4 bytes — total bit length of the remaining data
n×8 bytes — tree entries: (4 bytes code_length BE) + (4 bytes UTF-8 char)
4 bytes — delimiter (0xFFFFFFFF)
m bytes — Huffman-encoded text (padded to byte boundary)
```
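
The layout can be round-tripped with the standard library alone. In this sketch `write_file`, `read_file`, and `read_u32` are hypothetical helpers. Three details are assumptions not confirmed by `src/storage.rs`:

- the leading field is big-endian like `code_length`;
- it counts only the meaningful bits of the encoded text;
- each char occupies its UTF-8 bytes zero-padded to 4 bytes.

The reader checks each 4-byte slot against `0xFFFFFFFF` before treating it as a code length. This works because no real code length comes close to that value.

```rust
use std::convert::TryInto;

const DELIMITER: u32 = 0xFFFF_FFFF;

/// Hypothetical writer for the layout above.
fn write_file(bit_len: u32, entries: &[(char, u32)], payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&bit_len.to_be_bytes()); // assumed big-endian
    for &(c, len) in entries {
        out.extend_from_slice(&len.to_be_bytes());
        // Assumption: UTF-8 bytes of the char, zero-padded to 4 bytes.
        let mut buf = [0u8; 4];
        c.encode_utf8(&mut buf);
        out.extend_from_slice(&buf);
    }
    out.extend_from_slice(&DELIMITER.to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Read a big-endian u32 at `pos`, or None if the input is too short.
fn read_u32(bytes: &[u8], pos: usize) -> Option<u32> {
    let chunk: [u8; 4] = bytes.get(pos..pos + 4)?.try_into().ok()?;
    Some(u32::from_be_bytes(chunk))
}

/// Parse the layout back into (bit_len, entries, payload).
fn read_file(bytes: &[u8]) -> Option<(u32, Vec<(char, u32)>, &[u8])> {
    let bit_len = read_u32(bytes, 0)?;
    let mut pos = 4;
    let mut entries = Vec::new();
    loop {
        let len = read_u32(bytes, pos)?;
        pos += 4;
        if len == DELIMITER {
            break; // end of the tree entries
        }
        let raw = bytes.get(pos..pos + 4)?;
        pos += 4;
        // Padding zeros are valid UTF-8 (NUL), so the first char is the stored one.
        let c = std::str::from_utf8(raw).ok()?.chars().next()?;
        entries.push((c, len));
    }
    Some((bit_len, entries, &bytes[pos..]))
}

fn main() {
    // "abc" with canonical codes a=0, b=10, c=11 is 5 bits: 01011, padded to 0b0101_1000.
    let entries = [('a', 1u32), ('b', 2), ('c', 2)];
    let file = write_file(5, &entries, &[0b0101_1000]);
    println!("{:?}", read_file(&file));
}
```

Zero padding is harmless when the char slot is decoded. NUL bytes are valid UTF-8, so taking the first `char` of the 4-byte slot recovers the original char, even for `'\0'`.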

**Key types:**
- `node::Node` — binary tree node with optional `char` and frequency; ordered by frequency
- `hufftree::base::Hufftree` — wraps the root `Node` and the character list; used only during compression
- `hufftree::canonical::CanonicalHufftree` — bidirectional map (`BiMap<char, BitVec>`) for encode/decode; also stores `storage_char_codes: Vec<(char, u32)>` for serialization
- `cli::Args` / `cli::Mode` — Clap-derived CLI args; mode defaults to `C` (compress)
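
On the encode side, a `CanonicalHufftree` lookup turns each char into its code, and the concatenated bits are packed into bytes. In this sketch a plain `HashMap<char, Vec<bool>>` stands in for `BiMap<char, BitVec>`, and `encode` is a hypothetical helper. The bit order is an assumption. The first bit becomes the high-order bit of the first byte, and the last byte is zero-padded. This is the layout documented for `bit-vec`'s `BitVec::to_bytes`. The example codes are the canonical ones for `abracadabra` with code lengths a = 1 and b, r, c, d = 3.

```rust
use std::collections::HashMap;

/// Encode `text` with a char -> code map (a HashMap standing in for the crate's BiMap),
/// returning the packed bytes and the number of meaningful bits.
/// Returns None if a char has no code.
fn encode(text: &str, codes: &HashMap<char, Vec<bool>>) -> Option<(Vec<u8>, usize)> {
    let mut bits = Vec::new();
    for c in text.chars() {
        bits.extend_from_slice(codes.get(&c)?);
    }
    // Pack MSB-first; unused low bits of the final byte stay zero (padding).
    let mut bytes = vec![0u8; (bits.len() + 7) / 8];
    for (i, &bit) in bits.iter().enumerate() {
        if bit {
            bytes[i / 8] |= 0x80u8 >> (i % 8);
        }
    }
    Some((bytes, bits.len()))
}

fn main() {
    let codes: HashMap<char, Vec<bool>> = vec![
        ('a', vec![false]),
        ('b', vec![true, false, false]),
        ('r', vec![true, false, true]),
        ('c', vec![true, true, false]),
        ('d', vec![true, true, true]),
    ]
    .into_iter()
    .collect();
    let (bytes, bit_len) = encode("abracadabra", &codes).expect("every char has a code");
    // Prints [74, 206, 148] 23, i.e. 0x4A 0xCE 0x94 with one padding bit.
    println!("{:?} {}", bytes, bit_len);
}
```

The function returns the bit count along with the bytes. After padding, the bytes alone cannot tell a decoder where the code bits end.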

**Dependencies:** `bit-vec` for `BitVec`, `bimap` for bidirectional char↔code lookup, `clap` for CLI parsing.