Fixed single character bug

AvariceLHubris 3 weeks ago
parent
commit
e8dfaf6def
2 changed files with 78 additions and 0 deletions
  1. CLAUDE.md (+57, -0)
  2. src/hufftree/base.rs (+21, -0)

+ 57 - 0
CLAUDE.md

@@ -0,0 +1,57 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Commands
+
+```bash
+# Build
+cargo build
+
+# Run (compress a file)
+cargo run -- <input_file> [-m c]
+
+# Run (extract a file)
+cargo run -- <input_file> -m x
+
+# Run tests
+cargo test
+
+# Run a single test
+cargo test <test_name>
+
+# Run tests in a specific module
+cargo test --lib node::test
+cargo test --lib hufftree::base::test
+cargo test --lib hufftree::canonical_tests
+cargo test --lib storage::test
+```
+
+## Architecture
+
+This is a CLI tool for compressing and decompressing UTF-8 text files using canonical Huffman encoding. The compressed format uses `.z` as the file extension.
+
+**Compression pipeline** (`-m c`):
+1. `hufftree::base::get_char_frequencies` — counts character frequencies in input text
+2. `hufftree::base::Hufftree::new` — builds a Huffman tree using a min-heap (`BinaryHeap<Reverse<Node>>`)
+3. `hufftree::canonical::CanonicalHufftree::from_tree` — converts the base tree into canonical form (codes reassigned by length, then frequency order)
+4. `storage::store_tree_and_text` — writes the compressed file
+
+**Decompression pipeline** (`-m x`):
+1. `storage::read_tree_and_text` — reads the file, reconstructs `CanonicalHufftree::from_vec`, decodes the text
+
+**Binary file format** (defined in `src/storage.rs`):
+```
+4 bytes  — total bit length of the remaining data
+n×8 bytes — tree entries: (4 bytes code_length BE) + (4 bytes UTF-8 char)
+4 bytes  — delimiter (0xFFFFFFFF)
+m bytes  — Huffman-encoded text (padded to byte boundary)
+```
+
+**Key types:**
+- `node::Node` — binary tree node with optional `char` and frequency; ordered by frequency
+- `hufftree::base::Hufftree` — wraps the root `Node` and the character list; used only during compression
+- `hufftree::canonical::CanonicalHufftree` — bidirectional map (`BiMap<char, BitVec>`) for encode/decode; also stores `storage_char_codes: Vec<(char, u32)>` for serialization
+- `cli::Args` / `cli::Mode` — Clap-derived CLI args; mode defaults to `C` (compress)
+
+**Dependencies:** `bit-vec` for `BitVec`, `bimap` for bidirectional char↔code lookup, `clap` for CLI parsing.
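
As a reading aid for the binary file format described in the CLAUDE.md above, the layout might be sketched as follows. This is a minimal sketch, not the code in `src/storage.rs`. The function name `write_container` is hypothetical, and three details the description leaves open are assumed here: the 4-byte length header is big-endian, it counts the bits of the tree entries, the delimiter and the unpadded payload, and each character is stored as its UTF-8 bytes left-aligned in a zero-padded 4-byte field.

```rust
/// Hypothetical helper (not from the repository): lays out a buffer that
/// follows the format described in CLAUDE.md.
/// `entries` are (character, code length) pairs, `payload` is the encoded
/// text already padded to a byte boundary, and `payload_bits` is the number
/// of meaningful bits in it.
fn write_container(entries: &[(char, u32)], payload: &[u8], payload_bits: u32) -> Vec<u8> {
    let mut body = Vec::new();

    for &(ch, code_len) in entries {
        // 4 bytes: code length, big-endian.
        body.extend_from_slice(&code_len.to_be_bytes());
        // 4 bytes: the character. Assumption: UTF-8 bytes, zero-padded.
        let mut field = [0u8; 4];
        ch.encode_utf8(&mut field);
        body.extend_from_slice(&field);
    }

    // 4 bytes: delimiter between the tree entries and the text.
    body.extend_from_slice(&[0xFF; 4]);
    // m bytes: encoded text.
    body.extend_from_slice(payload);

    // Assumption: the header counts every bit after itself except padding.
    let tree_and_delimiter_bits = (entries.len() as u32 * 8 + 4) * 8;
    let total_bits = tree_and_delimiter_bits + payload_bits;

    let mut out = Vec::with_capacity(4 + body.len());
    out.extend_from_slice(&total_bits.to_be_bytes());
    out.extend_from_slice(&body);
    out
}
```

Under these assumptions, the single-character input `aaaaa` with a 1-bit code would occupy 17 bytes: 4 for the header, 8 for the one tree entry, 4 for the delimiter and 1 for the five payload bits.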

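The canonical form mentioned in step 3 of the compression pipeline can be illustrated with the textbook code-assignment rule. The sketch below is not `CanonicalHufftree::from_tree`. The name `assign_canonical` is hypothetical, codes are shown as strings of `0` and `1` instead of `BitVec`, and the input is assumed to be already sorted by code length in ascending order, with lengths between 1 and 31. How ties within one length are ordered is left to the caller.

```rust
/// Hypothetical helper (not from the repository): assigns canonical codes to
/// symbols that are already sorted by ascending code length.
/// Precondition: every length is at least 1 and below 32.
fn assign_canonical(sorted: &[(char, u32)]) -> Vec<(char, String)> {
    let mut out = Vec::with_capacity(sorted.len());
    let mut code: u32 = 0;
    let mut prev_len = match sorted.first() {
        Some(&(_, len)) => len,
        None => return out, // empty alphabet: nothing to assign
    };

    for (i, &(ch, len)) in sorted.iter().enumerate() {
        if i > 0 {
            // Next code: increment, then append zeros if the length grew.
            code = (code + 1) << (len - prev_len);
        }
        // Render the code as a zero-padded binary string of `len` digits.
        out.push((ch, format!("{:0width$b}", code, width = len as usize)));
        prev_len = len;
    }
    out
}
```

Because the codes follow from the lengths alone, storing only `(char, code_length)` pairs, as the file format does, is enough to rebuild the same table on extraction.
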
+ 21 - 0
src/hufftree/base.rs

@@ -44,6 +44,14 @@ impl Hufftree {
     pub fn get_character_code(&self, character: char) -> Result<BitVec, &str> {
         match self.root.get_character_code(character) {
             Ok(code) => {
+                if code.is_empty() {
+                    // Single-character alphabet: the root is the only node, so the
+                    // recursive walk returns no bits. Assign a 1-bit code so the
+                    // encoded text is non-empty and decodable.
+                    let mut single = BitVec::new();
+                    single.push(false);
+                    return Ok(single);
+                }
                 let code = code.iter().rev().collect();
                 Ok(code)
             }
@@ -116,6 +124,19 @@ mod test {
         assert_eq!(c_code.to_string(), "00");
     }
 
+    #[test]
+    fn single_character_alphabet_roundtrip() {
+        let mut chars_and_freq: HashMap<char, i32> = HashMap::new();
+        chars_and_freq.insert('a', 5);
+
+        let huff = Hufftree::new(chars_and_freq);
+        let code = huff.get_character_code('a').unwrap();
+        assert_eq!(code.len(), 1);
+
+        let encoded = huff.convert_text("aaaaa".to_string()).unwrap();
+        assert_eq!(encoded.len(), 5);
+    }
+
     #[test]
     fn get_charcter_freq() {
         let text = String::from("aaaaaabb cc");
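
The edge case addressed by this commit can be reproduced without the repository's types. The sketch below is a stand-in, not the code in `src/hufftree/base.rs`: `Tree`, `path_to` and `code_for` are hypothetical names, and `Vec<bool>` replaces `BitVec`. It shows why a tree whose root is the only leaf yields a zero-length code, and how a 1-bit fallback keeps the encoded text non-empty.

```rust
/// Hypothetical stand-in for the repository's node type.
enum Tree {
    Leaf(char),
    Branch(Box<Tree>, Box<Tree>),
}

/// Returns the root-to-leaf path for `target` (false = left, true = right),
/// or None if the character is not in the tree.
fn path_to(tree: &Tree, target: char) -> Option<Vec<bool>> {
    match tree {
        // Reaching the leaf adds no bits; bits come only from branches.
        Tree::Leaf(c) if *c == target => Some(Vec::new()),
        Tree::Leaf(_) => None,
        Tree::Branch(left, right) => {
            if let Some(mut bits) = path_to(left, target) {
                bits.insert(0, false);
                return Some(bits);
            }
            let mut bits = path_to(right, target)?;
            bits.insert(0, true);
            Some(bits)
        }
    }
}

/// Same lookup, with the single-character fallback: an empty path means the
/// root itself is the leaf, so a 1-bit code is substituted.
fn code_for(tree: &Tree, target: char) -> Option<Vec<bool>> {
    let bits = path_to(tree, target)?;
    if bits.is_empty() {
        return Some(vec![false]);
    }
    Some(bits)
}
```

With `Tree::Leaf('a')` as the whole tree, `path_to` returns an empty path while `code_for` returns one bit, so encoding `aaaaa` produces five bits instead of none.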