mirror of
https://github.com/RfidResearchGroup/proxmark3.git
synced 2026-05-14 11:55:04 +00:00
8d6e474a75
Imported updates from legbrute hashcat modules to speed up hf iclass legbrute

1. Bitslicing: biggest win (~32×)
   The kernel stores each cipher register (l, r, b, t) as 32 parallel 1-bit lanes in u32s (m64000_a3-pure.cl:147-201, bs_iclass_tick) and computes 32 MACs per tick. The CPU doMAC_brute does 1 at a time. A 64-bit bitslice port would give ~64× per core; with AVX2 the lane count rises to 256. This is the single largest speedup lever.

2. Early-reject after 8 output ticks
   In m64000_sxx (m64000_a3-pure.cl:534-535) the kernel breaks out as soon as the first MAC byte can't match. doMAC_brute always produces the full 32 output bits before memcmp. Comparing byte by byte as the bits are produced saves ~3× on the output phase, since 255/256 of the keys fail after byte 0 (8 of the 32 output ticks).

3. Pre-expanded y_ccnr bit array
   The kernel expands the 96 input bits into a flat array once (m64000_a3-pure.cl:239-247) and reuses it for every candidate. suc_bytes in cipher.c:181 redoes the b >>= 1 shifts for every key. Pre-expanding lets the inner loop be branch-free and vectorizable.

4. Widen lanes to 256/512 via AVX2/AVX-512
   The bitslice code is written against a single uint64_t lane type. Swapping it for __m256i/__m512i (or an abstracted bs_word_t) gives 4×/8× the throughput of the u64 version on hosts that support it, with a scalar u64 fallback on ARM and older x86. NEON (128-bit) gives 2× on ARM.
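
The first three points can be illustrated together with a small C sketch. It does not implement the iCLASS cipher: the 16-bit toy register, its feedback and output taps, and every name in it (toy_mac_scalar, toy_expand_input, toy_search64) are hypothetical and chosen only to show the shape of the technique. What it is meant to show is 64 keys per tick in uint64_t lanes, input bits expanded once into lane masks, and a reject check after each completed output byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_IN_BITS  96   /* input bits absorbed before output starts */
#define TOY_OUT_BITS 32   /* MAC length in bits */

/* Scalar reference: one key per call, like a one-at-a-time CPU path.
   Toy cipher only (assumption), NOT the real iCLASS state update. */
static uint32_t toy_mac_scalar(uint16_t key, const uint8_t in[12]) {
    uint16_t s = key;
    uint32_t mac = 0;
    for (int t = 0; t < TOY_IN_BITS + TOY_OUT_BITS; t++) {
        unsigned inbit = (t < TOY_IN_BITS) ? ((in[t / 8] >> (t % 8)) & 1u) : 0u;
        unsigned f = ((s >> 0) ^ (s >> 2) ^ ((s >> 3) & (s >> 5)) ^ inbit) & 1u;
        s = (uint16_t)((s >> 1) | (f << 15));
        if (t >= TOY_IN_BITS) {
            unsigned o = ((s >> 0) ^ (s >> 7)) & 1u;
            mac |= (uint32_t)o << (t - TOY_IN_BITS);
        }
    }
    return mac;
}

/* Expand the 96 input bits once into all-zeros / all-ones lane masks,
   so the inner loop needs no per-key shifting or branching on input. */
static void toy_expand_input(const uint8_t in[12], uint64_t in_mask[TOY_IN_BITS]) {
    for (int t = 0; t < TOY_IN_BITS; t++)
        in_mask[t] = ((in[t / 8] >> (t % 8)) & 1u) ? ~UINT64_C(0) : 0;
}

/* Bitsliced search over the 64 keys base..base+63 (mod 2^16).
   Returns a mask with bit j set when key base+j reproduces target. */
static uint64_t toy_search64(uint16_t base, const uint64_t *in_mask, uint32_t target) {
    assert(in_mask != NULL);
    uint64_t s[16];
    /* Transpose: s[i] holds state bit i of all 64 lanes. */
    for (int i = 0; i < 16; i++) {
        s[i] = 0;
        for (int j = 0; j < 64; j++)
            s[i] |= (uint64_t)(((uint16_t)(base + j) >> i) & 1u) << j;
    }
    uint64_t alive = ~UINT64_C(0);   /* lanes that still match the target */
    for (int t = 0; t < TOY_IN_BITS + TOY_OUT_BITS; t++) {
        uint64_t in = (t < TOY_IN_BITS) ? in_mask[t] : 0;
        uint64_t f = s[0] ^ s[2] ^ (s[3] & s[5]) ^ in;
        for (int i = 0; i < 15; i++) s[i] = s[i + 1];   /* shift = register renaming */
        s[15] = f;
        if (t >= TOY_IN_BITS) {
            int k = t - TOY_IN_BITS;
            uint64_t want = ((target >> k) & 1u) ? ~UINT64_C(0) : 0;
            alive &= ~((s[0] ^ s[7]) ^ want);
            /* Early reject: after each full output byte, stop if no lane survives. */
            if ((k & 7) == 7 && alive == 0) return 0;
        }
    }
    return alive;
}
```

A caller would run toy_expand_input once per captured trace and then call toy_search64 for each block of 64 candidate keys, inspecting the returned mask only when it is non-zero. Widening the lane type (point 4) would amount to replacing uint64_t with a wider vector type and the operators with the corresponding intrinsics.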