OBJECTIVE: Evaluations of vision prostheses and sensory substitution devices have frequently relied on repeated training and subsequent testing with the same small set of items. These multiple forced-choice tasks have produced above-chance performance in blind users, but it is unclear whether the observed performance represents restoration of vision that transfers to novel, untrained items. APPROACH: Here, we tested the generalizability of the forced-choice paradigm using discrimination of low-resolution word images. Extensive visual training was conducted with the same 10 words used in previous BrainPort tongue stimulation studies. Performance on these 10 words and on an additional 50 untrained words was measured before and after the training sessions. MAIN RESULTS: Performance on the untrained words improved only minimally, indicating that the learned pattern discrimination was limited mostly to the trained words. SIGNIFICANCE: These findings highlight the need to reconsider current evaluation practices, in particular the use of forced-choice paradigms with a few highly trained items. While such tasks are appropriate for measuring performance thresholds, such as acuity or contrast sensitivity, in a functioning visual system, performance on them cannot be taken to indicate restored spatial pattern vision.