A dip in the 3 kHz to 5 kHz range has been common for a long time. I think it’s because that is a very sensitive (and therefore fatigue-inducing) range for the human ear. I also think (and this is probably related) that it’s around the resonant frequency of most human ear canals.
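Here is a rough way to see why the ear canal lands in that range. Model it as a tube that is open at the entrance and closed at the eardrum. Its first resonance is then a quarter-wave, f = c / (4L). The 343 m/s speed of sound and the 2 to 3 cm canal lengths below are my assumptions, not measurements. The simple model also ignores end correction and the eardrum's compliance, so treat the numbers as a ballpark only.

```python
# Rough quarter-wave estimate of ear-canal resonance, treating the canal as a
# tube open at the entrance and closed at the eardrum (simplified model:
# no end correction, rigid "closed" end).
SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 °C (assumed value)


def quarter_wave_resonance(length_m: float, c: float = SPEED_OF_SOUND) -> float:
    """First resonance in Hz of a tube closed at one end: f = c / (4 * L)."""
    if length_m <= 0:
        raise ValueError("length must be positive")
    return c / (4.0 * length_m)


# Illustrative canal lengths in centimetres (assumed, not measured)
for length_cm in (2.0, 2.5, 3.0):
    f = quarter_wave_resonance(length_cm / 100.0)
    print(f"{length_cm:.1f} cm -> {f:.0f} Hz")
```

Those lengths give roughly 4.3 kHz, 3.4 kHz and 2.9 kHz. That is in the same neighbourhood as the dip, which is about all a model this crude can tell you.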
A couple of examples from a long time ago, both from Stereophile measurements:
Fig.3 Stirling Broadcast LS3/5a V2, anechoic response on drive-unit axis at 50", averaged across 30° horizontal window and corrected for microphone response (red trace), with the similarly derived responses of a 1996 KEF LS3/5a sample (blue, offset by –5dB) and a 1978 Rogers LS3/5a sample (green, offset by –10dB).
Fig.7 Quad ESL-63, anechoic response on tweeter axis at 1m.
The LS3/5a variants and the Quad ESL-63 are pretty well known as speakers that get the midrange right (whatever their other faults may be), and they all have a dip in that region.
I see the logic in @Resolve’s desire to EQ from flat, but I think I’d rather EQ from a perceptually natural target (objectively not flat) than from an objectively flat one (perceptually bright). YMMV, of course.

