With raw data, a neutral colour can be recorded as any ratio of RGGB components; for natural, unfiltered light sources those ratios are not 1:1:1:1. In colour-neutral (working) RGB spaces such as ProPhoto, Adobe RGB and sRGB, an RGB ratio of 1:1:1 means neutral.
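To make the ratio point concrete, here is a minimal Python sketch of turning the raw RGGB reading of a neutral patch into per-channel multipliers that bring it to 1:1:1. The function name and the raw values are made up for illustration, and normalising to green is just a common convention, not something any particular raw converter is confirmed to do.

```python
def neutral_multipliers(r, g1, g2, b):
    """Return (r_gain, g_gain, b_gain) that map a raw neutral patch to R = G = B.

    Hypothetical helper: gains are normalised so that green stays at 1.0.
    """
    g = (g1 + g2) / 2.0  # average the two green sites of the RGGB quad
    return g / r, 1.0, g / b


# Illustrative raw values for a grey card (assumed, not measured data).
r, g1, g2, b = 500.0, 1000.0, 1000.0, 625.0
r_gain, g_gain, b_gain = neutral_multipliers(r, g1, g2, b)

# After scaling, the three channels of the neutral patch are equal.
balanced = (r * r_gain, (g1 + g2) / 2.0 * g_gain, b * b_gain)
print("gains:", r_gain, g_gain, b_gain)
print("balanced patch:", balanced)
```

The raw ratio here is 1:2:2:1.25 rather than 1:1:1:1, yet the patch is still neutral; it only looks neutral numerically once the gains are applied.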
Maximum white is really 'maximum diffusely reflected white', which is the best that could be done until recently with typical monitor contrast ratios of 6 to 8 stops - without having to distort tones wildly in the process. If something had to be given up, it might as well be the mixed and specular highlights.
With contrast ratios now solidly in the 10 to 13 stop range, the HDR community is trying to fix that by allowing a few stops of mixed reflections above maximum diffuse white, but making a bit of a mess of it imho with current 'standards', as they pertain to photographers. Do you have an HDR OS and monitor? If so, have you tried LR's recently introduced HDR mode? It's early days but it shows the direction in which things could be going.
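For anyone who thinks in ratios rather than stops: each stop is a doubling, so the conversion is just a power of two. A quick Python sketch (the function name is mine):

```python
def stops_to_contrast_ratio(stops):
    """Convert a range in stops to a linear contrast ratio (one stop = 2x)."""
    return 2.0 ** stops


for stops in (6, 8, 10, 13):
    print(f"{stops} stops = {stops_to_contrast_ratio(stops):.0f}:1")
# prints:
# 6 stops = 64:1
# 8 stops = 256:1
# 10 stops = 1024:1
# 13 stops = 8192:1
```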
I don't have perfect pitch. I can get by with fairly good relative pitch, most of the time.
But, as a consequence of listening to the start-up chime of whatever Mac I've owned since the early 90s many, many thousands of times, my brain "knows" A440.
I can "hear" it in my brain if I want, I can vocalize it and I can recognize it as accurately as my tuning programs to tune my guitar.
No doubt it's really hard to tell if an orchestra is tuned to A440 or A444. But I'll bet you could tell if it's tuned to A432!
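To put rough numbers on that, pitch differences are usually expressed in cents (100 cents to an equal-tempered semitone, 1200 to an octave). A small Python sketch, with a function name of my own choosing:

```python
import math


def cents_between(f_ref, f_other):
    """Interval from f_ref to f_other in cents (positive = sharper)."""
    return 1200.0 * math.log2(f_other / f_ref)


for f in (444.0, 432.0):
    print(f"A{f:.0f} vs A440: {cents_between(440.0, f):+.1f} cents")
```

A444 comes out around 16 cents sharp and A432 around 32 cents flat, so the second offset is about twice the first.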
Now, if I could just know white when I see it . . .
I find that it's quite easy to perceive a sheet of paper under tungsten (or other "warm" lights) as either white or yellowish as I wish. And what was Monet doing when he painted his haystacks?
I used to teach a three-session workshop for on- and off-camera flash techniques that was targeted for beginners. I eventually developed a patter where I'd tell my students that I was about to lie to them for simplicity and then dive deeper later. In particular, I'd first tell the lie that "shutter speed doesn't matter for flash exposure" by comparing a set of images shot with no ambient at different shutter speeds which all wound up looking the same. Later, I'd talk about sync speeds and tell the lie that "you can never shoot with flash above your sync speed". Finally, I'd get into some of the nuances of using high speed/focal plane sync systems to shoot at any shutter speed.
If I'd started day one talking about HSS/FP sync, I'd have lost nearly all of my students. By the time they had absorbed the first two classes, most of them understood what was happening with HSS/FP.
In other words, I think this is an excellent suggestion when approaching topics like this.
White is just the color of things that we know to be white.
Paper is white under all kinds of light, because we know it is. Snow is also white, and so are a gazillion other things.
Then we can ask how the same thing can be white under different lighting, and we can explain that our eyes and brain adjust (process) the 'recorded image' to make whites white (all other colors are adjusted in the same process, since we know the colors of many things).
Then we should emphasise that a camera has no knowledge of colors, white things or anything else.
To make the recorded image similar to what our eyes see, we need to process it in much the same way our brain does - this is called white balancing.
Then someone asks how such stupid cameras can make decisions about colors - after all, AWB gives satisfying results.
And we can explain that camera makers are not as stupid as the cameras themselves - they have analysed millions of different images and scene colors and written clever algorithms that adjust the resulting colors based on the prevailing colors (and a myriad of other things) in the recorded image.
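As a toy illustration of adjusting colors "based on the prevailing colors", here is the classic grey-world heuristic in Python. To be clear, this is a textbook simplification that I'm substituting for illustration; it is not what any camera maker's AWB actually does, and the pixel values are invented. It assumes the scene averages out to grey and scales red and blue so their means match green.

```python
def gray_world_gains(pixels):
    """Estimate (r_gain, g_gain, b_gain) assuming the scene averages to grey.

    pixels: list of (r, g, b) tuples in linear light. Grey-world is a
    simple heuristic, used here only as a stand-in for real AWB.
    """
    n = len(pixels)
    mean_r = sum(p[0] for p in pixels) / n
    mean_g = sum(p[1] for p in pixels) / n
    mean_b = sum(p[2] for p in pixels) / n
    return mean_g / mean_r, 1.0, mean_g / mean_b


def apply_gains(pixels, gains):
    """Multiply every pixel's channels by the matching gain."""
    return [tuple(c * k for c, k in zip(p, gains)) for p in pixels]


# Invented linear RGB pixels with a warm (tungsten-like) cast.
scene = [(0.8, 0.5, 0.2), (0.4, 0.3, 0.1), (0.6, 0.4, 0.3)]
gains = gray_world_gains(scene)
balanced = apply_gains(scene, gains)
print("gains:", gains)
print("balanced:", balanced)
```

The heuristic fails whenever the scene really is dominated by one color (a sunset, a field of grass), which is one reason real AWB has to look at that "myriad of other things".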
Then someone may ask why our eyes can't adjust to unprocessed (wrong) colors seen on a monitor or other screen.
This is a bit harder to explain, but we can say that if the only thing we see (and have seen) is the 'wrong' image on the screen, then we actually can adjust to it. Usually this is not the case - the lights are on, we have seen other images and text on the same screen before, and so on.
There's also user adaptation. The user is not always adapted to the white point of the monitor. Looking at a 5000K white point monitor in a bright room lit with 9000K lighting, the image, including the white parts, will appear yellow.
That's very illuminating (forgive the pun). I had wondered why, many years ago, when I used a white piece of paper to WB a scene under an incandescent light, the white paper looked the way I saw it but the lampshade's color changed dramatically. Thanks.