So you're saying that with about twice the total light, the m43-derived image was still noisier? I am not sold on the idea that you knew what you were looking at, unless there was some large-scale fixed-pattern noise in the m43 sensor that did not cancel with the pixel shifts, or the m43 50MP image was gratuitously oversharpened at the pixel level.
I recommend the DPR Studio Comparison Tool because it shows not just stats, but stats in visible action, including differences in read noise character.
Put any 20MP m43 up against any 20MP FF at 4x the ISO, and the m43 holds its own, especially the new Olympii. Put the Canon G7 X III (2.7x crop) against 20MP FF sensors at 8x the ISO, and the 1" sensor competes well even with the Nikon D5:
The granularity of ISOs, pixel sizes, and sensor sizes is too coarse for us to simulate "same total light" in the tool's 4 windows for every possible camera comparison, but the same ratios are available in some cases, and when you compare same total light across the 4 windows, the visible noise is virtually independent of sensor size or pixel size. The pixel area ratios for the Canon R6, R5, and R7 are almost exactly 4:2:1, so if we use those numbers to scale the ISOs so that each pixel receives the same total light, and throw in an older 20MP FF (6D), we get:
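The per-pixel ISO scaling described above can be sketched numerically. This is a rough illustration, not the tool's own method; the sensor areas and pixel counts are approximate published figures, and the base ISO is an arbitrary starting point:

```python
# Sketch of "same total light per pixel" ISO scaling for the cameras above.
# Sensor areas (mm^2) and megapixel counts are approximate published figures.
cameras = {
    "Canon R6 (FF, 20MP)":    {"sensor_mm2": 864.0, "mp": 20.1},
    "Canon R5 (FF, 45MP)":    {"sensor_mm2": 864.0, "mp": 45.0},
    "Canon R7 (APS-C, 33MP)": {"sensor_mm2": 332.0, "mp": 32.5},
}

base_iso = 100  # arbitrary ISO assigned to the largest pixel

# Pixel area in square microns: sensor area / pixel count.
areas = {name: c["sensor_mm2"] * 1e6 / (c["mp"] * 1e6) for name, c in cameras.items()}
largest = max(areas.values())

# To give every pixel the same total light, scale ISO inversely with pixel area.
for name, area in areas.items():
    iso = base_iso * largest / area
    print(f"{name}: pixel area {area:.1f} um^2, matched ISO ~{iso:.0f}")
```

The resulting areas come out close to the 4:2:1 ratio mentioned above, so the matched ISOs land near 100, 200, and 400.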
Based on your history, and what I could expect as a range of possibilities, I would bet that you forgot to normalize something.
I just took a look at the two cameras in the DPR studio comparison tool, and the Olympus gives much less visible read noise at high ISO with the same total light.
In a CMOS sensor the saturation is determined by the SF output swing, which in turn is determined by the supply voltage of the output circuitry. The CG is determined by the floating diffusion capacitance, and will be chosen to give the design voltage swing at the highest design exposure ('base ISO') - which in the end links it all together: the saturation charge in the pixel itself will be inversely proportional to the CG. Still, Don was talking about small-signal performance, as I understood it - and a small pixel with high conversion gain has an 'advantage' in the voltage domain as far as that goes. I think it's the kind of 'advantage' that is purely theoretical.
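The linkage described above can be put into numbers. This is an illustrative sketch, not data from any real sensor: with a fixed output voltage swing, conversion gain is q/C_FD, so halving the floating diffusion capacitance doubles the CG and halves the saturation charge:

```python
# Numeric sketch: with a fixed voltage swing, saturation (in electrons)
# is inversely proportional to conversion gain. Values are illustrative.
q = 1.602e-19          # electron charge, coulombs
v_swing = 1.0          # design output swing in volts (assumed)

for c_fd in (2.0e-15, 1.0e-15, 0.5e-15):    # floating diffusion capacitance, farads
    cg = q / c_fd                            # conversion gain, volts per electron
    sat_e = v_swing / cg                     # saturation charge in electrons
    print(f"C_FD={c_fd*1e15:.1f} fF -> CG={cg*1e6:.0f} uV/e-, saturation ~{sat_e:.0f} e-")
```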
What gets me is that there is a very obvious logical fallacy right there at the beginning - yet they wrote it and people seem to read it uncritically.
This bit, at the end, seems particularly relevant to this thread:
Edit: Just added another comment - posting it here so I don't lose it when he rejects it.
So yes, small-signal performance. But what you said is not quite right on several levels; still, let's take as a given that the accuracy of the measurement determines the accuracy of the colour. In that case what would be important is accuracy of measurement of the charge, not the voltage - the voltage is just an intermediary; its absolute value is unimportant. Moreover, the accuracy of its measurement does not depend on its magnitude. And smaller pixels can provide a more accurate measurement.
Suppose we have a pixel with saturation capacity S and read noise r. The DR (i.e. how many separate values of that pixel's output can be measured) is S/r. Now we make another sensor with pixels of half the linear dimension, 1/4 the area. We're going to make these new pixels simply by taking the reticles for the old pixel and scaling them to 1/2 the linear dimension. (This never happens in practice, where a completely new design would likely be adopted for that large a scale change - but for thinking about the intrinsics it's a sensible simplification.)

The scaling means that the capacitance of the pixel is reduced by a factor of four, which in turn means 1/4 the saturation level and four times the conversion gain, which controls the input-referred read noise. Now the saturation level is S/4 and the read noise is r/4. If we combine the signal from these four pixels, the saturation is correlated, so it adds; the read noise is uncorrelated, so it adds in quadrature. The combined saturation is 4S/4 = S, whilst the combined read noise is √(4(r/4)²) = 2(r/4) = r/2. The DR at the resolution of the original sensor is S/(r/2) = 2S/r. So by halving the linear size of the pixel we've doubled the DR.

Of course, as said, that's not in practice how sensors are designed, so we don't see the full advantage of reducing pixel size - but nonetheless the trend is there. If your thesis were correct, the highest-DR cameras would tend to be the low pixel count ones of a given sensor size, but results tend to show the opposite. For instance, ranking DxOMark's tests by DR we find:
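The scaling argument above can be checked numerically. The starting values of S and r here are arbitrary illustrative numbers; what matters is the ratio:

```python
import math

# Numeric check of the pixel-shrink argument: scale the pixel to 1/4 area
# (saturation and read noise both drop to 1/4), then bin 2x2 back to the
# original resolution. S and r are arbitrary illustrative values.
S, r = 40000.0, 4.0              # big pixel: saturation (e-) and read noise (e-)
dr_big = S / r

s_small, r_small = S / 4, r / 4  # scaled pixel, per the reticle-shrink assumption

# Binning 2x2: signal (and saturation) add linearly, read noise in quadrature.
s_binned = 4 * s_small
r_binned = math.sqrt(4 * r_small**2)

dr_binned = s_binned / r_binned
print(f"big-pixel DR: {dr_big:.0f}:1, binned small-pixel DR: {dr_binned:.0f}:1")
# prints "big-pixel DR: 10000:1, binned small-pixel DR: 20000:1"
```

The binned DR comes out at exactly twice the big-pixel DR, matching the √(4(r/4)²) = r/2 step above.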
Of course DR is affected by quite a few things as well as pure pixel performance, such as - but we certainly are not seeing a clustering of large-pixel cameras at the top.
It's a bit simplified, in that I have bundled a load of noise sources into 'read noise' - but it's essentially correct. DR is generally 'maximum signal / noise floor', and in this case the noise floor is dominated by read noise. Or maybe you're not sure about DR being how many separate values can be measured - but that's essentially what DR tells you. The size of the noise floor tells you how big a range a 'distinct' value occupies, and the maximum divided by that tells you how many distinct values there are. It's a bit more complex in photography due to shot noise, which means that most of the 'distinct values' that would be available from the classic DR become indistinct.
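The last point can be illustrated by counting levels spaced one noise-sigma apart. This is a back-of-the-envelope sketch under my own assumptions (illustrative S and r, shot noise modelled as √n added in quadrature with read noise), not a formal definition:

```python
import math

# Count distinguishable levels spaced one noise-sigma apart.
# With read noise only, the count is S/r; with shot noise the spacing
# widens as the signal grows. S and r are illustrative values.
S, r = 40000.0, 4.0   # saturation and read noise in electrons (assumed)

classic_levels = S / r

# Total noise at signal n is sqrt(r^2 + n); integrating dn / sqrt(r^2 + n)
# from 0 to S counts one-sigma-spaced levels: 2 * (sqrt(r^2 + S) - r).
shot_levels = 2 * (math.sqrt(r**2 + S) - r)

print(f"read-noise-only levels: {classic_levels:.0f}, with shot noise: {shot_levels:.0f}")
```

With these numbers, the classic DR promises 10000 distinct values, but shot noise collapses that to a few hundred - which is the sense in which most of the 'distinct values' become indistinct.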