A good multithreading algorithm is hard to achieve. We have been working on this topic for 2 years already, with Intel as a partner. An engineer there validates the changes we make in the code to see whether the idea / spirit of each algorithm is going to scale well or not.
This is complicated. Many of the stages are now multithreaded, but some of them are not at all.
Moreover, we have to take hyperthreading into account too. A hyperthreaded core is a core, but a virtual one, and it doesn't have the same abilities as a real physical core.
Rest assured that on the MT topic we are really good, even if the task manager isn't showing that well ( BTW : it's the worst MT benchmark, but unfortunately, it's the only visible
So, in your case, blending, we only use the real cores and not the virtual cores, because using them would have slowed down the process. Here the IO dominates, and a virtual core cannot do IO ( in fact it can, but it would lock the second core sharing the same die ). So we just use real cores.
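
To illustrate the idea only (this is not our actual code), here is a minimal Python sketch of sizing a worker pool from the physical cores instead of the logical ones. The names physical_worker_count, blend_tile and run_blending are made up for the example, and the value of 2 hardware threads per physical core is an assumption that holds on typical hyperthreaded CPUs but not on every machine.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def physical_worker_count(logical_cores, threads_per_core=2):
    """Estimate the number of physical cores from the logical core count.

    threads_per_core=2 is an assumption (typical hyperthreading);
    it is not detected from the hardware.
    """
    if logical_cores is None or logical_cores < 1:
        return 1
    return max(1, logical_cores // threads_per_core)


def blend_tile(tile_id):
    # Hypothetical stand-in for one IO-heavy blending task.
    return tile_id * 2


def run_blending(tile_ids, logical_cores=None):
    # os.cpu_count() reports logical cores, so virtual cores are included.
    if logical_cores is None:
        logical_cores = os.cpu_count()
    workers = physical_worker_count(logical_cores)
    # One worker per (estimated) physical core; results keep input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(blend_tile, tile_ids))
```

With this sizing, a machine that reports 8 logical cores gets 4 workers, which is also why the task manager would show the virtual cores as idle during such a stage.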