On many machine learning tasks, the model's progress (in terms of accuracy) is relatively rapid in the beginning. However, as the model approaches human-level performance, improvement slows down and later plateaus almost completely.
Over time, as models keep getting bigger and are trained on more and more data, their performance approaches but never surpasses a theoretical limit known as the Bayes Optimal Error (B.O.E.). In other words, you can think of the Bayes Optimal Error as the best error rate that can possibly be achieved; no function can go beyond it. The sad part is that it cannot be calculated. Similar to infinity, we know its properties and we can describe how functions behave as they approach it, but it is only a theoretical concept used to express a limit.
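To make this concrete, here is a toy sketch (not from the article) in which the Bayes optimal error *can* be computed, only because we choose the true class distributions ourselves. In real problems those distributions are unknown, which is exactly why the Bayes error stays theoretical:

```python
import numpy as np

# Hypothetical setup: two equally likely classes whose feature x is
# Gaussian -- class 0 ~ N(-1, 1), class 1 ~ N(+1, 1). Since we fixed
# the true distributions, the Bayes optimal error is computable here.
x = np.linspace(-10.0, 10.0, 100_001)
dx = x[1] - x[0]
p0 = np.exp(-0.5 * (x + 1) ** 2) / np.sqrt(2 * np.pi)  # density of class 0
p1 = np.exp(-0.5 * (x - 1) ** 2) / np.sqrt(2 * np.pi)  # density of class 1

# The Bayes-optimal classifier picks the more likely class at every x,
# so its error is the area under the smaller of the two weighted densities.
bayes_error = np.sum(np.minimum(0.5 * p0, 0.5 * p1)) * dx
print(f"Bayes optimal error: {bayes_error:.4f}")  # ~0.1587
```

No classifier, however large or well trained, can beat that ~15.9% on this toy problem; it is the floor set by the overlap of the two distributions.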
Oftentimes this slowing down happens after the model surpasses human-level performance, and then it plateaus. According to Andrew Ng, there are two general reasons why this happens:
- Humans are generally good at certain tasks, and their performance on them is not far from the Bayes error. For example, looking at an image and telling whether there is a cat in it, or hearing a snippet of audio and transcribing it. Humans have been doing such tasks for hundreds of thousands of years for survival reasons, and that is why we excel at them.
- While the model is worse than human-level performance, there are tools that can be used to improve it; for example, we can get more labeled data from humans. However, once the model gets better than us, it becomes hard for it to keep learning, because we can no longer show the system things it doesn't already know, and we're kind of stuck.
To better understand this limit, we first have to define human-level performance.
What is human-level performance?
What does it mean for a model to reach human-level performance? This can be best illustrated with a practical example in the field of radiology.
Let's suppose these are the error rates each of the following groups achieves when diagnosing medical images:
(a) Untrained human ................ 15% error
(b) Doctor ......................... 6% error
(c) Experienced doctor ............. 3% error
(d) Team of experienced doctors .... 0.6% error
If you had to guess, which option would you say represents human-level performance?
From the above examples, the lowest achievable human error is option (d), so that is what we take as human-level performance. From this, we can conclude that the Bayes Optimal Error is 0.6% or lower.
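In code, this estimation step is tiny. A minimal sketch, reusing the hypothetical error rates from the radiology example above:

```python
# Error rates from the radiology example (as fractions, not percent).
errors = {
    "untrained human": 0.15,
    "doctor": 0.06,
    "experienced doctor": 0.03,
    "team of experienced doctors": 0.006,
}

# Human-level performance is defined as the best achievable human error;
# it serves as a proxy upper bound for the Bayes Optimal Error.
best_group = min(errors, key=errors.get)
human_level_error = errors[best_group]
print(f"Human-level performance: {best_group} at {human_level_error:.1%}")
# -> Human-level performance: team of experienced doctors at 0.6%
# All we can say about the Bayes Optimal Error is that it is <= 0.6%.
```

The design point is that we never compute the Bayes error itself; we only bound it from above with the best human result we can observe.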
Depending on the use case, for production-level implementations it can be sufficient for the system to work as well as an experienced doctor (c), particularly if the system is non-critical.
Surpassing human-level performance?
Can machine learning systems perform as well as, or better than, humans? It depends.
In recent years intelligent systems have evolved tremendously, mainly thanks to the gigantic pools of data out there and the large increase in processing power, and they surpass us in many areas, such as online advertising, product recommendations, logistics, and loan approvals. All of these areas have lots of data and are objective, meaning they do not involve natural perception. A loan, for example, gets approved or denied based on structured data, regardless of how we feel about it.
On the other hand, on tasks that involve perception, humans still have the upper hand. Why, you might ask? Well, humans have been performing natural perception tasks, such as vision, speech recognition, eye-body coordination, and intuition, for hundreds of thousands of years, because they have been crucial for our survival as a species.
Hope this article helped you get a clear general idea of what human-level performance is, how to estimate it, and how it differs from the Bayes Optimal Error.
I’d love to hear your ideas on what you’d like to read next — let me know down below in the comment section!
You can always connect with me via LinkedIn.