I don't think there are any standard networks for this, so a new type of network would probably need to be designed.
You could probably start from something similar to the Google Object Detection network and add one more output per detected object: the distance to that object.
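To make the idea concrete, here is a rough sketch of what "one extra output" could look like. It is not taken from any real detection library: the names `split_head_output` and `detection_loss`, the vector layout (class logits, then 4 box values, then 1 distance value), the softplus on the distance, and the smooth L1 loss are all assumptions chosen for illustration. It only shows how a per-object output vector might be split and how a distance term might be added to the training loss.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Detection:
    class_scores: List[float]  # softmax probabilities over the classes
    box: List[float]           # 4 box values, as in an ordinary detector
    distance: float            # the extra regressed output (hypothetical)


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)); keeps the distance positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def split_head_output(raw: Sequence[float], num_classes: int) -> Detection:
    """Split one raw per-object vector into class, box, and distance parts.

    Assumed layout: [num_classes logits | 4 box values | 1 distance value].
    """
    expected = num_classes + 4 + 1
    if len(raw) != expected:
        raise ValueError(f"expected {expected} values, got {len(raw)}")
    logits = raw[:num_classes]
    box = list(raw[num_classes:num_classes + 4])
    distance = softplus(raw[-1])
    return Detection(softmax(logits), box, distance)


def smooth_l1(pred: float, target: float, beta: float = 1.0) -> float:
    diff = abs(pred - target)
    return 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta


def detection_loss(det: Detection, true_class: int, true_box: Sequence[float],
                   true_distance: float, distance_weight: float = 1.0) -> float:
    """Usual classification + box loss, plus a weighted distance term."""
    cls_loss = -math.log(det.class_scores[true_class])
    box_loss = sum(smooth_l1(p, t) for p, t in zip(det.box, true_box))
    dist_loss = smooth_l1(det.distance, true_distance)
    return cls_loss + box_loss + distance_weight * dist_loss
```

In a real network the raw vector would come from the detector's final layer, and you would need training data labelled with distances. The `distance_weight` value is something you would have to tune so the distance term doesn't swamp (or get swamped by) the other two.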