Objects that Sound: DeepMind’s Research Shows How to Combine Vision and Audio in a Single Model

Author(s): Jesus Rodriguez

Called AVE-Net, the new architecture represents a major breakthrough in multi-modal learning.

Published via Towards AI