Qosmo Co., Ltd. (Headquarters: Meguro-ku, Tokyo; President: Nao Tokui; hereinafter "Qosmo"), a company working to develop creativity through the use of AI, has released the latest version of "Imaginary Soundscape", a free web service, in both Japanese and English. The service uses Qosmo's proprietary algorithm, built on multimodal deep learning, to find sound clips that match an input image. In addition, Qosmo has begun licensing the "Img2Sound (image to sound)" engine, the core technology behind this service.
・Imaginary Soundscape Website: https://www.imaginarysoundscape.net/
What is Imaginary Soundscape?
People sometimes imagine the sounds they would hear if they were actually there: the sound of waves when looking at a photograph of a seaside, or the chirp of traffic signals in a photograph of Shibuya's scramble crossing. This project is a web service that uses AI to externalize this act of unconscious imagination.
Based on a user-selected image, the AI selects the best-matching sound from a library of over 60,000 sound clips. In Google Street View mode, you can also walk around anywhere in the world and experience, on the spot, the soundscape "imagined" by the AI.
Imaginary Soundscape has attracted a great deal of attention since its launch in 2017 and has been used by nearly 500,000 users worldwide.
Google Street View mode finds environmental sounds that match Street View photos.
Features updated in the latest version
This update includes three changes: improved model accuracy, an expanded sound database, and an improved UI. The previous discriminative-model-based approach has been replaced with a multimodal model trained with contrastive learning.
In addition, the library of sound data used for matching has been greatly expanded. As a result, matching is now sensitive to a wider variety of nuances than ever before. The interface has also been redesigned to be more approachable for first-time users, and a Japanese translation has been added alongside the existing English version.
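The matching step described above can be pictured as a nearest-neighbor search over a shared embedding space. Below is a minimal NumPy sketch of that idea; the embedding dimension, the random vectors, and the function names are illustrative assumptions, not Qosmo's actual implementation:

```python
import numpy as np

def cosine_similarity(query, library):
    """Cosine similarity between a query vector and each row of a matrix."""
    q = query / np.linalg.norm(query)
    lib = library / np.linalg.norm(library, axis=1, keepdims=True)
    return lib @ q

rng = np.random.default_rng(0)

# Hypothetical embeddings of the sound library, one 128-dim row per clip.
sound_embeddings = rng.normal(size=(60000, 128))

# Hypothetical embedding of the user's image from the image encoder.
image_embedding = rng.normal(size=128)

# Pick the clip whose sound embedding is closest to the image embedding.
scores = cosine_similarity(image_embedding, sound_embeddings)
best_clip = int(np.argmax(scores))
```

Expanding the sound library, as this update does, simply adds rows to `sound_embeddings`, giving the search more candidate clips with finer nuances to choose from.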
Licensing of “Img2Sound (image to sound)” engine
With the release of this new version, the technology has reached a certain level of maturity, and we have started licensing the engine.
The Img2Sound engine consists of pre-trained deep learning models that map each image and each sound into an abstract multi-dimensional vector representation. To ensure that highly related images and sounds are matched, the two vector spaces are aligned using a method called contrastive learning, which makes it possible to quantify the similarity between two different modalities (here, image and sound). This technology is highly applicable and can associate various other types of media as well, such as text and sound, or video and sound. We also support companies in adopting multimodal AI technology.
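The alignment of the two vector spaces can be illustrated with a symmetric contrastive (InfoNCE-style) objective: paired image and sound embeddings are pulled together while mismatched pairs are pushed apart. The NumPy sketch below is a simplified illustration under assumed batch sizes and a made-up temperature value, not the engine's actual training code:

```python
import numpy as np

def contrastive_loss(img_emb, snd_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/sound embeddings.

    img_emb, snd_emb: (batch, dim) arrays; row i of each is a matching pair.
    """
    # L2-normalize so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    snd = snd_emb / np.linalg.norm(snd_emb, axis=1, keepdims=True)

    logits = img @ snd.T / temperature      # (batch, batch) similarity matrix
    labels = np.arange(len(logits))         # matching pairs sit on the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the image-to-sound and sound-to-image directions.
    return (cross_entropy(logits, labels) + cross_entropy(logits.T, labels)) / 2

rng = np.random.default_rng(1)
batch = rng.normal(size=(8, 64))
# Perfectly aligned pairs (identical embeddings) give a low loss;
# randomly paired embeddings give a loss near log(batch_size).
aligned_loss = contrastive_loss(batch, batch)
random_loss = contrastive_loss(batch, rng.normal(size=(8, 64)))
```

Once trained this way, comparing any image embedding against any sound embedding with a simple cosine similarity yields a meaningful cross-modal relatedness score, which is what makes the engine applicable to other media pairings as well.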
・Imaginary Soundscape technical explanation page (only describes the old version technology, will be updated soon): https://qosmo.jp/projects/imaginarysoundscape/
Past exhibitions and award history
Dec. 2017: Paper accepted at NeurIPS Workshop on Machine Learning for Creativity and Design, an influential international conference on deep learning
Feb. 2018: Exhibited "Imaginary Soundwalk" at Media Ambition Tokyo 2018 (a sound installation applying the mechanism of this web service)
Oct. 2018: Published in "Experiments with Google — AI Experiments"
Dec. 2018: Received "Favorite Website Award (FWA) Site of the Day"