Google's ‘Show and Tell’ AI can tell you exactly what's in a photo (almost): System generates captions with nearly 94% accuracy

  • Latest version of the system is faster to train and far more accurate
  • Picture captioning AI can now generate descriptions with 94% accuracy
  • Firm has now released the open-source code to let developers take part

Artificial intelligence systems have recently begun to try their hand at writing picture captions, often producing hilarious, and even offensive, failures.

But Google's Show and Tell algorithm has almost perfected the craft.

According to the firm, the AI can now describe images with nearly 94 percent accuracy and may even ‘understand’ the context and deeper meaning of a scene.

Google has released the open-source code for its image captioning system, allowing developers to take part, the firm revealed on its research blog.

The AI was first trained in 2014, and has steadily improved in the time since.

Now, the researchers say it is faster to train, and produces more detailed, accurate descriptions.

The most recent version of the system uses the Inception V3 image classification model, and undergoes a fine-tuning phase in which its vision and language components are trained on human-generated captions.
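The encoder-decoder idea behind this design can be sketched in a few lines of Python. The snippet below is an illustrative simplification, not Google's released code: it assumes the tf.keras API, and the vocabulary size, embedding width, caption length and the use of the image features as the LSTM's initial state are placeholder choices rather than details confirmed by the firm.

import tensorflow as tf

VOCAB_SIZE = 12000   # assumed caption vocabulary size
EMBED_DIM = 512      # assumed embedding / LSTM width
MAX_WORDS = 20       # assumed maximum caption length

# Image encoder: Inception V3 without its classification head, so it returns
# a single pooled feature vector per image instead of class probabilities.
encoder = tf.keras.applications.InceptionV3(
    include_top=False, pooling="avg", weights=None)

image_in = tf.keras.Input(shape=(299, 299, 3), name="image")
words_in = tf.keras.Input(shape=(MAX_WORDS,), dtype="int32", name="caption_prefix")

# Project the image features and use them to initialise the language model,
# one common way of conditioning a caption decoder on an image.
features = encoder(image_in)
h0 = tf.keras.layers.Dense(EMBED_DIM, activation="relu")(features)
c0 = tf.keras.layers.Dense(EMBED_DIM, activation="relu")(features)

word_vectors = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(words_in)
lstm_out = tf.keras.layers.LSTM(EMBED_DIM)(word_vectors, initial_state=[h0, c0])

# Logits over the vocabulary for the next word of the caption.
next_word_logits = tf.keras.layers.Dense(VOCAB_SIZE)(lstm_out)
captioner = tf.keras.Model([image_in, words_in], next_word_logits)

In other words, the image network and the language network are wired into a single model, which is what makes the joint fine-tuning described below possible.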

HOW IT WORKS

The AI can describe exactly what's in a scene

The system uses the Inception V3 image classification model as the basis for the image encoder, allowing for 93.9 percent classification accuracy.

These encodings help the system to recognise various objects in an image.

Then the image model is fine-tuned, allowing the system to describe the objects rather than simply classifying them.

So, it can identify the colours in an image, and determine how objects in the image relate to each other.

In this phase, the system's vision and language components are jointly trained on human-generated captions.
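A rough sketch of those two stages, continuing the hypothetical captioner and encoder objects from the earlier snippet, might look like the following; the optimiser, learning rates and the commented-out dataset names are assumptions, not details from Google's blog post.

# Stage 1: train the caption decoder while the Inception V3 encoder stays frozen.
encoder.trainable = False
captioner.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
# captioner.fit(pretraining_dataset, epochs=10)   # (image, caption prefix) -> next word

# Stage 2: unfreeze the encoder so the vision and language components are
# fine-tuned jointly on human-generated captions, with a smaller learning rate.
encoder.trainable = True
captioner.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
# captioner.fit(finetuning_dataset, epochs=5)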


Examples of its capabilities show the AI can describe exactly what is in a scene, including ‘A person on a beach flying a kite,’ and ‘a blue and yellow train traveling down train tracks.’

As the system learns on a training set of human captions, it sometimes will reuse these captions for a similar scene.

This, the researchers say, may prompt some questions about its true capabilities, but while it does ‘regurgitate’ captions when applicable, this is not always the case.

‘So does it really understand the objects and their interactions in each image? Or does it always regurgitate descriptions from the training data?’ the researchers wrote.


‘Excitingly, our model does indeed develop the ability to generate accurate new captions when presented with completely new scenes, indicating a deeper understanding of the objects and context in the images.’

An example shared in the blog post shows how the components of separate images come together to generate new captions.

Three separate images of dogs in various situations can thus lead to the accurate description of a photo later on: ‘A dog is sitting on the beach next to a dog.’

‘Moreover,’ the researchers explain, ‘it learns how to express that knowledge in natural-sounding English phrases despite receiving no additional language training other than reading the human captions.’

MICROSOFT'S CAPTION BOT GETS IT HILARIOUSLY WRONG

Microsoft's CaptionBot, which analyses pictures in order to formulate captions, has been spot on with some results, but horridly wrong for others: it thought the First Lady Michelle Obama was a cell phone.

When it was released to the public earlier this year, the program seemed to be accurate with almost all of the images it received.

It also thought ‘the dress’ was actually a cat wearing a tie.


But recently, it mistook an elbow for a woman brushing her teeth, and a close-up of a human eye for a close-up of a doughnut near a cup.

‘It's early days for image captioning,’ a Microsoft spokesperson told Dailymail in April.

‘Like any artificial intelligence system, we use feedback from users of CaptionBot to improve our results and make it more accurate.’

