Showing posts with the label Tensor Flow. Show all posts

Friday, September 23, 2016

Show and Tell: image captioning open sourced in TensorFlow

In 2014, research scientists on the Google Brain team trained a machine learning system to automatically produce captions that accurately describe images. Further development of that system led to its success in the Microsoft COCO 2015 image captioning challenge, a competition to compare the best algorithms for computing accurate image captions, where it tied for first place. Today, we’re making the latest version of our image captioning system available as an open source model in TensorFlow. This release contains significant improvements to the computer vision component of the captioning system, is much faster to train, and produces more detailed and accurate descriptions compared to the original system. These improvements are outlined and analyzed in the paper Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge, published in IEEE Transactions on Pattern Analysis and Machine Intelligence.
Automatically captioned by our system.
So what’s new?

Our 2014 system used the Inception V1 image classification model to initialize the image encoder, which produces the encodings that are useful for recognizing different objects in the images. This was the best image model available at the time, achieving 89.6% top-5 accuracy on the benchmark ImageNet 2012 image classification task. We replaced this in 2015 with the newer Inception V2 image classification model, which achieves 91.8% accuracy on the same task. The improved vision component gave our captioning system an accuracy boost of 2 points in the BLEU-4 metric (which is commonly used in machine translation to evaluate the quality of generated sentences) and was an important factor in its success in the captioning challenge.

Today’s code release initializes the image encoder using the Inception V3 model, which achieves 93.9% accuracy on the ImageNet classification task. Initializing the image encoder with a better vision model gives the image captioning system a better ability to recognize different objects in the images, allowing it to generate more detailed and accurate descriptions. This gives an additional 2 points of improvement in the BLEU-4 metric over the system used in the captioning challenge.

Another key improvement to the vision component comes from fine-tuning the image model. This step addresses the problem that the image encoder is initialized by a model trained to classify objects in images, whereas the goal of the captioning system is to describe the objects in images using the encodings produced by the image model. For example, an image classification model will tell you that a dog, grass and a frisbee are in the image, but a natural description should also tell you the color of the grass and how the dog relates to the frisbee. In the fine-tuning phase, the captioning system is improved by jointly training its vision and language components on human-generated captions. This allows the captioning system to transfer information from the image that is specifically useful for generating descriptive captions, but which was not necessary for classifying objects. In particular, after fine-tuning it becomes better at correctly describing the colors of objects. Importantly, the fine-tuning phase must occur after the language component has already learned to generate captions; otherwise, the noisiness of the randomly initialized language component causes irreversible corruption to the vision component. For more details, read the full paper here.
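The ordering constraint on fine-tuning can be sketched as a simple two-phase schedule. This is a toy illustration under stated assumptions, not the released training code: `trainable_groups`, `sgd_step`, the step counts, and the constant gradients are all hypothetical stand-ins.

```python
# Toy sketch of two-phase training: language component first, then joint fine-tuning.
# All names and numbers here are illustrative assumptions, not the released code.

def trainable_groups(step, finetune_start):
    """Return the parameter groups that receive updates at this step."""
    if step < finetune_start:
        return {"language"}            # phase 1: image encoder stays frozen
    return {"language", "vision"}      # phase 2: both components train jointly


def sgd_step(params, grads, groups, lr=0.1):
    """Apply one gradient-descent update to the trainable groups only."""
    return {
        name: (value - lr * grads[name]) if name in groups else value
        for name, value in params.items()
    }


# One scalar per component stands in for its full set of weights.
params = {"vision": 1.0, "language": 0.0}
grads = {"vision": 0.5, "language": -1.0}   # placeholder gradients
history = []
for step in range(4):
    groups = trainable_groups(step, finetune_start=2)
    params = sgd_step(params, grads, groups)
    history.append(dict(params))
```

In this sketch only the language value changes during the first two steps; the vision value starts moving once the fine-tuning phase begins at step 2.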
Left: the better image model allows the captioning model to generate more detailed and accurate descriptions. Right: after fine-tuning the image model, the image captioning system is more likely to describe the colors of objects correctly.
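For readers unfamiliar with the BLEU-4 metric mentioned earlier, here is a simplified sketch of the idea: it combines the clipped 1- to 4-gram precisions of a generated sentence against a reference, with a penalty for overly short output. This is an assumption-laden illustration (single reference, sentence level, no smoothing), not the evaluation code used in the challenge; the function names and example sentences are made up.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-token sequence in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Simplified sentence-level BLEU-4 against a single reference."""
    precisions = []
    for n in range(1, 5):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if total == 0 or overlap == 0:
            return 0.0   # no smoothing: any empty precision zeroes the score
        precisions.append(overlap / total)
    # Brevity penalty: punish candidates shorter than the reference.
    if len(candidate) >= len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(candidate))
    # Geometric mean of the four precisions.
    return brevity * math.exp(sum(math.log(p) for p in precisions) / 4)


reference = "a dog catches a frisbee on the grass".split()
candidate = "a dog catches a frisbee in the grass".split()
score = bleu4(candidate, reference)
```

Scores fall between 0 and 1. When BLEU is reported on a 0-100 scale, as is common, a gain of "2 points" corresponds to 0.02 on this scale.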
Until recently our image captioning system was implemented in the DistBelief software framework. The TensorFlow implementation released today achieves the same level of accuracy with significantly faster performance: time per training step is just 0.7 seconds in TensorFlow compared to 3 seconds in DistBelief on an Nvidia K20 GPU, meaning that total training time is just 25% of the time previously required.

A natural question is whether our captioning system can generate novel descriptions of previously unseen contexts and interactions. The system is trained by showing it hundreds of thousands of images that were captioned manually by humans, and it often re-uses human captions when presented with scenes similar to what it’s seen before.
When the model is presented with scenes similar to what it’s seen before, it will often re-use human generated captions.
So does it really understand the objects and their interactions in each image? Or does it always regurgitate descriptions from the training data? Excitingly, our model does indeed develop the ability to generate accurate new captions when presented with completely new scenes, indicating a deeper understanding of the objects and context in the images. Moreover, it learns how to express that knowledge in natural-sounding English phrases despite receiving no additional language training other than reading the human captions.
 
Our model generates a completely new caption using concepts learned from similar scenes in the training set.
We hope that sharing this model in TensorFlow will help push forward image captioning research and applications, and will also allow interested people to learn and have fun. To get started training your own image captioning system, and for more details on the neural network architecture, navigate to the model’s home-page here. While our system uses the Inception V3 image classification model, you could even try training our system with the recently released Inception-ResNet-v2 model to see if it can do even better!
ORIGINAL: Google Blog
by Chris Shallue, Software Engineer, Google Brain Team September 22, 2016

Thursday, May 19, 2016

Google Built Its Very Own Chips to Power Its AI Bots

GOOGLE HAS DESIGNED its own computer chip for driving deep neural networks, an AI technology that is reinventing the way Internet services operate.

This morning, at Google I/O, the centerpiece of the company’s year, CEO Sundar Pichai said that Google has designed an ASIC, or application-specific integrated circuit, that’s specific to deep neural nets. These are networks of hardware and software that can learn specific tasks by analyzing vast amounts of data. Google uses neural nets to identify objects and faces in photos, recognize the commands you speak into Android phones, or translate text from one language to another. This technology has even begun to transform the Google search engine.

Big Brains
Google calls its chip the Tensor Processing Unit, or TPU, because it underpins TensorFlow, the software engine that drives its deep learning services.

This past fall, Google released TensorFlow under an open-source license, which means anyone outside the company can use and even modify this software engine. It does not appear that Google will share the designs for the TPU, but outsiders can make use of Google’s own machine learning hardware and software via various Google cloud services.

Google is just one of many companies adding deep learning to a wide range of Internet services, including everyone from Facebook and Microsoft to Twitter. Typically, these Internet giants drive their neural nets with graphics processing units, or GPUs, from chip makers like Nvidia. But some, including Microsoft, are also exploring the use of field programmable gate arrays, or FPGAs, chips that can be programmed for specific tasks.
According to Google, on the massive hardware racks inside the data centers that power its online services, a TPU board fits into the same slot as a hard drive, and it provides an order of magnitude better-optimized performance per watt for machine learning than other hardware solutions.

“TPU is tailored to machine learning applications, allowing the chip to be more tolerant of reduced computational precision, which means it requires fewer transistors per operation,” the company says in a blog post. “Because of this, we can squeeze more operations per second into the silicon, use more sophisticated and powerful machine learning models and apply these models more quickly, so users get more intelligent results more rapidly.”
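To make "reduced computational precision" concrete, here is a minimal sketch of 8-bit linear quantization, one common way to trade a little accuracy for cheaper arithmetic. It illustrates the general idea only and is not a description of how the TPU works internally; the function names and sample weights are hypothetical.

```python
def quantize(values, num_bits=8):
    """Map floats onto signed integers that share a single scale factor."""
    max_abs = max(abs(v) for v in values) or 1.0   # avoid a zero scale
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for 8 bits
    scale = max_abs / qmax
    return [round(v / scale) for v in values], scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integer representation."""
    return [q * scale for q in quantized]


weights = [0.82, -0.31, 0.05, -1.27]               # made-up example weights
quantized, scale = quantize(weights)
restored = dequantize(quantized, scale)

# Worst-case rounding error is at most half of one quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now fits in a single byte instead of a 32-bit float, at the cost of a small rounding error bounded by half the scale factor.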

This means, among other things, that Google is not using chips from companies like Nvidia, or is using fewer chips from these companies. It also indicates that Google is more than willing to build its own chips, which is bad news for any chipmaker, most notably the world’s largest: Intel. Intel processors power a vast majority of the computer servers inside Google, but the worry, for Intel, is that the Internet giant will one day design its own central processing units as well.

Google says it has been running TPUs for about a year, and that they were developed not long before that. After testing its first silicon, the company says, it had it running live applications inside its data centers within 22 days.

ORIGINAL: Wired
By Cade Metz
05.18.2016 

Wednesday, December 9, 2015

Here’s What Developers Are Doing with Google’s AI Brain

Researchers outside Google are testing the software that the company uses to add artificial intelligence to many of its products.

WHY IT MATTERS
Tech companies are racing to set the standard for machine learning, and to attract technical talent.
Jeff Dean speaks at a Google event in 2007. Credit: Photo by Niall Kennedy / CC BY-NC 2.0
An artificial intelligence engine that Google uses in many of its products, and that it made freely available last month, is now being used by others to perform some neat tricks, including 
  • translating English into Chinese, 
  • reading handwritten text, and 
  • even generating original artwork.
The AI software, called TensorFlow, provides a straightforward way for users to train computers to perform tasks by feeding them large amounts of data. The software incorporates various methods for efficiently building and training simulated “deep learning” neural networks across different computer hardware.

Deep learning is an extremely effective technique for training computers to recognize patterns in images or audio, enabling machines to perform useful tasks, such as recognizing faces or objects in images, with human-like competence. Recently, deep learning has also shown significant promise for parsing natural language, by enabling machines to respond to spoken or written queries in meaningful ways.

Speaking at the Neural Information Processing Systems (NIPS) conference in Montreal this week, Jeff Dean, the computer scientist at Google who leads the TensorFlow effort, said that the software is being used for a growing number of experimental projects outside the company.

These include software that generates captions for images and code that translates the documentation for TensorFlow into Chinese. Another project uses TensorFlow to generate artificial artwork. “It’s still pretty early,” Dean said after the talk. “People are trying to understand what it’s best at.”

TensorFlow grew out of a project at Google, called Google Brain, aimed at applying various kinds of neural network machine learning to products and services across the company. The reach of Google Brain has grown dramatically in recent years. Dean said that the number of projects at Google that involve Google Brain has grown from a handful in early 2014 to more than 600 today.

Most recently, the Google Brain team helped develop Smart Reply, a system that automatically recommends a quick response to messages in Gmail after it scans the text of an incoming message. The neural network technique used to develop Smart Reply was presented by Google researchers at the NIPS conference last year.

Dean expects deep learning and machine learning to have a similar impact on many other companies. “There is a vast array of ways in which machine learning is influencing lots of different products and industries,” he said. For example, the technique is being tested in many industries that try to make predictions from large amounts of data, ranging from retail to insurance.

Google was able to give away the code for TensorFlow because the data it owns is a far more valuable asset for building a powerful AI engine. The company hopes that the open-source code will help it establish itself as a leader in machine learning and foster relationships with collaborators and future employees. TensorFlow “gives us a common language to speak, in some sense,” Dean said. “We get benefits from having people we hire who have been using TensorFlow. It’s not like it’s completely altruistic.”

A neural network consists of layers of virtual neurons that fire in a cascade in response to input. A network “learns” as the sensitivity of these neurons is tuned to match particular input and output, and having many layers makes it possible to recognize more abstract features, such as a face in a photograph.
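The cascade described above can be sketched in a few lines of plain Python. This is a toy forward pass with made-up weights and a sigmoid activation, chosen purely for illustration; it is not TensorFlow code, and it leaves out the training step that tunes the weights.

```python
import math


def sigmoid(x):
    """Squash a neuron's summed input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Each neuron weighs every input, adds its bias, then 'fires'."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def network_forward(inputs, layers):
    """Feed the output of each layer into the next one, in order."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]
output = network_forward([1.0, 0.5], layers)
```

"Learning" then amounts to adjusting the numbers in `layers` until the output matches the desired result for each training input.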

TensorFlow is now one of several open-source deep learning software libraries, and its performance currently lags behind some other libraries for certain tasks. However, it is designed to be easy to use, and it can easily be ported between different hardware. And Dean says his team is hard at work trying to improve its performance.

In the race to dominate machine learning and attract the best talent, however, other companies may release competing AI engines of their own.

December 8, 2015