I have a few speaking events coming up:
“How to have fun in AI research?”, hosted by WomenInAI and South Park Commons, February 20, 2020.
If you find yourself in an explosively growing field such as machine learning and AI at this moment in 2020, and you are not exactly one of those “cool guys” at the top of the field that everyone knows about, and you are on Twitter, then you are probably overwhelmed and unhappy at times and stressed almost all the time, wondering why you don’t have six papers at NeurIPS or a new arXiv paper every month.
This is a technical talk, but also one that’s emotional, heart-to-heart, and perhaps even cheesy.
We will go over a few technical research works, both from our team at Uber AI and the machine learning community at large, to uncover intriguing behaviors in neural networks, understand training, rethink model complexity, and just for fun, stress-test generative language models.
Through this review, we will dissect together the elements that make up a complete research cycle in AI, see that there are many ways to enjoy the process (even when it is difficult), and eventually learn how to use that little bit of fun to combat the large ocean of stress, and why that matters to each of us.
“Controlling Text Generation with Plug and Play Language Models”, Auto.AI, San Francisco, February 24, 2020.
Deep neural networks have recently made a bit of a splash, enabling machines to learn to solve problems that had previously been easy for humans but difficult for computers, like playing Atari games, identifying dog breeds in photos, and generating realistic images and coherent texts. However, as models get more powerful, our understanding of them lags behind. For example, we don’t really know whether generative language models like GPT-2 really know what they are talking about. They have shown unparalleled generation capabilities, but controlling attributes of the generated language (e.g. switching topic or sentiment) remains difficult without modifying the model architecture or fine-tuning on attribute-specific data, either of which entails the significant cost of retraining.
If we can’t control them in a simple way, we are certainly far from understanding them. It is surprising that we don’t understand the intelligence we build, trained with data we produced. But maybe it is also not surprising: we don’t understand babies either. At least with models we can explore their latent space, visualize their representations, and stress-test them as many times as we want.
In this work, we use a simple method – the Plug and Play Language Model (PPLM) – for controllable language generation, which combines a pretrained language model (LM), like GPT-2, with one or more simple attribute classifiers that guide text generation, without any further training of the LM. In the canonical scenario we present, the attribute models are simple classifiers consisting of a user-specified bag of words or a single learned layer with 100,000 times fewer parameters than the LM. Sampling entails a forward and backward pass in which gradients from the attribute model push the LM’s hidden activations and thus guide the generation. Model samples demonstrate control over a range of topics and sentiment styles, and extensive automated and human-annotated evaluations show attribute alignment and fluency. PPLMs are flexible in that any combination of differentiable attribute models may be used to steer text generation, which will allow for diverse and creative applications beyond the examples given in this paper. More importantly, the controlling process can be seen as stress-testing a model, and can help us understand its abilities and limits.
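To make the "forward and backward pass" idea a little more concrete, here is a minimal toy sketch in plain Python. It is not the PPLM implementation and there is no GPT-2 in it: the six-word vocabulary, the weight matrix `W`, the starting activation `H0`, the step size, and all function names are made up for illustration. The "LM" is just a linear map from a hidden vector to logits, the attribute model is a bag of words, and the gradient of the bag's log-probability is written out by hand instead of coming from autograd. The sketch also does nothing to keep the text fluent, which a real system would have to care about.

```python
import math

# Hypothetical toy setup: a 6-word vocabulary and a linear "LM head" that
# maps a 3-dimensional hidden activation to one logit per word.
VOCAB = ["the", "cat", "ocean", "ship", "bank", "tree"]
W = [
    [1.0, 0.0, 0.0],  # the
    [0.0, 1.0, 0.0],  # cat
    [0.0, 0.0, 1.0],  # ocean
    [0.0, 0.5, 1.0],  # ship
    [0.5, 0.0, 0.5],  # bank
    [0.5, 0.5, 0.0],  # tree
]
BAG = {VOCAB.index("ocean"), VOCAB.index("ship")}  # made-up bag of words
H0 = [1.0, 0.5, 0.0]  # made-up hidden activation before steering


def next_word_probs(h):
    """Forward pass: logits = W h, followed by a softmax."""
    logits = [sum(w * x for w, x in zip(row, h)) for row in W]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bag_log_prob(h):
    """Attribute score: log of the probability mass on the bag words."""
    p = next_word_probs(h)
    return math.log(sum(p[i] for i in BAG))


def bag_grad(h):
    """Backward pass: gradient of bag_log_prob with respect to h."""
    p = next_word_probs(h)
    mass = sum(p[i] for i in BAG)
    # d/dz_k log(sum_{i in bag} p_i) = [k in bag] * p_k / mass - p_k
    dz = [(p[k] / mass if k in BAG else 0.0) - p[k] for k in range(len(p))]
    # Chain rule through the linear map z = W h.
    return [sum(dz[k] * W[k][j] for k in range(len(W))) for j in range(len(h))]


def steer(h, step_size=0.5, num_steps=5):
    """Push the activation along the normalized attribute gradient."""
    h = list(h)  # copy so the caller's activation is left untouched
    for _ in range(num_steps):
        g = bag_grad(h)
        norm = math.sqrt(sum(x * x for x in g)) or 1.0
        h = [x + step_size * gx / norm for x, gx in zip(h, g)]
    return h


steered = steer(H0)
print("bag mass before:", math.exp(bag_log_prob(H0)))
print("bag mass after: ", math.exp(bag_log_prob(steered)))
```

The weights in `W` are never updated; only the activation moves, which is the sense in which the LM needs no further training. Swapping `bag_log_prob` for any other differentiable score of the hidden state would give a different kind of steering.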
Will share slides once they are ready…if they are ever ready…