Learning with Limited Labeled Data

Ballroom 3

Short abstract

Supervised machine learning requires large labeled datasets, a prohibitive limitation in many real-world applications. What if machines could learn with fewer labeled examples? This talk explores and demonstrates an algorithmic solution that relies on collaboration between humans and machines to label smartly, and discusses product possibilities.

Abstract

Being able to teach machines with examples is a powerful capability, but it hinges on the availability of vast amounts of data. The data not only needs to exist, but has to be in a form that allows relationships between input features and output to be uncovered. Creating a label for each example fulfills this requirement, but is an expensive undertaking.

Classical approaches to this problem rely on human and machine collaboration. In these approaches, known as active learning, engineered heuristics are used to select the “best” instances of data to label, in order to reduce labeling cost. A human steps in to provide the label. The model then learns from this smaller labeled dataset. Recent advancements have made these approaches amenable to deep learning, enabling models to be built with limited labeled data.
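As a rough illustration of the select-then-label step described above, the sketch below uses least-confidence sampling, one common heuristic, though not necessarily the one the talk demonstrates. It assumes the current model has already produced class probabilities for a pool of unlabeled instances. The function names and the `oracle` callback (standing in for the human labeler) are hypothetical.

```python
import numpy as np


def least_confident_indices(probs, k):
    """Pick the k pool instances whose top predicted class probability is lowest.

    probs: array of shape (n_instances, n_classes), one row of class
    probabilities per unlabeled instance (assumed to come from the current model).
    """
    probs = np.asarray(probs, dtype=float)
    confidence = probs.max(axis=1)  # model's confidence in its top guess
    # Ascending sort puts the least confident instances first;
    # a stable sort keeps ties in their original order.
    return np.argsort(confidence, kind="stable")[:k]


def query_labels(probs, oracle, k):
    """Ask the (hypothetical) human oracle to label the k selected instances."""
    picked = least_confident_indices(probs, k)
    return {int(i): oracle(int(i)) for i in picked}


# Toy pool of four unlabeled instances with two classes.
pool_probs = [
    [0.90, 0.10],
    [0.55, 0.45],
    [0.60, 0.40],
    [0.99, 0.01],
]

# Stand-in for a human annotator: returns a fixed label per instance.
toy_labels = {0: "a", 1: "b", 2: "a", 3: "a"}
new_labels = query_labels(pool_probs, toy_labels.get, k=2)
```

In a full loop, the newly labeled instances would be added to the training set, the model retrained, and the selection repeated until the labeling budget runs out.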

In this talk, we explore algorithmic approaches that drive this capability, and provide practical guidance for translating this capability into production. We provide intuition for how and why these algorithms work by demoing and describing how we built a working prototype.

Takeaways

* Classical active learning strategies (engineered heuristics) to choose the “best” data to label

* Active learning algorithms for deep learning

* An under-the-hood understanding of active learning

* When to use active learning, and what to look out for