Self-supervised learning for robotics

A crash course from Robotics: Science and Systems Conference 2020.

Nathan Lambert
6 min read · Jul 14, 2020

Self-supervised learning is an exciting research direction that aims to learn representations from the data itself without explicit and potentially even manual supervision. One of the major benefits of self-supervised learning is the ability to scale to large amounts of unlabelled data in a lifelong learning manner and to improve performance by reducing the effect of dataset bias. Recent developments in self-supervised learning have achieved performance comparable to or better than fully-supervised models. However, many of these methods are developed in domain-specific communities such as robotics, computer vision or reinforcement learning. The aim of this workshop is to bring together researchers from different communities to discuss opportunities and challenges and to explore new directions.

I wanted to learn something new. Self-supervised learning for robotics is all about data creation (and augmentation), reward engineering, and experimental setup so that our robots can learn on their own (and move toward lifelong learning). It struck me as a very young field with a breadth of applications. The link to all materials is here.

Live stream.

Dieter Fox: Overview of self-supervised learning for robotics

Autonomous data generation

  • Train pose estimation beforehand.
  • Robots generate and label data on their own (refining detection further); see the sketch after this list.
  • Accurate pose initialization is important.
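
A minimal sketch of the loop in the bullets above, with hypothetical `pose_estimator` and `robot` interfaces (these names are illustrative, not from the talk): the robot labels its own images with a pretrained estimator, keeps only confident estimates, and retrains on them.

```python
# Hypothetical sketch of a self-labeling loop; pose_estimator and robot are
# illustrative interfaces, not APIs named in the talk.
def self_label_poses(pose_estimator, robot, n_rounds=5, n_images=1000,
                     confidence_threshold=0.9):
    dataset = []
    for _ in range(n_rounds):
        for _ in range(n_images):
            image = robot.capture_image()
            pose = pose_estimator.predict(image)          # pretrained model supplies the label
            if pose.confidence > confidence_threshold:    # accurate pose initialization matters
                dataset.append((image, pose))
        pose_estimator.fit(dataset)                       # refine detection on the robot's own labels
    return pose_estimator
```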

“Generate and label their own data”

A good introduction to self-supervision for me. I joined this talk late.

Abhinav Gupta: Learning like babies

Scaling learning with self-supervision and lifelong learning

  • 3 core vectors: “100x images, supervising robot data, curiosity”
  • This work is the intersection of supervised / passive learning with RL in robotics.
Linking to how we all learn already.

Existing approaches don’t scale!

  • ImageNet-like approaches labeled 1M boxes over 5 years, but Facebook generates > 600M images a day… can we label that?
  • Simulation is 1 task, tons of interactions, but in reality babies do 1000s of tasks in parallel with less structure.
Stages of how robots could learn from simplest to most complex.
  • Remove data labelling bottleneck! How do self-supervised approaches scale?
  • Hardest tasks refine the representations further (more specific embeddings, potentially better performance).
  • Example: picking up objects at random locations (with force sensor feedback) is a way to collect data; a sketch of that recipe follows this list.
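
A rough sketch of that data-collection recipe, with a hypothetical `robot` interface: the force sensor supplies the grasp success/failure label for free, so no human annotation is needed. All names and values below are illustrative.

```python
import random

FORCE_THRESHOLD = 0.5  # illustrative value; depends on the gripper

# Hypothetical robot interface: the force reading is the self-supervised label.
def collect_grasp_data(robot, x_range=(-0.3, 0.3), y_range=(-0.3, 0.3), n_attempts=10_000):
    data = []
    for _ in range(n_attempts):
        x, y = random.uniform(*x_range), random.uniform(*y_range)  # random grasp location
        image = robot.capture_image()
        robot.grasp_at(x, y)
        success = robot.gripper_force() > FORCE_THRESHOLD          # label from force feedback
        data.append({"image": image, "action": (x, y), "label": success})
    return data
```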

There is not enough diverse data, so robots cannot transition from the lab to the real world

  • Chicken-and-egg issue: we need data for the method to be useful, but it will not be useful until we have access to the data, so…
  • Rented Airbnbs to collect data!
Hilarious videos of training robots in various Airbnbs.

Can we formulate curiosity as an end-to-end gradient method rather than learning by “rewarding actions that disagree with the environment”?
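
For context on the “disagreement” framing, a rough sketch of ensemble-disagreement curiosity (in the spirit of exploration-via-disagreement work from Gupta's group, to my understanding): the intrinsic reward is the variance across an ensemble of learned forward models, and because that quantity is differentiable it could in principle be optimized end-to-end rather than used only as an RL reward. Network sizes and dimensions are toy placeholders.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, N_MODELS = 8, 2, 5

# Ensemble of forward dynamics models f_i(s, a) -> s'.
ensemble = nn.ModuleList(
    nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, STATE_DIM))
    for _ in range(N_MODELS)
)

def disagreement_reward(state, action):
    """Intrinsic reward = variance of ensemble predictions (no environment label needed)."""
    inp = torch.cat([state, action], dim=-1)
    preds = torch.stack([f(inp) for f in ensemble])   # (N_MODELS, batch, STATE_DIM)
    return preds.var(dim=0).mean(dim=-1)              # (batch,)

state = torch.randn(4, STATE_DIM)
action = torch.randn(4, ACTION_DIM, requires_grad=True)
reward = disagreement_reward(state, action).sum()
reward.backward()            # gradients flow to the action: curiosity as an end-to-end objective
print(action.grad.shape)
```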

Pierre Sermanet: Using play and language to scale robot learning

Recent works

What is “play data”?

  • Play can be a substitute for RL because it gives exploration inherently (RL is designed to balance exploration and exploitation).
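
One reason play data works without labels is that it can be relabeled in hindsight: any window of the play stream is a valid demonstration of reaching its own final state (my reading of the learning-from-play line of work from Sermanet's group). A toy sketch with plain arrays:

```python
import numpy as np

def relabel_play(observations, actions, window=32):
    """Slice an unlabeled play stream into goal-conditioned training examples.

    Each window becomes a demonstration of "how to get from its first
    observation to its last observation" -- no human labels required.
    """
    examples = []
    for start in range(0, len(observations) - window, window):
        end = start + window
        examples.append({
            "obs": observations[start:end],
            "actions": actions[start:end],
            "goal": observations[end - 1],   # hindsight goal = where the play ended up
        })
    return examples

# Toy usage: a fake play log of 10-dim observations and 3-dim actions.
obs = np.random.randn(1000, 10)
acts = np.random.randn(1000, 3)
print(len(relabel_play(obs, acts)))
```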

The great slides from this talk below (thanks Zoom).

Definitions of play.
Different types of robotics data potentially used.
How these data types are linked.

Panel 1 was highlighted by this question:

How far are we from real industrial application?

Answer: “crickets”

“Robots are going to have to explain themselves, so they will need to generate text”

Roberto Calandra: Few lessons learned from self-supervised learning on real robots

What is self-supervised learning?

  • Supervised learning without labelling the data: Learn embeddings, automatic labelling.
  • Benefits: large-scale data collection is feasible; in the real world it leads to better experimental design and engineering; and it seems natural given how humans learn.
  • Limitations: the structure of the problem needs to be known and consistent, and a labelling mechanism is needed.
  • Remember the polished front end of robotics demos versus the hidden behind-the-scenes challenges; self-supervision may mitigate this gap.
Marble manipulation task used self-supervision.
Can use self-supervision with model-based planning.
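
To make “self-supervision with model-based planning” concrete, here is a minimal random-shooting planner over a learned dynamics model; the robot's own transitions provide the training signal for that model. The dynamics and cost used below are toy stand-ins, not anything from the talk.

```python
import numpy as np

def plan_with_model(dynamics_fn, cost_fn, state, horizon=10, n_candidates=256, action_dim=2):
    """Random-shooting MPC over a learned dynamics model.

    dynamics_fn(state, action) -> next_state would be trained on the robot's own
    (state, action, next_state) transitions -- the self-supervised part.
    """
    candidates = np.random.uniform(-1, 1, size=(n_candidates, horizon, action_dim))
    costs = np.zeros(n_candidates)
    for i, action_seq in enumerate(candidates):
        s = state
        for a in action_seq:
            s = dynamics_fn(s, a)
            costs[i] += cost_fn(s)
    return candidates[np.argmin(costs)][0]   # execute the first action of the best sequence

# Toy usage: a linear "learned" model and a cost that pulls the state to the origin.
best_action = plan_with_model(
    dynamics_fn=lambda s, a: s + 0.1 * np.concatenate([a, -a]),
    cost_fn=lambda s: float(np.sum(s ** 2)),
    state=np.ones(4),
)
print(best_action)
```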

6 lessons from robotics & self-supervision

  1. Safety, safety, safety.
  2. Careful experiment design is very important. “Measure twice, cut once” is not always enough -> you need an iterative improvement process.
  3. Don’t underestimate engineering.
  4. Designing and monitoring diagnostics is crucial.
  5. Log everything and maintain consistency (hydra.cc).
  6. Do not code experiments as sequences of actions (finite state machines offer substantially more robustness); a sketch follows this list.
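
A sketch of lesson 6 with a hypothetical robot interface: instead of a fixed “move, grasp, lift” script, the experiment is a small state machine whose failure transitions route back to recovery states instead of crashing the run.

```python
# Hypothetical sketch: an experiment written as a finite state machine rather
# than a fixed sequence of actions. The robot methods are illustrative.
def run_grasp_experiment(robot):
    state = "RESET"
    while state != "DONE":
        if state == "RESET":
            robot.move_home()
            state = "APPROACH"
        elif state == "APPROACH":
            state = "GRASP" if robot.move_to_object() else "RESET"
        elif state == "GRASP":
            # A failed grasp routes back to APPROACH instead of aborting.
            state = "LIFT" if robot.close_gripper() else "APPROACH"
        elif state == "LIFT":
            robot.lift()
            state = "DONE"
```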

Can we run robots entirely remotely?

Chelsea Finn: Data scalability for robot learning

Generalizing across tasks, objects, and environments

How the training and test distributions often look in robotics.
  • To generalize broadly, train on broad data. We want scalable data sources to do so (as in modern CV).
  • What does robot learning data look like? Match this with the ML data process of training + validation sets.
Timeline on multi-robot, multi-task learning.
ML <-> robotics.

Need to get large datasets and algorithms to work with them

  • Goal: accumulate and reuse datasets across labs: RoboNet.
  • Like a validation set, one robot’s data can be used to fine-tune performance (sketched below).
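
A sketch of that held-out-robot idea, assuming a dataset organized as a dict from robot name to trajectories; the robot names and the model API in the usage comments are hypothetical.

```python
def split_by_robot(dataset, held_out_robot):
    """Treat one robot's data like a validation/fine-tuning set (RoboNet-style)."""
    pretrain = [traj for robot, trajs in dataset.items()
                if robot != held_out_robot for traj in trajs]
    finetune = dataset[held_out_robot]
    return pretrain, finetune

# Hypothetical usage with a model exposing fit() / evaluate():
# pretrain, finetune = split_by_robot(robonet_like_data, held_out_robot="franka")
# model.fit(pretrain)
# model.fit(finetune[: len(finetune) // 10])   # small amount of target-robot data
# print(model.evaluate(finetune))
```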

Can we link all the videos available online to robotic tasks? Need to account for dramatic domain shift, but it’s a huge dataset opportunity.

  • Trying to do video prediction for robots: the bottleneck is underfitting (trying to model everything in the scene).
  • Instead, we consider goal-aware prediction to focus on goal-relevant content, redistributing model capacity toward good trajectories.

Pieter Abbeel: DRL — Can learning from pixels be as efficient as learning from state?

DRL from pixels was far behind DRL from state a few years ago, but now it has nearly matched it.

History

  • The Contrastive Unsupervised Representations for Reinforcement Learning (CURL) and Reinforcement Learning with Augmented Data (RAD) papers.
  • Contrastive learning is now dominant in CV. Results show that combining supervised and unsupervised learning can do better.
  • SimCLR is SOTA in image recognition (with self-supervision).
  • It is important to use a sequence of 3 frames (movement is key in robotics).
Blue is with self-supervision on ImageNet, red is without (self-supervision is more data efficient).

Contrastive learning (CURL)

The main objective function in DRL from images these days.
  • Add query/key pairs (e.g., random crops of the same observation), a bilinear inner product with a learned weight matrix, and keys encoded with a momentum encoder; a sketch follows this list.
  • Hard environments don’t work yet: Supervised learning cannot extract the state from the image (need good enough images).
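
A compact sketch of that objective as I understood it from the talk: two random crops of the same observation become the query and key, the key goes through a momentum copy of the encoder, and similarity is a bilinear product q·W·k scored against every other key in the batch. The encoder below is a toy placeholder, not the actual CURL architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 64
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, EMBED_DIM))     # toy encoder
key_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, EMBED_DIM))
key_encoder.load_state_dict(encoder.state_dict())  # momentum copy (updated by EMA, not SGD)
W = nn.Parameter(torch.eye(EMBED_DIM))             # learned bilinear weight matrix

def curl_loss(query_crops, key_crops):
    q = encoder(query_crops)                        # (B, EMBED_DIM)
    with torch.no_grad():
        k = key_encoder(key_crops)                  # keys come from the momentum encoder
    logits = q @ W @ k.t()                          # bilinear similarity against every key in the batch
    labels = torch.arange(q.size(0))                # the matching key is the positive
    return F.cross_entropy(logits, labels)

# Toy usage: pretend these are two random crops of the same batch of frames.
obs = torch.rand(8, 3, 64, 64)
print(curl_loss(obs, obs).item())
```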

RAD: RL + Data augmentation

  • Rotation, mirroring, etc. (crop and rotate are the most important); matches CURL. A crop sketch follows this list.
  • CURL can be applied without reward function (multi-task fit).
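
The RAD side is almost embarrassingly simple: apply an image augmentation such as random crop to the observations before each update, with no change to the RL algorithm itself. A minimal crop sketch (array sizes are illustrative):

```python
import numpy as np

def random_crop(batch, out_size=84):
    """Randomly crop each image in a batch of (B, C, H, W) observations (RAD-style)."""
    b, c, h, w = batch.shape
    cropped = np.empty((b, c, out_size, out_size), dtype=batch.dtype)
    for i in range(b):
        top = np.random.randint(0, h - out_size + 1)
        left = np.random.randint(0, w - out_size + 1)
        cropped[i] = batch[i, :, top:top + out_size, left:left + out_size]
    return cropped

obs = np.random.rand(8, 3, 100, 100).astype(np.float32)
print(random_crop(obs).shape)   # (8, 3, 84, 84)
```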
Another comparison of DRL to image-based research.

Andy Zeng: Learning to See Actions (for Vision-based Manipulation)

  • How do we get our training labels?
  • Trying to label 3D orientation in 2D images is non-intuitive and hard.
A great slide “what are objects”.

A proposal for end-to-end learning of object-centric representations (no assumptions of object-ness when learning end-to-end)
