Table of Contents
- Machine Learning Engineer Sayak Paul introduces key trends in computer vision
- Computer vision goals
- Example use cases
- Computer vision trends
- Trend I: Resource-Efficient Models
- Why it's trending
- Building process
- Action plan up to model deployment
- Trend II: Generative Deep Learning for Creative Applications
- Why it's trending
- Applications
- Action plan up to model deployment
- Trend III: Self-Supervised Learning
- Limitations of supervised learning
- Training on unlabeled data
- Disadvantages of self-supervised learning
- References
- Trend IV: Transformers and Self-Attention
- Why it's trending
- Pros and cons of Transformers
- Exploring the Vision Transformer
- Trend V: Robust Vision Models
- Problems faced by vision models
- How to build robust models
Machine Learning Engineer Sayak Paul introduces key trends in computer vision
Computer vision is a fascinating area of artificial intelligence with enormous real-world value. Billion-dollar startups are popping up in the space, with Forbes predicting the market will reach $49 billion by 2022.
Computer vision goals
A major goal of computer vision is to give computers the ability to understand the world through vision and make decisions based on that understanding.
The technology can be applied to automate or augment human vision, resulting in a myriad of use cases.
If AI enables computers to think, computer vision enables computers to see, observe, and understand. – IBM
Example use cases
Use cases for computer vision range from transportation to retail.
A typical use case in transportation can be found at Tesla. The company builds electric, self-driving cars that rely solely on cameras driven by computer vision models.
Computer vision is also revolutionizing the retail industry and making it even more convenient, with things like the Amazon Go program using smart sensors and computer vision systems to enable cashier-less shopping.
Computer vision clearly has a lot of potential in terms of contributing to practical applications. As a practitioner or deep learning enthusiast, it’s important to keep an eye on the latest advances in the field and stay on top of the latest trends.
Computer vision trends
In this article, I share the thoughts of Sayak Paul, a Machine Learning Engineer at Carted and a recent speaker at Bitgrit. You can also find him on LinkedIn and Twitter.
This article is not an exhaustive account of the talk, but a summary of its main points. You can view the slides from the talk here; they contain useful links on related topics. The talk is also available on YouTube, where you can find more detail.
The purpose of this article, like the talk itself, is to help readers:
- Discover more exciting things to work on.
- Find ideas for their next project.
- Catch up on what's happening in the field.
Before we get to the trends: 💵 a new data science competition with a $3,000 prize pool has been announced, which some readers may not have heard about yet.
Trend I: Resource-Efficient Models
Why it's trending
- The latest models can be very difficult to run offline on small devices such as mobile phones and Raspberry Pis.
- Heavy models tend to have high latency (latency being the time a model takes to perform a forward pass), which has a significant impact on infrastructure costs.
- If cloud-based model hosting is not an option (due to cost, network connectivity, privacy concerns, etc.), what models can be used?
Building process
1. Sparse training
- Sparse training introduces zeros into the matrices used to train the neural network. This works because not all dimensions interact with one another; only certain interactions matter.
- Performance may degrade slightly, but the result is far fewer multiplications and faster network training.
- A closely related technique is pruning, which discards network parameters that fall below a certain magnitude threshold (other discard criteria exist as well).
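As a rough illustration of magnitude pruning (a minimal NumPy sketch of the idea, not any particular framework's API), the approach is simply to zero out the weights with the smallest absolute values:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 4)).astype(np.float32)

def magnitude_prune(w, sparsity):
    """Zero out the fraction of weights with the smallest magnitudes."""
    threshold = np.quantile(np.abs(w), sparsity)
    mask = np.abs(w) >= threshold  # keep only weights above the threshold
    return w * mask, mask

# Prune half of the weights; the surviving half is unchanged.
pruned, mask = magnitude_prune(weights, sparsity=0.5)
```

In practice, frameworks such as TensorFlow Model Optimization and `torch.nn.utils.prune` apply this same idea, often iteratively during or after training.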
2. Post-training quantization
- Quantization reduces the numerical precision of a model's weights (e.g. from FP16 to INT8), shrinking its size.
- Quantization-Aware Training (QAT) can compensate for the information lost to reduced precision.
- Pruning + quantization works best for many use cases.
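To make the idea concrete, here is a minimal NumPy sketch of affine INT8 quantization (an illustration of the concept, not a production toolkit such as TensorFlow Lite):

```python
import numpy as np

def quantize_int8(x):
    """Affine quantization: map a float tensor onto the int8 range."""
    scale = (x.max() - x.min()) / 255.0
    zero_point = np.round(-x.min() / scale) - 128
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

weights = np.linspace(-1.0, 1.0, 9, dtype=np.float32)
q, scale, zp = quantize_int8(weights)
recon = dequantize(q, scale, zp)
```

Storing INT8 instead of FP32 shrinks the tensor roughly 4x, at the cost of a reconstruction error bounded by about half the quantization step.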
3. Knowledge Distillation
- Train a high-performing teacher model, extract its “knowledge”, and train another small student model to match the labels obtained from the teacher.
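The distillation objective can be sketched as a KL divergence between temperature-softened teacher and student distributions (a minimal NumPy illustration; the temperature `T` and the logits below are arbitrary examples):

```python
import numpy as np

def softmax(z, T=1.0):
    """Softmax with temperature T; higher T gives softer distributions."""
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student))) * T**2

teacher = np.array([5.0, 1.0, -2.0])  # confident teacher outputs
student = np.array([4.0, 2.0, -1.0])  # student not yet matching
loss = distillation_loss(student, teacher)
```

Minimizing this loss pushes the student's softened outputs toward the teacher's; the loss is zero exactly when the two distributions agree.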
Action plan up to model deployment
- Train a larger, higher-performing teacher model.
- Apply knowledge distillation, using QAT if possible.
- Prune and quantize the distilled model.
- Deploy.
・・・
Trend II: Generative Deep Learning for Creative Applications
Why it's trending
- Generative deep learning has advanced dramatically.
- Thisxdoesnotexist.com showcases examples of what generative deep learning can achieve.
Applications
1. Image super-resolution
- Image resolution can be upscaled for applications such as surveillance cameras.
2. Domain transfer
- Transfer an image from one domain to another.
- Example: cartoonizing or animating a photo of a person.
3. Extrapolation
- Generate new content for a masked region of an image.
- Used in domains such as image editing to provide functionality like that found in Photoshop.
4. Implicit Neural Representation and CLIP
- The ability to generate images from captions (e.g. generating an image from the text ‘People riding bicycles in New York City’).
- GitHub repository

Action plan up to model deployment
- Research and reimplement such systems (this step is optional).
- Develop end-to-end projects.
- Improving the building blocks used in generative deep learning may lead to new discoveries.
Trend III: Self-Supervised Learning
Self-supervised learning does not use ground-truth labels at all; instead, it trains the model on a large unlabeled dataset using pretext tasks.
How does self-supervised learning compare to supervised learning?
Limitations of supervised learning
- Requires huge amount of labeled data to improve performance.
- Labeled data is expensive to prepare and can be biased.
- For such large data, the training time becomes very long.
Training on unlabeled data
- We want the model to be invariant to different views of the same image.
- Intuitively, the model learns what makes two images (e.g. a cat and a mountain) visually different.
- Unlabeled datasets are much cheaper to collect!
- In computer vision, self-supervised models such as SEER outperform supervised models on object detection and semantic segmentation.
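Contrastive self-supervised methods make this invariance concrete with losses such as InfoNCE, where two views of the same image are pulled together and the other images in the batch act as negatives. A minimal NumPy sketch (the embeddings here are synthetic stand-ins for network outputs):

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE: matching views of the same image are positives (the diagonal);
    every other image in the batch is a negative."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # pairwise cosine similarities
    idx = np.arange(len(z1))
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[idx, idx].mean()            # cross-entropy on the diagonal

rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))
view1 = base + 0.01 * rng.normal(size=base.shape)  # two noisy "augmentations"
view2 = base + 0.01 * rng.normal(size=base.shape)
loss_aligned = info_nce_loss(view1, view2)         # views agree -> low loss
loss_random = info_nce_loss(view1, rng.normal(size=base.shape))
```

A trained encoder drives this loss down by mapping augmented views of the same image to nearby points on the unit sphere.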
Disadvantages of self-supervised learning
- Self-supervised learning requires a very large amount of data to perform well on real-world tasks like image classification.
- It is also computationally intensive.
References
- Self-Supervised Learning: The Dark Matter of Intelligence
- Understanding self-supervised learning using controlled datasets with known structure
Trend IV: Transformers and Self-Attention
Why it's trending
- Attention lets a network focus on important context in the data by quantifying the interactions between pairs of entities.
- The idea of attention has appeared in various forms in computer vision (GC blocks, SE networks, etc.), but the gains were marginal.
- The self-attention block is the foundation of the Transformer.
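The self-attention computation itself is compact: softmax(QKᵀ/√d_k)V, with queries, keys, and values all derived from the same token sequence. A minimal NumPy sketch:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V"""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise interaction strengths
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 4 tokens (e.g. image patches), dimension 8
out, attn = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
```

In a real Transformer block, Q, K, and V are separate learned linear projections of the input, and multiple such heads run in parallel; this sketch omits those projections for clarity.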
Pros and cons of Transformers
Pros
- Its lack of strong built-in priors makes it a general computational primitive for a variety of learning tasks.
- Performance on par with CNNs can be achieved with parameter efficiency.
Cons
- Because Transformers lack the clear inductive biases of CNNs, assembling large-scale data is crucial for pre-training.
Another trend: combining self-attention with CNNs establishes a strong baseline (BoTNet).
Exploring the Vision Transformer
- Facebook Research/deit
- Google Research/vision transformer
- Jeonworld/Vit-pytorch
- Image classification using Vision Transformer (Keras)
・・・
Trend V: Robust Vision Models
Vision models are subject to many vulnerabilities that affect their performance.
Problems faced by vision models
1. Perturbation
- Deep models are vulnerable to small changes in the input data.
- Imagine a model perceiving pedestrians on an empty road because of a perturbation-induced misprediction!
2. Corruption
- Deep models easily latch onto high-frequency regions, which makes them vulnerable to common corruptions such as blur, contrast changes, and zoom.
3. Out of Distribution Data
There are two types of out-of-distribution data:
- Domain shift while labels stay the same – we want the model to perform consistently on what it has learned.
- Anomalous data points – when faced with anomalous data points, the model should make low-confidence predictions.
How to build robust models
There are many techniques that address these specific issues when building robust vision models.
1. Perturbation
- Adversarial Training: akin to Byzantine Fault Tolerance; essentially, prepare the system to handle itself under the absolute worst conditions.
- paper
Byzantine Fault Tolerance (BFT) is a property of distributed-computing algorithms that can tolerate arbitrary failures; blockchains are a typical practical example. The cited paper, “Adversarial Examples Improve Image Recognition”, discusses how adding adversarial examples to the training data can improve the performance of image recognition models. The approach is said to resemble BFT because both prepare a system to withstand adversarial conditions.
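To illustrate the kind of perturbation adversarial training defends against, here is a minimal NumPy sketch of the Fast Gradient Sign Method (FGSM) applied to a toy logistic-regression “model” (the weights and input are synthetic; real adversarial training does this against a deep network and folds the perturbed examples back into the training set):

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    """FGSM: nudge the input in the direction that increases the loss most."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted probability of class 1
    grad_x = (p - y) * w                    # analytic d(cross-entropy)/dx
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=5)
b = 0.0
x = w.copy()  # a point the model classifies confidently as y = 1
p_clean = 1.0 / (1.0 + np.exp(-(x @ w + b)))
x_adv = fgsm_perturb(x, w, b, y=1.0, eps=0.5)
p_adv = 1.0 / (1.0 + np.exp(-(x_adv @ w + b)))
# Adversarial training would now add (x_adv, 1.0) to the training data.
```

A small, targeted change to the input is enough to drop the model's confidence, which is exactly the failure mode adversarial training hardens against.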
2. Corruption
- Consistency regularization – you want the model's predictions to remain consistent under noisy inputs.
- Examples of methods using consistency regularization: RandAugment, Noisy Student Training, FixMatch
Regularization is a technique that limits the range of model parameters to prevent overfitting. The methods above leverage regularization and noise injection to improve model performance as follows:
- RandAugment: adjusts regularization strength according to model and dataset size.
- Noisy Student Training: improves the student model's generalization by adding noise during knowledge distillation.
- FixMatch: leverages consistency regularization and pseudo-labeling to perform semi-supervised learning.
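As a sketch of the FixMatch idea (pure NumPy, with made-up logits), confident predictions on a weakly augmented view become pseudo-labels for the strongly augmented view, and low-confidence samples are masked out of the loss:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fixmatch_targets(weak_logits, threshold=0.95):
    """FixMatch-style pseudo-labeling: keep only confident predictions on the
    weakly augmented view as targets for the strongly augmented view."""
    probs = softmax(weak_logits)
    confidence = probs.max(axis=1)
    pseudo_labels = probs.argmax(axis=1)
    mask = confidence >= threshold  # low-confidence samples contribute no loss
    return pseudo_labels, mask

weak_logits = np.array([[6.0, 0.0, 0.0],   # confident -> kept as pseudo-label
                        [1.0, 0.9, 0.8]])  # uncertain -> masked out
labels, mask = fixmatch_targets(weak_logits)
```

The masked samples simply wait until the model becomes confident about them, which is what makes the pseudo-labels reliable enough to train on.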
3. Out-of-distribution data
- Detect anomalous data points on the fly.
- Paper
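A common baseline for spotting such data points is the maximum softmax probability: in-distribution inputs tend to produce peaked, confident predictions, while out-of-distribution inputs produce flatter ones. A minimal NumPy sketch with made-up logits:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def max_softmax_score(logits):
    """Maximum softmax probability: low scores flag likely OOD inputs."""
    return softmax(logits).max(axis=-1)

in_dist = np.array([[8.0, 0.5, 0.2]])  # one class clearly dominates
ood = np.array([[1.1, 1.0, 0.9]])      # nearly uniform -> low confidence
```

Thresholding this score gives the desired behavior from the previous section: confident predictions on familiar data, low-confidence predictions on anomalous data.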
I'd like to close with a clever twist on George Box's famous aphorism, applied to the principles of robust models.
“All models are wrong, but some models that we know to be wrong are useful.” – Balaji Lakshminarayanan (NeurIPS 2020)
・・・
That concludes this article. Thank you for reading! I hope you learned something new. If you like articles like this, be sure to follow the Bitgrit Data Science Publication.