Open Source Python Computer Vision Libraries - Page 2

Python Computer Vision Libraries

View 194 business solutions

Browse free open source Python Computer Vision Libraries and projects below. Use the toggles on the left to filter open source Python Computer Vision Libraries by OS, license, language, programming language, and project status.

  • Empower Your Workforce and Digitize Your Shop Floor Icon
    Empower Your Workforce and Digitize Your Shop Floor

    Benefits to Manufacturers

    Easily connect to most tools and equipment on the shop floor, enabling efficient data collection and boosting productivity with vital insights. Turn information into action to generate new ideas and better processes.
    Learn More
  • Next-generation security awareness training. Built for AI email phishing, vishing, smishing, and deepfakes. Icon
    Next-generation security awareness training. Built for AI email phishing, vishing, smishing, and deepfakes.

    Track your GenAI risk, run multichannel deepfake simulations, and engage employees with incredible security training.

    Assess how your company's digital footprint can be leveraged by cybercriminals. Identify the most at-risk individuals using thousands of public data points and take steps to proactively defend them.
    Learn More
  • 1

    PyVision Computer Vision Toolkit

    A Python computer vision library

    PyVision is a object-oriented Computer Vision Toolkit for researchers that contains vision and machine learning algorithms and algorithm analysis and easily interfaces with scipy/numpy, PIL, opencv and other computer and machine learning libraries.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    Albumentations

    Albumentations

    Fast image augmentation library and an easy-to-use wrapper

    Albumentations is a computer vision tool that boosts the performance of deep convolutional neural networks. Albumentations is a Python library for fast and flexible image augmentations. Albumentations efficiently implements a rich variety of image transform operations that are optimized for performance, and does so while providing a concise, yet powerful image augmentation interface for different computer vision tasks, including object classification, segmentation, and detection. Albumentations supports different computer vision tasks such as classification, semantic segmentation, instance segmentation, object detection, and pose estimation. Albumentations works well with data from different domains: photos, medical images, satellite imagery, manufacturing and industrial applications, Generative Adversarial Networks. Albumentations can work with various deep learning frameworks such as PyTorch and Keras.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    ChainerCV

    ChainerCV

    ChainerCV: a Library for Deep Learning in Computer Vision

    ChainerCV is a collection of tools to train and run neural networks for computer vision tasks using Chainer. In ChainerCV, we define the object detection task as a problem of, given an image, bounding box-based localization and categorization of objects. Bounding boxes in an image are represented as a two-dimensional array of shape (R,4), where R is the number of bounding boxes and the second axis corresponds to the coordinates of bounding boxes. ChainerCV supports dataset loaders, which can be used to easily index examples with list-like interfaces. Dataset classes whose names end with BboxDataset contain annotations of where objects locate in an image and which categories they are assigned to. These datasets can be indexed to return a tuple of an image, bounding boxes and labels. ChainerCV provides several network implementations that carry out object detection.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    CoTracker

    CoTracker

    CoTracker is a model for tracking any point (pixel) on a video

    CoTracker is a learning-based point tracking system that jointly follows many user-specified points across a video, rather than tracking each point independently. By reasoning about all tracks together, it can maintain temporal consistency, handle mutual occlusions, and reduce identity swaps when trajectories cross. The model takes sparse point queries on one frame and predicts their sub-pixel locations and a visibility score for every subsequent frame, producing long, coherent trajectories. Its transformer-style architecture aggregates information both along time and across points, allowing it to recover tracks even after brief disappearances. The repository ships with inference scripts, pretrained weights, and simple interfaces to seed points, run tracking, and export trajectories for downstream tasks. Typical uses include correspondence building, motion analysis, dynamic SLAM priors, video editing masks, and evaluation of geometric consistency in real scenes.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Project Planning and Management Software | Planview Icon
    Project Planning and Management Software | Planview

    For Enterprise PMOs

    Planview® ProjectAdvantage (formerly Sciforma) is an enterprise-centric project and portfolio management (PPM) software designed to enable change, drive innovation, and lead in a company's digital transformation. With ProjectAdvantage, teams can strategically track and monitor project data in order to make relevant decisions. It offers multiple features focused on strategic management, functional management, and execution management. A highly scalable and cost-effective solution, ProjectAdvantage is available in various deployment models.
    Learn More
  • 5
    Colossal-AI

    Colossal-AI

    Making large AI models cheaper, faster and more accessible

    The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing. Together with better performance come larger model sizes. This imposes challenges to the memory wall of the current accelerator hardware such as GPU. It is never ideal to train large models such as Vision Transformer, BERT, and GPT on a single GPU or a single machine. There is an urgent demand to train models in a distributed environment. However, distributed training, especially model parallelism, often requires domain expertise in computer systems and architecture. It remains a challenge for AI researchers to implement complex distributed training solutions for their models. Colossal-AI provides a collection of parallel components for you. We aim to support you to write your distributed deep learning models just like how you write your model on your laptop.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    DETR

    DETR

    End-to-end object detection with transformers

    PyTorch training code and pretrained models for DETR (DEtection TRansformer). We replace the full complex hand-crafted object detection pipeline with a Transformer, and match Faster R-CNN with a ResNet-50, obtaining 42 AP on COCO using half the computation power (FLOPs) and the same number of parameters. Inference in 50 lines of PyTorch. What it is. Unlike traditional computer vision techniques, DETR approaches object detection as a direct set prediction problem. It consists of a set-based global loss, which forces unique predictions via bipartite matching, and a Transformer encoder-decoder architecture. Given a fixed small set of learned object queries, DETR reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel. Due to this parallel nature, DETR is very fast and efficient.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Deep Learning Drizzle

    Deep Learning Drizzle

    Drench yourself in Deep Learning, Reinforcement Learning

    Drench yourself in Deep Learning, Reinforcement Learning, Machine Learning, Computer Vision, and NLP by learning from these exciting lectures! Optimization courses which form the foundation for ML, DL, RL. Computer Vision courses which are DL & ML heavy. Speech recognition courses which are DL heavy. Structured Courses on Geometric, Graph Neural Networks. Section on Autonomous Vehicles. Section on Computer Graphics with ML/DL focus.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Detectron

    Detectron

    FAIR's research platform for object detection research

    Detectron is an object detection and instance segmentation research framework that popularized many modern detection models in a single, reproducible codebase. Built on Caffe2 with custom CUDA/C++ operators, it provided reference implementations for models like Faster R-CNN, Mask R-CNN, RetinaNet, and Feature Pyramid Networks. The framework emphasized a clean configuration system, strong baselines, and a “model zoo” so researchers could compare results under consistent settings. It includes training and evaluation pipelines that handle multi-GPU setups, standard datasets, and common augmentations, which helped standardize experimental practice in detection research. Visualization utilities and diagnostic scripts make it straightforward to inspect predictions, proposals, and losses while training. Although the project has since been superseded by Detectron2, the original Detectron remains a historically important, reproducible reference that still informs many productions.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Face Mask Detection

    Face Mask Detection

    Face Mask Detection system based on computer vision and deep learning

    Face Mask Detection system based on computer vision and deep learning using OpenCV and Tensorflow/Keras. Face Mask Detection System built with OpenCV, Keras/TensorFlow using Deep Learning and Computer Vision concepts in order to detect face masks in static images as well as in real-time video streams. Amid the ongoing COVID-19 pandemic, there are no efficient face mask detection applications which are now in high demand for transportation means, densely populated areas, residential districts, large-scale manufacturers and other enterprises to ensure safety. The absence of large datasets of ‘with_mask’ images has made this task cumbersome and challenging. Our face mask detector doesn't use any morphed masked images dataset and the model is accurate. Owing to the use of MobileNetV2 architecture, it is computationally efficient, thus making it easier to deploy the model to embedded systems (Raspberry Pi, Google Coral, etc.).
    Downloads: 0 This Week
    Last Update:
    See Project
  • Run applications fast and securely in a fully managed environment Icon
    Run applications fast and securely in a fully managed environment

    Cloud Run is a fully-managed compute platform that lets you run your code in a container directly on top of Google's scalable infrastructure.

    Run frontend and backend services, batch jobs, deploy websites and applications, and queue processing workloads without the need to manage infrastructure.
    Try for free
  • 10
    Gluon CV Toolkit

    Gluon CV Toolkit

    Gluon CV Toolkit

    GluonCV provides implementations of state-of-the-art (SOTA) deep learning algorithms in computer vision. It aims to help engineers, researchers, and students quickly prototype products, validate new ideas and learn computer vision. It features training scripts that reproduce SOTA results reported in latest papers, a large set of pre-trained models, carefully designed APIs and easy-to-understand implementations and community support. From fundamental image classification, object detection, semantic segmentation and pose estimation, to instance segmentation and video action recognition. The model zoo is the one-stop shopping center for many models you are expecting. GluonCV embraces a flexible development pattern while is super easy to optimize and deploy without retaining a heavyweight deep learning framework.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    MMF

    MMF

    A modular framework for vision & language multimodal research

    MMF is a modular framework for vision and language multimodal research from Facebook AI Research. MMF contains reference implementations of state-of-the-art vision and language models and has powered multiple research projects at Facebook AI Research. MMF is designed from ground up to let you focus on what matters, your model, by providing boilerplate code for distributed training, common datasets and state-of-the-art pre-trained baselines out-of-the-box. MMF is built on top of PyTorch that brings all of its power in your hands. MMF is not strongly opinionated. So you can use all of your PyTorch knowledge here. MMF is created to be easily extensible and composable. Through our modular design, you can use specific components from MMF that you care about. Our configuration system allows MMF to easily adapt to your needs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    MetaCLIP

    MetaCLIP

    ICLR2024 Spotlight: curation/training code, metadata, distribution

    MetaCLIP is a research codebase that extends the CLIP framework into a meta-learning / continual learning regime, aiming to adapt CLIP-style models to new tasks or domains efficiently. The goal is to preserve CLIP’s strong zero-shot transfer capability while enabling fast adaptation to domain shifts or novel class sets with minimal data and without catastrophic forgetting. The repository provides training logic, adaptation strategies (e.g. prompt tuning, adapter modules), and evaluation across base and target domains to measure how well the model retains its general knowledge while specializing as needed. It includes utilities to fine-tune vision-language embeddings, compute prompt or adapter updates, and benchmark across transfer and retention metrics. MetaCLIP is especially suited for real-world settings where a model must continuously incorporate new visual categories or domains over time.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Monk Computer Vision

    Monk Computer Vision

    A low code unified framework for computer vision and deep learning

    Monk is an open source low code programming environment to reduce the cognitive load faced by entry level programmers while catering to the needs of Expert Deep Learning engineers. There are three libraries in this opensource set. - Monk Classiciation- https://monkai.org. A Unified wrapper over major deep learning frameworks. Our core focus area is at the intersection of Computer Vision and Deep Learning algorithms. - Monk Object Detection - https://github.com/Tessellate-Imaging/Monk_Object_Detection. Monk object detection is our take on assembling state of the art object detection, image segmentation, pose estimation algorithms at one place, making them low code and easily configurable on any machine. - Monk GUI - https://github.com/Tessellate-Imaging/Monk_Gui. An interface over these low code tools for non coders.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14

    ProximityForest

    Efficient Approximate Nearest Neighbors for General Metric Spaces

    A proximity forest is a data structure that allows for efficient computation of approximate nearest neighbors of arbitrary data elements in a metric space. See: O'Hara and Draper, "Are You Using the Right Approximate Nearest Neighbor Algorithm?", WACV 2013 (best student paper award). One application of a ProximityForest is given in the following CVPR publication: Stephen O'Hara and Bruce A. Draper, "Scalable Action Recognition with a Subspace Forest," IEEE Conference on Computer Vision and Pattern Recognition, 2012. This source code is provided without warranty and is available under the GPL license. More commercially-friendly licenses may be available. Please contact Stephen O'Hara for license options. Please view the wiki on this site for installation instructions and examples on reproducing the results of the papers.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    PyArmadillo

    PyArmadillo

    linear algebra library for Python

    PyArmadillo - streamlined linear algebra library for Python, with emphasis on ease of use. Alternative to NumPy / SciPy. * Main page: https://pyarma.sourceforge.io * Documentation: https://pyarma.sourceforge.io/docs.html * Bug reports: https://pyarma.sourceforge.io/faq.html * Git repo: https://gitlab.com/jason-rumengan/pyarma
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    PyCls

    PyCls

    Codebase for Image Classification Research, written in PyTorch

    pycls is a focused PyTorch codebase for image classification research that emphasizes reproducibility and strong, transparent baselines. It popularized families like RegNet and supports classic architectures (ResNet, ResNeXt) with clean implementations and consistent training recipes. The repository includes highly tuned schedules, augmentations, and regularization settings that make it straightforward to match reported accuracy without guesswork. Distributed training and mixed precision are first-class, enabling fast experiments on multi-GPU setups with simple, declarative configs. Model definitions are concise and modular, making it easy to prototype new blocks or swap backbones while keeping the rest of the pipeline unchanged. Pretrained weights and evaluation scripts cover common datasets, and the logging/metric stack is designed for quick comparison across runs. Practitioners use pycls both as a baseline factory and as a scaffold for new classification backbones.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    PyTorch SimCLR

    PyTorch SimCLR

    PyTorch implementation of SimCLR: A Simple Framework

    For quite some time now, we know about the benefits of transfer learning in Computer Vision (CV) applications. Nowadays, pre-trained Deep Convolution Neural Networks (DCNNs) are the first go-to pre-solutions to learn a new task. These large models are trained on huge supervised corpora, like the ImageNet. And most important, their features are known to adapt well to new problems. This is particularly interesting when annotated training data is scarce. In situations like this, we take the models’ pre-trained weights, append a new classifier layer on top of it, and retrain the network. This is called transfer learning, and is one of the most used techniques in CV. Aside from a few tricks when performing fine-tuning (if the case), it has been shown (many times) that if training for a new task, models initialized with pre-trained weights tend to learn faster and be more accurate then training from scratch using random initialization.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    The Python Computer Vision Framework is an opened project deisgned for all those interested in computer vision. It aims at making computer vision more easy and structured and matlab-free. It may also be used for other artistic and scientific areas.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    R1-V

    R1-V

    Witness the aha moment of VLM with less than $3

    R1-V is an initiative aimed at enhancing the generalization capabilities of Vision-Language Models (VLMs) through Reinforcement Learning in Visual Reasoning (RLVR). The project focuses on building a comprehensive framework that emphasizes algorithm enhancement, efficiency optimization, and task diversity to achieve general vision-language intelligence and visual/GUI agents. The team's long-term goal is to contribute impactful open-source research in this domain.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Raster Vision

    Raster Vision

    Open source framework for deep learning satellite and aerial imagery

    Raster Vision is an open source framework for Python developers building computer vision models on satellite, aerial, and other large imagery sets (including oblique drone imagery). There is built-in support for chip classification, object detection, and semantic segmentation using PyTorch. Raster Vision allows engineers to quickly and repeatably configure pipelines that go through core components of a machine learning workflow: analyzing training data, creating training chips, training models, creating predictions, evaluating models, and bundling the model files and configuration for easy deployment. The input to a Raster Vision pipeline is a set of images and training data, optionally with Areas of Interest (AOIs) that describe where the images are labeled. The output of a Raster Vision pipeline is a model bundle that allows you to easily utilize models in various deployment scenarios.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    SAHI

    SAHI

    A lightweight vision library for performing large object detection

    A lightweight vision library for performing large-scale object detection & instance segmentation. Object detection and instance segmentation are by far the most important fields of applications in Computer Vision. However, detection of small objects and inference on large images are still major issues in practical usage. Here comes the SAHI to help developers overcome these real-world problems with many vision utilities. Detection of small objects and objects far away in the scene is a major challenge in surveillance applications. Such objects are represented by small number of pixels in the image and lack sufficient details, making them difficult to detect using conventional detectors. In this work, an open-source framework called Slicing Aided Hyper Inference (SAHI) is proposed that provides a generic slicing aided inference and fine-tuning pipeline for small object detection.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22

    Savant

    Python Computer Vision & Video Analytics Framework With Batteries Incl

    Savant is an open-source, high-level framework for building real-time, streaming, highly efficient multimedia AI applications on the Nvidia stack. It helps to develop dynamic, fault-tolerant inference pipelines that utilize the best Nvidia approaches for data center and edge accelerators. Savant is built on DeepStream and provides a high-level abstraction layer for building inference pipelines. It is designed to be easy to use, flexible, and scalable. It is a great choice for building smart CV and video analytics applications for cities, retail, manufacturing, and more.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    VGGT

    VGGT

    [CVPR 2025 Best Paper Award] VGGT

    VGGT is a transformer-based framework aimed at unifying classic visual geometry tasks—such as depth estimation, camera pose recovery, point tracking, and correspondence—under a single model. Rather than training separate networks per task, it shares an encoder and leverages geometric heads/decoders to infer structure and motion from images or short clips. The design emphasizes consistent geometric reasoning: outputs from one head (e.g., correspondences or tracks) reinforce others (e.g., pose or depth), making the system more robust to challenging viewpoints and textures. The repo provides inference pipelines to estimate geometry from monocular inputs, stereo pairs, or brief sequences, together with evaluation harnesses for common geometry benchmarks. Training utilities highlight data curation and augmentations that preserve geometric cues while improving generalization across scenes and cameras.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    ViAmI-Server

    ViAmI-Server

    Pattern recognition for ADL events

    This software uses computer vision algorithms for mining sequence data from telemonitoring data with CBRs. We propose an approach which treats the detection of changes in behavior detected with a sensor/video fusion, which occur at radically different time-scales, through a CBR in two levels: low and high level. The system is always updating the database with the daily data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    fastai

    fastai

    Deep learning library

    fastai is a deep learning library which provides practitioners with high-level components that can quickly and easily provide state-of-the-art results in standard deep learning domains, and provides researchers with low-level components that can be mixed and matched to build new approaches. It aims to do both things without substantial compromises in ease of use, flexibility, or performance. This is possible thanks to a carefully layered architecture, which expresses common underlying patterns of many deep learning and data processing techniques in terms of decoupled abstractions. These abstractions can be expressed concisely and clearly by leveraging the dynamism of the underlying Python language and the flexibility of the PyTorch library. fastai is organized around two main design goals: to be approachable and rapidly productive, while also being deeply hackable and configurable. It is built on top of a hierarchy of lower-level APIs which provide composable building blocks.
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB