
Cortex: The Biggest AI Dataset

Description

Problem

  • Deep neural networks can solve a wide variety of machine learning problems well

  • Deep neural networks used in industry applications usually work best when trained with supervised learning, provided that:

    • there is a lot of data available,

    • the training data is from the same distribution as the data from the production environment and 

    • the labels and data are of high quality

  • Large amounts of data are available on and off the Internet, but in raw form they are not useful for building machine learning solutions


Solution

  • Streamlining the process of:

    • Collecting large sets of data at the business process level

    • Preparing and labeling the data for use in training/evaluation processes of deep neural networks

    • Quality assurance of collected data and labels

  • Using a multidisciplinary approach (technical, social, ethical, legal, ...)

  • Automating the process

  • Note: currently focused only on image data for three computer vision tasks:

    • image classification, 

    • object detection and 

    • object segmentation
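
The three tasks above produce different kinds of labels: class names for the whole image, bounding boxes for detected objects, and outlines for segmented objects. A minimal sketch of how one labeled image could be represented is shown below. All class and field names are hypothetical and are not the actual Cortex schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BoundingBox:
    """Axis-aligned box in pixel coordinates (hypothetical convention)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Degenerate or inverted boxes are treated as having zero area.
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


@dataclass
class DetectedObject:
    label: str
    box: BoundingBox


@dataclass
class SegmentedObject:
    label: str
    polygon: List[Tuple[float, float]]  # outline vertices in pixel coordinates


@dataclass
class ImageLabels:
    """Labels for one image, referenced by URL only (hypothetical schema)."""
    url: str
    classes: List[str] = field(default_factory=list)               # image classification
    detections: List[DetectedObject] = field(default_factory=list)  # object detection
    segments: List[SegmentedObject] = field(default_factory=list)   # object segmentation

    def tasks(self) -> List[str]:
        """Return the tasks for which this image carries at least one label."""
        present = []
        if self.classes:
            present.append("image classification")
        if self.detections:
            present.append("object detection")
        if self.segments:
            present.append("object segmentation")
        return present
```

Keeping the three label types as separate lists on one record lets a single image serve any subset of the tasks.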


Data Collection, Labeling and QA Process


Mission Statement

To create the biggest high-quality labeled dataset for building machine learning models.

The premium API endpoint /get-labeled-data, which returns labeled data, is available on RapidAPI.
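
As a rough illustration, a request to /get-labeled-data through RapidAPI could be prepared as sketched below. The X-RapidAPI-Key and X-RapidAPI-Host headers are the ones RapidAPI uses for authentication. The host name, the GET method and the optional query parameters are assumptions, since the page does not document them; take the real values from the RapidAPI listing.

```python
import urllib.request
from typing import Dict, Optional
from urllib.parse import urlencode

# Hypothetical placeholder: use the host shown on the RapidAPI listing.
RAPIDAPI_HOST = "cortex.example.p.rapidapi.com"


def build_labeled_data_request(
    api_key: str,
    host: str = RAPIDAPI_HOST,
    params: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request for the /get-labeled-data endpoint.

    The GET method and any query parameters are assumptions, not documented API.
    """
    query = f"?{urlencode(params)}" if params else ""
    url = f"https://{host}/get-labeled-data{query}"
    headers = {
        "X-RapidAPI-Key": api_key,  # personal key from the RapidAPI dashboard
        "X-RapidAPI-Host": host,
    }
    return urllib.request.Request(url, headers=headers, method="GET")
```

The returned object can be passed to `urllib.request.urlopen` once a valid key and host are filled in.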


Current Dataset Statistics

FAQ

Does Cortex respect copyright laws?

 

The Cortex dataset only references the URLs of the original images. Images are scraped from the Common Crawl database, stored temporarily in RAM and discarded once labeling is done. Anyone using the dataset must download the images they are interested in themselves; the img2dataset tool is the suggested way to do so. URLs inserted through the /upload endpoint are not exposed through the /get-labeled-data endpoint if they cannot be scraped from the Common Crawl database.
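
Because the dataset only carries URLs, a typical first step is to write the URLs of interest to a text file and hand that file to img2dataset. The sketch below prepares such a file and assembles the command line. The flags used (--url_list, --output_folder, --thread_count, --image_size) are taken from the img2dataset README; the helper names and default values are illustrative assumptions.

```python
from pathlib import Path
from typing import Iterable, List


def write_url_list(urls: Iterable[str], path: Path) -> int:
    """Write unique http(s) URLs to `path`, one per line; return the count written."""
    seen = set()
    kept = []
    for url in urls:
        url = url.strip()
        if url.startswith(("http://", "https://")) and url not in seen:
            seen.add(url)
            kept.append(url)
    path.write_text("\n".join(kept) + ("\n" if kept else ""), encoding="utf-8")
    return len(kept)


def img2dataset_command(
    url_list: Path,
    output_folder: Path,
    image_size: int = 256,
    thread_count: int = 64,
) -> List[str]:
    """Assemble an img2dataset invocation as an argument list (not executed here)."""
    return [
        "img2dataset",
        f"--url_list={url_list}",
        f"--output_folder={output_folder}",
        f"--thread_count={thread_count}",
        f"--image_size={image_size}",
    ]
```

The resulting list can be passed to `subprocess.run` on a machine where img2dataset is installed.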

Papers

Tools

Computer Vision Annotation Tool


Deep Data Analysis API

The Deep Data Analysis API lets users submit the URL of an image file (video, audio and text files are coming soon) and receive a deep data analysis. Each file is assigned a Deep Data Analysis ID, which the user can use to retrieve the associated analysis.

Supported analysis types are:

  • object analysis

Planned analysis types are:

  • face analysis

  • pose estimation

The API is asynchronous: once the file is indexed, the analysis metadata can be obtained with a subsequent request for the same file.
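
On the client side, this asynchronous behavior usually means repeating the same request until the analysis is ready. The sketch below shows one way to do that. The `fetch` callable is a hypothetical wrapper around the API call that is assumed to return `None` while the file is still being indexed; the actual request and response format is not described here.

```python
import time
from typing import Any, Callable, Dict, Optional


def wait_for_analysis(
    fetch: Callable[[str], Optional[Dict[str, Any]]],
    image_url: str,
    attempts: int = 5,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Dict[str, Any]]:
    """Re-submit the same image URL until analysis metadata is returned.

    `fetch` is a hypothetical function that performs one API request and
    returns the analysis metadata, or None if the file is not indexed yet.
    Returns None if no analysis is available after `attempts` requests.
    """
    for attempt in range(attempts):
        result = fetch(image_url)
        if result is not None:
            return result
        if attempt < attempts - 1:
            sleep(delay_s)  # wait before asking again for the same file
    return None
```

Passing `fetch` and `sleep` as parameters keeps the retry logic independent of any particular HTTP client and easy to test.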

More modules for different types of analyses will be added periodically.

Below is the documentation for the free API endpoint /upload.

Computer Vision Visualization Tool
