Cortex: The Biggest AI Dataset
Cortex: The Biggest AI Dataset powered by OpenAI
Description
Problem
- Deep neural networks can solve a wide variety of machine learning problems well.
- Deep neural networks used in industry applications usually work best when they are trained with supervised learning, provided that:
  - there is a lot of data available,
  - the training data comes from the same distribution as the data in the production environment, and
  - the labels and data are of high quality.
- Large amounts of data are available on and off the Internet, but in raw form they are not useful for building machine learning solutions.
Solution
- Addressing the whole process of:
  - Collecting large sets of data at the business process level
  - Preparing and labeling the data for use in the training and evaluation of deep neural networks
  - Quality assurance of the collected data and labels
- Using a multidisciplinary approach (technical, social, ethical, legal, ...)
- Automating the process
- Note: currently focused only on image data for computer vision tasks
Data Collection, Labeling and QA Process
Mission Statement
- To create the biggest high-quality labeled dataset for building machine learning models
- To create a foundation dataset for artificial general intelligence
The premium API endpoint /get-labeled-data, which returns labeled data, is available on RapidAPI.
Current Dataset Statistics
FAQ
Does Cortex respect copyright laws?
The Cortex dataset only references the URLs of the original images. Images are scraped from the Common Crawl database, held temporarily in RAM, and discarded once labeling is done. Anyone using the dataset must download the images they are interested in themselves; the suggested tool for this is img2dataset. URLs inserted through the /upload endpoint are not exposed through the /get-labeled-data endpoint if they cannot be scraped from the Common Crawl database.
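As a minimal sketch of the download step, the snippet below writes image URLs to a plain text file, one per line, which is the default input format img2dataset reads through its `--url_list` option. The record layout (dicts with a `url` key) and the helper name `write_url_list` are assumptions made for illustration; the actual response shape of /get-labeled-data is not described here.

```python
from pathlib import Path


def write_url_list(records, path):
    """Write one image URL per line and return how many were written.

    `records` is assumed to be an iterable of dicts with a "url" key;
    this shape is hypothetical, not the documented API response.
    """
    seen = set()
    urls = []
    for record in records:
        url = record.get("url")
        # Skip missing entries and duplicates while keeping first-seen order.
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    Path(path).write_text("".join(u + "\n" for u in urls), encoding="utf-8")
    return len(urls)
```

The resulting file could then be handed to img2dataset, for example with `img2dataset --url_list=urls.txt --output_folder=images`.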
Papers
Tools
Computer Vision Annotation Tool
Deep Data Analysis API
The Deep Data Analysis API lets users submit the URL of an image file (video, audio and text files coming soon) and receive a deep data analysis of it. Each file is assigned a Deep Data Analysis ID, which the user can use to retrieve the associated analysis.
Supported analysis types are:
- object analysis
Planned analysis types are:
- face analysis
- pose estimation
The API is asynchronous: once the file is indexed, analysis metadata can be obtained with a subsequent request for the same file.
More modules for different types of analyses will be added periodically.
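The asynchronous behavior described above suggests a simple retry loop on the client side. The sketch below is one possible way to structure it, not the official client: `submit` is a caller-supplied function that sends the file URL to the /upload endpoint and returns the parsed response as a dict, and the `"analysis"` key is a hypothetical field name, since the response format is not specified here.

```python
import time


def wait_for_analysis(submit, file_url, attempts=5, delay_s=2.0, sleep=time.sleep):
    """Resubmit `file_url` until analysis metadata is returned.

    `submit` is a caller-supplied function that performs the request and
    returns a dict. The "analysis" key is an assumed field name.
    """
    for attempt in range(attempts):
        response = submit(file_url)
        analysis = response.get("analysis")
        if analysis is not None:
            return analysis
        # Not indexed yet: wait before asking about the same file again.
        if attempt < attempts - 1:
            sleep(delay_s)
    raise TimeoutError(f"no analysis for {file_url} after {attempts} attempts")
```

Passing the transport in as a function keeps the loop independent of any particular HTTP library and makes it easy to exercise with a stub before pointing it at the real endpoint.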
Below is the documentation for the free API endpoint /upload.
Computer Vision Visualization Tool