A standardized environment designed to evaluate the performance of methods that learn relationships based primarily on empirical data. Delve makes it possible for users to compare their learning methods with other methods on many datasets.
A repository of databases, domain theories and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.
Provides access to a wide variety of astrophysics, space physics, solar physics, lunar and planetary data from NASA space flight missions, along with selected other data, models, and software.
A database of Homo sapiens exon, intron and splice regions extracted from GenBank primate sequences Rel.123. The aim of this dataset is to provide standardized material for training and for assessing the prediction accuracy of computational approaches to gene identification and characterization.
Web pages partitioned into classes, with hyperlink data. The dataset has been used for text categorization and learning to extract symbolic knowledge from the World Wide Web.
Repository of online information sources: test domains for information extraction and wrapper generation tools that learn extraction rules (extraction patterns).
Datasets used for the experimental analysis of function approximation techniques and for training and demonstration by the machine learning and statistics communities.
ArrayExpress is a database of functional genomics experiments that can be queried and the data downloaded. It includes gene expression data from microarray and high throughput sequencing studies.
Several hundred thousand economic time series, produced by the U.S. Government and distributed by the government in a variety of formats and media, have been put into a standard, highly efficient, easy-to-use form for personal computers.
The FlickrLogos-32 dataset contains photos depicting logos and is meant for the evaluation of multi-class logo detection as well as logo retrieval methods on real-world images. It contains images, ground truth, annotations, and evaluation scripts.
Datgen is a computer program that generates data to systematically test programs that consume data. These synthetic datasets can be used to validate learning algorithms.
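Datgen is a standalone program with its own interface; as a generic illustration of the idea (not Datgen's actual usage), a synthetic dataset with a known ground-truth rule can be generated and used to check that a learner beats chance. The generator, the labeling rule, and the nearest-centroid learner below are all assumptions chosen for the sketch:

```python
import random

def generate_dataset(n=1000, seed=0):
    # Synthetic binary-classification data with a known rule:
    # label is 1 when x + y > 1. Because the rule is known,
    # any reasonable learner should clearly beat chance accuracy.
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        label = 1 if x + y > 1.0 else 0
        rows.append((x, y, label))
    return rows

def nearest_centroid_fit(rows):
    # A deliberately simple learner: the per-class mean of the features.
    sums = {0: [0.0, 0.0, 0], 1: [0.0, 0.0, 0]}
    for x, y, label in rows:
        sums[label][0] += x
        sums[label][1] += y
        sums[label][2] += 1
    return {c: (sx / n, sy / n) for c, (sx, sy, n) in sums.items()}

def nearest_centroid_predict(centroids, x, y):
    # Predict the class whose centroid is closest in squared distance.
    return min(
        centroids,
        key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
    )

# Validate the learner on data it has not seen.
train = generate_dataset(n=1000, seed=0)
test = generate_dataset(n=200, seed=1)
centroids = nearest_centroid_fit(train)
accuracy = sum(
    1 for x, y, label in test if nearest_centroid_predict(centroids, x, y) == label
) / len(test)
```

Because the generating rule is fixed and known, a large gap between `accuracy` and the 0.5 chance level is evidence the learning code works, which is the kind of systematic validation synthetic data generators enable.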