Geek's Portal for Computers, Graphics, Operating Systems, Multi-Media, Networking, Programming, Data Format, and News
Thu, Aug 7 2008
Directories
Top > Computers > Artificial Intelligence > Machine Learning > Datasets
Bilkent University Function Approximation Repository
  Datasets used for the experimental analysis of function approximation techniques, and for training and demonstration by the machine learning and statistics communities.
The StatLib Datasets Archive
  A repository of datasets used in statistics and machine learning.
National Space Science Data Center
  Provides access to a wide variety of astrophysics, space physics, solar physics, lunar, and planetary data from NASA space flight missions, in addition to selected other data and some models and software.
TechTC - Technion Repository of Text Categorization Datasets
  Provides a large number of diverse test collections for use in text categorization research.
TREC Data
  Text datasets used in information retrieval and learning in text domains.
Learning Relational Concepts from Sensor Data of a Mobile Robot
  A collection of datasets, each represented in first-order logic. Maintained at the University of Dortmund, Germany.
Time Series Data Library
  A collection of over 500 time series, organized by subject and maintained by Rob Hyndman.
Penn Treebank Project
  A corpus of parsed sentences, used by many researchers for training data-driven parsing algorithms.
Web->KB dataset
  Web pages partitioned into classes, with hyperlink data. The dataset has been used for text categorization and for learning to extract symbolic knowledge from the World Wide Web.
Face recognition dataset
  A dataset of face images for face recognition algorithms.
WordSimilarity-353 Test Collection
  Contains 353 English word pairs along with human-assigned similarity judgements.
DELVE - Data for Evaluating Learning in Valid Experiments
  A standardized environment designed to evaluate the performance of methods that learn relationships based primarily on empirical data. Delve makes it possible for users to compare their learning methods with other methods on many datasets.
Dataset generator
  Datgen, formerly SCDS, is a computer program that generates data to systematically test programs that consume data. These synthetic datasets can be used to validate learning algorithms.
Reuters-21578 Text Categorization Corpus
  A classic benchmark for text categorization algorithms.
UCI Machine Learning Repository
  A repository of databases, domain theories, and data generators used by the machine learning community for the empirical analysis of machine learning algorithms.
RISE: Repository of Information Sources used in information Extraction tasks
  A repository of online information sources that serve as test domains for information extraction and for wrapper generation tools that learn extraction rules (extraction patterns).
NIST Special Database 4
  This NIST database of fingerprint images contains 2000 8-bit grayscale fingerprint image pairs.
The RCSB Protein Data Bank (PDB)
  Archive of experimentally determined 3-D structures of biological macromolecules from the Brookhaven National Laboratory.
HS3D - Homo Sapiens Splice Sites Dataset
  A database of Homo sapiens exon, intron, and splice regions extracted from GenBank primate sequences, Rel. 123. The aim of this dataset is to provide standardized material for training computational approaches to gene identification and characterization and for assessing their prediction accuracy.
© 1999-2006, web-geek.com, a Geek Boy Enterprises, Inc. website