Experiment Databases for Machine Learning / BenchMarking Via Weka
published: Dec. 20, 2008, recorded: December 2008, views: 715
Experiment Databases for Machine Learning
Experiment Databases for Machine Learning is a large public repository of machine learning experiments as well as a framework for producing similar databases for specific goals. This project aims to bring together the information contained in many machine learning experiments and organize it in a way that allows everyone to investigate how learning algorithms have performed in previous studies. To share such information with the world, a common language is proposed, dubbed ExpML, capturing the basic structure of a large range of machine learning experiments while remaining open to future extensions. This language also enforces reproducibility by requiring links to the datasets and algorithms used and by storing all details of the experiment setup. All stored information can then be accessed by querying the database, creating a powerful way to collect and reorganize the data and allowing a very thorough examination of the stored results. The current publicly available database contains over 500,000 classification and regression experiments, and has both an online interface, at http://expdb.cs.kuleuven.be, and a stand-alone explorer tool offering various visualization techniques. This framework can also be integrated into machine learning toolboxes to automatically stream results to a global (or local) experiment database, or to download experiments that have been run before.
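To give a feel for what "querying the database" can mean in practice, here is a minimal sketch using an in-memory SQLite database. The table name `experiment`, its columns, and every value in it are made up for illustration; they are not the schema or contents of the actual experiment database, and the real system may be queried quite differently.

```python
import sqlite3


def build_demo_db():
    """Create a toy experiment table. Schema and rows are hypothetical placeholders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE experiment (algorithm TEXT, dataset TEXT, accuracy REAL)"
    )
    # Toy values only, NOT results from the real repository.
    rows = [
        ("tree", "iris", 0.94),
        ("tree", "wine", 0.90),
        ("knn", "iris", 0.96),
        ("knn", "wine", 0.70),
    ]
    conn.executemany("INSERT INTO experiment VALUES (?, ?, ?)", rows)
    return conn


def mean_accuracy_by_algorithm(conn):
    """Reorganize stored results: average accuracy per algorithm across datasets."""
    cursor = conn.execute(
        "SELECT algorithm, AVG(accuracy) FROM experiment "
        "GROUP BY algorithm ORDER BY algorithm"
    )
    return {name: round(avg, 4) for name, avg in cursor}


if __name__ == "__main__":
    print(mean_accuracy_by_algorithm(build_demo_db()))
```

The point of the sketch is only that once experiments are stored in a structured form, comparisons across algorithms and datasets become a single query rather than a new round of experiments.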
BenchMarking Via Weka
BenchMarking Via Weka is a client-server architecture that supports interoperability between different machine learning systems. Machine learning systems need to provide mechanisms for processing data and evaluating generated models. In our system, the server hosts all the data and performs all the statistical analyses, while the client performs all the pre-processing and model building. This separation of tasks opens up the possibility of offering a cross-platform and cross-language framework. By performing statistical analyses on the host, we avoid unnecessary exchange and conversion of generated results.
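The division of labour described above can be sketched in a few lines. This is an in-process illustration under stated assumptions, not the actual BenchMarking Via Weka protocol: the class `BenchmarkServer`, the function `majority_client`, the fold scheme, and the accuracy metric are all hypothetical, and no networking is shown. The server keeps the data and the true labels and does the scoring; the client only sees training rows and unlabeled test inputs, builds a model, and sends back predictions.

```python
from collections import Counter
from statistics import mean


class BenchmarkServer:
    """Hypothetical server: hosts the data and performs the statistical analysis."""

    def __init__(self, rows, n_folds):
        self.rows = rows          # list of (features, label) pairs
        self.n_folds = n_folds
        self.scores = []

    def folds(self):
        """Yield (fold id, labeled training rows, unlabeled test inputs)."""
        for k in range(self.n_folds):
            test = self.rows[k::self.n_folds]
            train = [r for i, r in enumerate(self.rows) if i % self.n_folds != k]
            yield k, train, [features for features, _ in test]

    def submit(self, k, predictions):
        """Score the client's predictions against labels only the server holds."""
        truth = [label for _, label in self.rows[k::self.n_folds]]
        correct = sum(p == t for p, t in zip(predictions, truth))
        self.scores.append(correct / len(truth))

    def report(self):
        """Mean accuracy over all submitted folds."""
        return mean(self.scores)


def majority_client(train, test_inputs):
    """Hypothetical client: 'model building' is just predicting the majority class."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return [label for _ in test_inputs]


def run_benchmark(rows, n_folds=2):
    server = BenchmarkServer(rows, n_folds)
    for k, train, test_inputs in server.folds():
        server.submit(k, majority_client(train, test_inputs))
    return server.report()
```

Because the client only exchanges rows and predictions with the server, it could in principle be written in any language or toolkit, which is the interoperability argument the abstract makes; the evaluation logic lives in one place and results never need converting between systems.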
Download slides: mloss08_reutemann_experiment_databases.pdf (1.2 MB)
Download slides: mloss08_reutemann_weka.pdf (302.6 KB)