List of all datasets currently available

Dataset Branch of science Description Contact
AFLOW materials science AFLOW is a database of more than 3,000,000 material compounds and 500,000,000 calculated properties (elastic properties, thermal properties, etc.), with online applications and documentation. Email
Air passengers dataset computer science The data set was donated to us by an unnamed company handling flight ticket reservations. The data is thin: it contains the date of departure, the departure airport, the arrival airport, etc. Email
American Mineralogist Crystal Structure Database materials science This site is an interface to a crystal structure database that includes every structure published in the American Mineralogist, The Canadian Mineralogist, European Journal of Mineralogy and Physics and Chemistry of Minerals, as well as selected datasets from other journals. Email
Arctic sea ice cover computer science, climatology The data is a time series of "images", consisting of different physical variables on a regular grid on the Earth, indexed by longitude and latitude coordinates. Email
BioNLP-ST 2013 Bacteria Biotopes information extraction, life sciences The knowledge tackled by this task is the habitats where bacteria live, and the environmental properties of bacteria. This information is particularly interesting in the fields of food processing and safety, health sciences and waste processing. Email
BioPortal biology, bio BioPortal SPARQL is a service to query BioMedical ontologies using the SPARQL standard. Ontologies have been transformed into RDF triples from their original formats (OWL, OBO and UMLS/RRF, ...) and asserted into a triple store. Email
Biomodels biology, bio The BioModels linked data set contains all curated and non-curated SBML models in the BioModels repository as RDF. Email
Biosamples biology, bio The BioSample Database (BioSD) is a database at the European Bioinformatics Institute holding information about the biological samples used in DNA sequencing. Email
CSD-Community materials science A list of crystallographic tools ranging from data collection, validation and visualisation to teaching, research and analysis tools. Email
Cell phenotyping computer science, medicine The data is from Samusik et al., where 38 surface markers (features) were measured in cells from the bone marrow of healthy mice. The samples were analyzed and independently hand-gated by experts to identify 24 immune cell populations (classes). Email
ChEMBL biology, bio ChEMBL is a manually curated chemical database of bioactive molecules with drug-like properties. It is maintained by the European Bioinformatics Institute (EBI), of the European Molecular Biology Laboratory (EMBL). Email
Charged particle tracking in 2D with a possible future LHC Silicon detector computer science, physics The data provided is a list of hit positions from a simple toy detector model that mimics the ATLAS detector design (which is generic enough for recent silicon-based tracking detectors). Email
Climate model simulation climatology Monthly anomalies of reference-height temperature. Climate model simulation: model CCSM4 in pre-industrial control conditions (piControl, r2ip1). Email
CoRoT stellar physics, Exoplanet CoRoT was a CNES space mission dedicated to exoplanet search and stellar physics. The main products available in this archive are light curves. These light curves, labelled N2, are ready for science use. Email
Common Frame of Reference for European contract law private law, contract law, law This database, containing relationships between European legal principles and national legal decisions, was developed from 2005 to 2008 by 150 researchers grouped in "The Joint Network on European Private Law" for the European project CoPECL (FP6-CITIZENS-3). Email
Crystal Lattice Structures crystallography This page offers a concise index of common crystal lattice structures. A graphical representation as well as useful information about the lattices can be obtained by clicking on the desired structure. Email
Crystallography Open Database materials science, crystallography Open-access collection of crystal structures of organic, inorganic and metal-organic compounds and minerals, excluding biopolymers. Email
DAAP Lip(Sys)² analytical chemistry, Analytical chemistry, chemistry, chemical This wiki is a demonstrator for DAAP (Data Acquisition For Analytical Platform), a project under construction. The goal is to bring together the research community in analytical chemistry. The first step is to reference the resources available to researchers, and then to share data. Email
DAAP PMM analytical chemistry, Analytical chemistry This wiki is a demonstrator for DAAP (Data Acquisition For Analytical Platform), a project under construction. The goal is to bring together the research community in analytical chemistry. Email
DATABASES, REVIEWS AND BOOKS ONLINE document Focus Paris-Sud is a document search engine that gives access, in a single search, to a large part of the documentation available at Paris-Sud University regardless of medium: books and e-books, book chapters, print and online theses, journals. Email
DBPedia general knowledge A knowledge base extracted from Wikipedia Email
Dataset from the ATLAS Higgs Boson Machine Learning Challenge 2014 particle physics, machine learning The dataset has been built from official ATLAS simulation, with Higgs to tautau events mixed with different backgrounds. It has been used in the 2014 HiggsML challenge on Kaggle. It is hosted on the CERN Open Data Portal. Email
Drug Classification analytical chemistry, Analytical chemistry The dataset contains Raman spectra of 4 types of chemotherapeutic agents diluted in 9 different solutions at different concentrations. Measurements were made by Lip(Sys)². Email
EMBL-EBI resources biology, bio The European Bioinformatics Institute (EMBL-EBI) Platform aims to bring together the efforts of a number of EMBL-EBI resources that provide access to their data using Semantic Web technologies. It provides a unified way to query across resources using the W3C SPARQL query language. Email
Ensembl biology, bio Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Ensembl annotates genes, computes multiple alignments, predicts regulatory function and collects disease data. Email
Epidemium cancer mortality rate prediction Cancer This dataset contains mortality rates of different cancer types for several geographic areas. Email
Epidemium cancer mortality prediction dataset computer science, medicine These datasets cover the number of males/females living, the number of people living in the geographic area, the size of the geographic area and the mortality rate from any type of cancer. Email
Expression Atlas biology, bio A powerful way to find information about gene and protein expression across species and biological conditions. It aims to help answer questions such as ‘where is a certain gene expressed?’ or ‘how does its expression change in a disease?’. Email
FactSage chemistry, chemical, physics, thermodynamics Data about substances, pure solutions, pure metals or metallic solutions, liquids, etc. Calculations such as phase diagrams or pH-potential diagrams, equilibria or chemical reactions are available. Email
FluidMeca virtual reality, fluid mechanics, HCI Fluid mechanics results including tags. Real-time simulation benchmarks. Email
French National Library library, library organization The project endeavours to make the data produced by Bibliothèque nationale de France (French National Library) more useful on the Web. Email
Gregorius canon law, legal history, Legal history of the Catholic Church Database on canon law. Email
Grid Observatory 3.0 computer science The Grid Observatory 3.0 aims to publish the Grid Observatory and Green Computing Observatory data in an open and interoperable format in order to facilitate access to these data and the cross-analysis of these complementary data sources. Email
HESIOD cosmology The Herschel IdOc Database is delivering photometric maps and spectral cubes from the PACS and SPIRE instruments (IR domain), reprocessed at IAS with the latest ESA pipelines and with high level customized pipelines. Virtual Observatory compatible. Email
Hyperspectral classification toolkit planetary science, hyperspectral imaging Hyperspectral classification toolkit containing one example hyperspectral cube (from the OMEGA instrument), a reference spectral database and a reference classification. This toolkit is intended for testing classification methods. Email
IODS linked data List of open dataset in Email
IdRef authority control Here you will find the IdRef authority records and the bibliographic references from Sudoc. All types of authority records are included: Persons, Corporate bodies, Common nouns (Rameau and FMeSH), Geographic names, Families and Titles. Email
LRI Information System computer science Scientists of the laboratory Email
Libraries of Paris-Saclay University culture List of libraries at Paris-Saclay University. Email
Lipid modifications in J774 macrophages by vibrational spectroscopies analytical chemistry, Analytical chemistry, biology, bio Investigation of lipid modifications in J774 macrophages Email
MEDOC Solar Physics MEDOC (Multi Experiment Data & Operation Center) is a National Center for Space Solar Physics Data, approved by CNES, in the frame of an agreement between CNRS/INSU, Université Paris-Sud and CNES. MEDOC is located at Institut d'Astrophysique Spatiale in Orsay. Email
MINCRYST materials science An original combination of the Crystal Structure Database for Minerals, the automatically generated Calculated Powder X-ray Diffraction Standards (CPDS) SubBase and the Applied Program Package, which uses the stored information for powder X-ray diffraction and crystal chemical analysis. Email
Madelon data science This is one of the datasets for the NIPS 2003 feature selection challenge. Email
Materials Project chemistry, chemical, physics, materials science Harnessing the power of supercomputing and state of the art electronic structure methods, the Materials Project provides open web-based access to computed information on known and predicted materials as well as powerful analysis tools to inspire and design novel materials. Email
Materials Virtual Lab chemistry, chemical, material, laboratory The Materials Virtual Lab is a materials AI group focused on the cross-disciplinary application of machine learning to large materials data sets to accelerate materials design. It's not a proper database but it can be used to get to other databases. Email
Mineralogy Database materials science The Mineralogy Database was last updated on 9/5/2012 and it contains 4,714 individual mineral species descriptions with links and a comprehensive image library. Email
MoDALMI analytical chemistry, Analytical chemistry, lipidomics The goal of this project is to create an open-access database in the analytical chemistry field for lipids, metabolites and isotopes, in a common accessible format with metadata specifications. Email
Modified HiggsML dataset particle physics This dataset is a version of the HiggsML dataset, which contains a mixture of Higgs particles decaying into tau pairs and the principal background processes (800K events in total). Half of the events are unchanged, but the other half has been artificially distorted or corrupted in some way. Email
NIST-JANAF Thermochemical Tables chemistry, chemical, thermodynamics, thermochemistry NIST-JANAF gathers exhaustive information about chemical elements and compounds. The database can be used in different ways. For example, one can type a chemical formula to access specific information, or select an element of the periodic table to access all of its possible compounds. Email
NOMAD Laboratory materials science Data on chemistry, chemical elements, crystallography and materials. Email
National Center for Atmospheric Research (NCAR) climatology, atmospheric chemistry The US National Center for Atmospheric Research studies meteorology, climate science, atmospheric chemistry, solar-terrestrial interactions, environmental and societal impacts. Email
National Institute of Standards and Technology (NIST) chemistry, chemical, science NIST produces the Nation’s Standard Reference Data (SRD). These data are assessed by experts and are trustworthy such that people can use the data with confidence and base significant decisions on the data. NIST provides 49 free SRD databases and 41 fee-based SRD databases. Email
Ontology Lookup Service biology, bio The Ontology Lookup Service (OLS) is a repository for biomedical ontologies that aims to provide a single point of access to the latest ontology versions. Email
OrthoDB biology, bio Sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Email
Planetary SUrface Portal (PSUP) planetary science This facility involves a data processing center coupled with a planetary surface data dissemination center (mineralogical maps, geomorphologic maps, DTM...). Planetary SUrface Portal is an initiative from OSUPS and OSUL. Email
Pollinating insect classification (SPIPOLL) entomology, natural science The SPIPOLL (Suivi Photographique des Insectes POLLinisateurs) project proposes to quantitatively study pollinating insects in France. Email
Quaero Broadcast News Extended Named Entity corpus computer science The Quaero Broadcast News Extended Named Entity corpus consists of the manual annotation of (i) the ESTER 2 corpus (see ELRA-S0338) and (ii) the Quaero Speech Recognition Evaluation corpus (manual and automatic transcriptions coming from 3 different ASR systems). Email
Quaero French Medical Corpus computer science The QUAERO French Medical Corpus is a selection of MEDLINE titles and EMEA documents manually annotated as a resource for named entity recognition and normalization. It was used as a gold standard for French biomedical text in the CLEF eHealth evaluation lab in 2015 and 2016. Email
Quaero Old Press Extended Named Entity corpus computer science Manual annotation of 76 newspaper issues from 1890-1891 (Le Temps, La Croix and Le Figaro) according to the Quaero extended and structured named entity definition. Training: 231 pages, 1,297,742 words, 114,599 types, 136,113 components. Test: 64 pages, 363,455 words, 33,083 types, 40,432 components. Email
Reactome biology, bio Reactome is a free, open-source, curated and peer reviewed pathway database. Our goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modeling, systems biology and education. Email
Reddit Public Comments (2007-10 through 2015-05) sociology ~1.7 billion JSON comment objects, complete with the comment, score, author, subreddit, position in comment tree and other fields that are available through Reddit's API. Email
SZ cluster database cosmology This database provides access to catalogues and complementary information on clusters of galaxies observed through the Sunyaev-Zeldovich (SZ) effect. The Planck SZ cluster catalogue is accessible through the Virtual Observatory. Email
Scholarly Linked Open Data Semantic Web Provides facilities and services to publish your scholarly data as Linked Open Data. Email
Semantic description of Debian packages computer science Semantic description of packages produced by the Debian project. Email
Sparql Score computer science SPARQLScore is an attempt to evaluate the conformance of triplestores to the W3C standards. Email
Synchrotron soleil physics, biology, bio Data about the French national synchrotron facility. Email
The MNIST database of handwritten digits computer science, artificial intelligence The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. Email
WikiPathways biology, bio WikiPathways is a Wiki for biological pathways. WikiPathways is intended to be an open, public space for content editing dedicated to biological pathways, facilitating the contribution and maintenance of pathway information from the scientific community. Email
Wikidata general knowledge, Semantic Web Wikidata is a free linked database for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wikisource and others. The content is available under a free license, exported using standard formats, and can be interlinked to other open data sets on the linked data web. Email
YAGO general knowledge A knowledge base extracted from Wikipedia, containing general knowledge about famous people, cities, countries, movies, organizations, etc., together with a taxonomy from WordNet. Email
efSUP_sem1 MOOC Test dataset. Email
free GRACE NLP French literature texts tagged with their parts of speech (POS). Email
the Bilbao Crystallographic Server chemistry, chemical, crystallography Bilbao Crystallographic Server is an open-access website offering online crystallographic databases and programs aimed at analyzing, calculating and visualizing problems of structural and mathematical crystallography, solid state physics and structural chemistry. Email
