

A repository of publicly available datasets that are available for access from AWS resources. Note that datasets in this registry are available via AWS resources, but they are not provided by AWS; these datasets are owned and maintained by a variety of government organizations, researchers, businesses, and individuals.

When data is shared on AWS, anyone can analyze it and build services on top of it using a broad range of compute and data analytics products, including Amazon EC2, Amazon Athena, AWS Lambda, and Amazon EMR. Sharing data in the cloud lets data users spend more time on data analysis rather than data acquisition. This repository exists to help people promote and discover datasets that are available via AWS resources.

How are datasets added to the registry?

Each dataset in this repository is described with metadata saved in a YAML file in the /datasets directory. We use these YAML files to provide three services, including:

- A Registry of Open Data on AWS browser.
- A hosted YAML file listing all of the dataset entries.

The metadata required for each dataset entry is as follows:

- Dataset name: Must be between 5 and 130 characters. We do not require "AWS" or "Open Data" to be in the dataset name.
- Description: A high-level description of the dataset. Only the first 600 characters will be displayed on the homepage of the Registry of Open Data on AWS.
- Documentation: A link to documentation of the dataset, preferably hosted on the data provider's website or GitHub repository.
- Contact: May be an email address, a link to a contact form, a link to a GitHub issues page, or any other instructions to contact the producer of the dataset.
- Managed by: The name of the laboratory, institution, or organization that is responsible for the data ingest process. If your institution manages several datasets hosted by the Public Dataset Program, please list the managing institution identically. For an example why, check out the Managed By section of the TARGET dataset.
- Update frequency: An explanation of how frequently the dataset is updated.
- Tags: A list of supported tags is maintained in the tags.yaml file in this repo. Select tags that are related to an intrinsic property or descriptor of the dataset.
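The metadata rules described in this document (a dataset name of 5 to 130 characters, a description whose first 600 characters appear on the registry homepage, and tags drawn from a supported list) can be checked before an entry is submitted. The following is a minimal Python sketch, not the registry's own tooling. The key names (`name`, `managed_by`, and so on), the sample entry, and the `supported_tags` set are hypothetical and chosen for illustration; the real YAML keys and the contents of tags.yaml may differ.

```python
# Limits taken from the field descriptions in this document.
NAME_MIN, NAME_MAX = 5, 130   # allowed length of the dataset name
PREVIEW_CHARS = 600           # description characters shown on the homepage

# Hypothetical key names for the required metadata fields.
REQUIRED_FIELDS = (
    "name", "description", "documentation",
    "contact", "managed_by", "update_frequency", "tags",
)


def validate_entry(entry, supported_tags=None):
    """Return a list of human-readable problems found in a dataset entry."""
    problems = []

    # Every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not entry.get(field):
            problems.append(f"missing field: {field}")

    # The dataset name must be between 5 and 130 characters.
    name = entry.get("name") or ""
    if name and not NAME_MIN <= len(name) <= NAME_MAX:
        problems.append(
            f"name must be {NAME_MIN}-{NAME_MAX} characters, got {len(name)}"
        )

    # Optionally check tags against a supported list (e.g. loaded from tags.yaml).
    if supported_tags is not None:
        for tag in entry.get("tags") or []:
            if tag not in supported_tags:
                problems.append(f"unsupported tag: {tag}")

    return problems


def homepage_preview(description):
    """Return the part of the description that would fit on the homepage."""
    return description[:PREVIEW_CHARS]


# Hypothetical entry, for illustration only.
example = {
    "name": "Example Weather Observations",
    "description": "Hourly observations from a fictional station network.",
    "documentation": "https://example.org/docs",
    "contact": "data@example.org",
    "managed_by": "Example Institute",
    "update_frequency": "Daily",
    "tags": ["weather"],
}

if __name__ == "__main__":
    print(validate_entry(example, supported_tags={"weather", "climate"}))
```

Returning a list of problems rather than raising on the first one lets a contributor fix every issue in a single pass.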
