What should I consider a ‘dataset’ for the purposes of recording in MEDIN?

It is often difficult to decide whether the data that has been collected constitutes one dataset or many; this is a question of ‘granularity’. It is important to get the granularity right, because too many or too few records make it difficult for users to find what they want. MEDIN has some practical guidance to help you decide:

  • the correct level for a dataset is a cruise, a survey or a set of repeat observations with a common purpose,
  • a dataset usually constitutes a specifically funded piece of work,
  • the dataset should be easily extractable from a database for a third party,
  • if you search for a dataset using a portal and the same record is returned every time, whatever combination of time, location and parameter you search by, then it is probably too coarse.
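
The last check in the list above can be sketched in code. This is only an illustration under stated assumptions: the `Record` and `Query` classes, their fields and the sample values are hypothetical and are not part of the MEDIN standard or of any portal's API. A record is flagged as probably too coarse when it is returned by every one of a varied set of searches by time, location and parameter.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, List, Tuple

BBox = Tuple[float, float, float, float]  # (west, south, east, north)


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified metadata record (not the MEDIN schema)."""
    title: str
    start: date
    end: date
    bbox: BBox
    parameters: FrozenSet[str]


@dataclass(frozen=True)
class Query:
    """Hypothetical portal search by time, location and parameter."""
    start: date
    end: date
    bbox: BBox
    parameter: str


def matches(record: Record, query: Query) -> bool:
    """True if the record overlaps the query in time and space and holds the parameter."""
    time_overlap = record.start <= query.end and query.start <= record.end
    r_west, r_south, r_east, r_north = record.bbox
    q_west, q_south, q_east, q_north = query.bbox
    space_overlap = (r_west <= q_east and q_west <= r_east
                     and r_south <= q_north and q_south <= r_north)
    return time_overlap and space_overlap and query.parameter in record.parameters


def probably_too_coarse(record: Record, queries: List[Query]) -> bool:
    """Flag a record that is returned by every one of a varied set of searches."""
    return bool(queries) and all(matches(record, q) for q in queries)


# Illustrative searches that differ in time, location and parameter.
queries = [
    Query(date(2005, 1, 1), date(2005, 12, 31), (-6.0, 50.0, -4.0, 51.0), "temperature"),
    Query(date(2012, 6, 1), date(2012, 6, 30), (0.0, 53.0, 2.0, 55.0), "chlorophyll"),
    Query(date(1998, 3, 1), date(1998, 3, 31), (-8.0, 56.0, -6.0, 58.0), "salinity"),
]

# One record describing everything an organisation holds, and one single survey.
everything = Record("All data held by an organisation", date(1990, 1, 1), date(2020, 12, 31),
                    (-12.0, 48.0, 4.0, 62.0),
                    frozenset({"temperature", "salinity", "chlorophyll"}))
survey = Record("One-month survey", date(2012, 6, 1), date(2012, 6, 30),
                (0.5, 53.5, 1.5, 54.5), frozenset({"chlorophyll"}))

print(probably_too_coarse(everything, queries))
print(probably_too_coarse(survey, queries))
```

With these sample values the catch-all record matches all three searches and is flagged, while the single survey is not. The test is only as good as the variety of the searches tried, so it should be read as a prompt for judgement rather than a rule.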

To guide the metadata creator further, some draft examples of what should be considered a dataset are given below:

  • A monitoring programme that produces data for the same parameters at the same locations each year.
  • A multidisciplinary cruise that has been specifically funded to answer a specific research question and is not anticipated to be carried out repeatedly.
  • A number of different types of data, collected over the course of one year in a specific location, that together form an Environmental Impact Assessment for a specific activity.
  • A survey carried out over one month in a Special Area of Conservation that has been funded as one piece of work.

It is difficult to be very prescriptive, as the decision of whether a collection of information forms one or more datasets is often case specific. However, in all cases the metadata creator should ask themselves what would most effectively help a user to quickly find the information that they want via the portal.

What is a series?

Given the above definition of a dataset, a series is a collection of datasets which, as INSPIRE defines it, ‘are linked by a common specification’. In this case it is believed that INSPIRE does not mean an Annex 1, 2 or 3 data specification, but rather a common theme. Some draft suggestions of what constitutes a series are given below:

  • A collection of cruises that are linked by a common research question and so form part of a larger project (e.g. North Sea Project, RAPID).
  • A project that has collected data from a range of sources to produce a large number of GIS layers across many topics (e.g. MB102 - Biophysical data layers).
  • A project that collects the same theme of data on a regular basis but at distinctly separate geographical locations each time (e.g. MCA Civil Hydrography Programme).
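
The relationship between a series and its datasets can be sketched as a simple data structure. This is a minimal illustration only: the class names, the `theme` field and the sample titles are hypothetical and do not reproduce the MEDIN or INSPIRE metadata schemas. It assumes, as suggested above, that the ‘common specification’ is a shared theme.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical record for one dataset, e.g. a single cruise or survey."""
    title: str
    theme: str


@dataclass
class SeriesRecord:
    """Hypothetical record for a series: datasets linked by a common theme."""
    title: str
    theme: str
    members: List[DatasetRecord] = field(default_factory=list)

    def add(self, dataset: DatasetRecord) -> None:
        """Accept a dataset only if it shares the theme of the series."""
        if dataset.theme != self.theme:
            raise ValueError(
                f"'{dataset.title}' does not share the series theme '{self.theme}'"
            )
        self.members.append(dataset)


# Placeholder titles and theme, invented purely for illustration.
series = SeriesRecord("Example research project", theme="example research question")
series.add(DatasetRecord("Cruise A", theme="example research question"))
series.add(DatasetRecord("Cruise B", theme="example research question"))

print([d.title for d in series.members])
```

Each cruise keeps its own dataset record, so it can still be found on its own, while the series record gives users a single entry point to the project as a whole.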