Research data can be defined as the recorded factual material commonly accepted in the scientific community as necessary to validate research findings. Data can be grouped into four major types according to where it comes from: observational, experimental, simulated, and derived. It can also be grouped into three formats: structured, unstructured, and semi-structured. The term "data" has no single clear definition and is often interpreted differently depending on the field of study.
Observational data is drawn from studies that observe particular subjects or phenomena and interpret the results. When working with specific subjects, observational data can be collected from a treatment group as well as a control group to infer differences between the two. Examples of research that uses observational data include cross-sectional and longitudinal studies.
Experimental data most commonly includes results from laboratory studies, especially the measurements taken during these studies. For example, these measurements can be taken from chemical reactions or from a field study in which a controlled behavioral analysis was undertaken. This type of data requires rigorous documentation.
Simulation data, also commonly referred to as computational data, represents information generated by running a computer model or simulation. Simulation data can come from studies in physics or from virtual reality experiments, to name a couple of examples.
Derived data is data that has been generated from pre-existing data. It often takes data that has already been collected and modifies or adds value to it in order to create an entirely new interpretation. For example, a researcher might take previously collected phenotype data and combine it with newly generated genotype data. The combination is a new dataset derived from the previously collected data.
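The phenotype and genotype example can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the sample IDs, the `height_cm` and `rs_example` fields, and the `derive_dataset` helper are invented for illustration, and joining on a shared sample ID is only one assumed way of combining the two sources.

```python
# Hypothetical, previously collected phenotype data keyed by sample ID.
phenotypes = {
    "S1": {"height_cm": 172.0},
    "S2": {"height_cm": 165.5},
    "S3": {"height_cm": 180.2},
}

# Hypothetical, newly generated genotype data keyed by the same sample IDs.
genotypes = {
    "S1": {"rs_example": "AA"},
    "S2": {"rs_example": "AG"},
    "S4": {"rs_example": "GG"},
}


def derive_dataset(phenotypes, genotypes):
    """Combine both sources, keeping only samples present in both."""
    derived = []
    for sample_id in sorted(phenotypes.keys() & genotypes.keys()):
        row = {"sample_id": sample_id}
        row.update(phenotypes[sample_id])  # existing measurements
        row.update(genotypes[sample_id])   # newly generated values
        derived.append(row)
    return derived


derived = derive_dataset(phenotypes, genotypes)
for row in derived:
    print(row)
```

Samples that appear in only one source (`S3` and `S4` here) are dropped. Whether to keep them with missing values instead is a design choice that depends on the study.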
Structured data is tabular data that can be easily analyzed by a computer and is usually found in spreadsheets or databases. Unstructured data usually takes the form of text, images, audio, or video, which are more difficult for a machine to analyze; researchers usually need to add structure to make analysis easier. Semi-structured data falls between the two: it does not conform to a strict structure, but it carries indicators, such as tags, from which machines can derive meaning. One example of a semi-structured format is Extensible Markup Language (XML).
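To show how the tags in semi-structured data let a machine derive meaning, here is a small sketch that uses Python's standard `xml.etree.ElementTree` module to turn an XML fragment into tabular rows. The XML content, the element names (`sample`, `species`, `mass`), and the `xml_to_rows` helper are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured input: the second sample has no <mass> element.
XML_TEXT = """
<samples>
  <sample id="S1"><species>Mus musculus</species><mass unit="g">21.4</mass></sample>
  <sample id="S2"><species>Mus musculus</species></sample>
</samples>
"""


def xml_to_rows(xml_text):
    """Flatten each <sample> element into a row with a fixed set of columns."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for sample in root.findall("sample"):
        mass = sample.find("mass")  # None when the element is absent
        rows.append({
            "id": sample.get("id"),
            "species": sample.findtext("species"),
            "mass_g": float(mass.text) if mass is not None else None,
        })
    return rows


rows = xml_to_rows(XML_TEXT)
for row in rows:
    print(row)
```

The tags identify what each value means, but elements may be missing or vary between records, so the code has to decide how to represent the gaps (here, as `None`) before the data fits a strict table.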
Borgman CL. (2010). Research Data: Who will share what, with whom, when, and why? China-North American Library Conference [Internet]. Beijing, 21.
Gandomi A, Haider M. (2015). Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35, 137–144. doi.org/10.1016/j.ijinfomgt.2014.10.007
Partlo K. (2014). From Data to the Creation of Meaning Part II: Data Librarian as Translator. IASSIST Quarterly, 38(2), 12–15.