Summary and Setup
This is a new lesson built with The Carpentries Workbench.
Pre-Workshop Reading List
To get the most out of this workshop, we recommend reviewing the following materials before attending. The readings are organized by priority and will help you understand the foundational concepts we’ll be building upon.
Required Readings (2)
These essential readings provide the core foundation for understanding data mobilization and standards:
- The FAIR Guiding Principles for scientific data management and stewardship
  - Wilkinson, M.D. et al. (2016). Scientific Data 3, 160018
  - https://www.nature.com/articles/sdata201618
  - Why it’s important: This is the foundational paper that introduced the FAIR principles (Findable, Accessible, Interoperable, Reusable), the cornerstone of modern data mobilization
- Practical Data Stewardship for Salmon Biologists–A Blueprint for Domain-Specific Best Practices in Fisheries
  - Johnson, B. et al. (2024). DRAFT manuscript
  - https://br-johnson.github.io/sdm-paper/
  - Why it’s important: This draft manuscript presents seven best practices specifically for salmon data stewardship, with real-world examples and case studies from the salmon research community
Highly Recommended Readings (8)
These readings will deepen your understanding of key concepts:
- Data Mobilization Through the International Year of the Salmon Ocean Observing System
  - Johnson, B.T. and T.C.A. van der Stap (2024). N. Pac. Anadr. Fish Comm. Bull. 7: 51–60
  - https://doi.org/10.23849/npafcb7/6a4ddpde4
  - Why it’s important: Demonstrates large-scale, cross-jurisdictional data integration efforts in salmon science through the International Year of the Salmon program
- Salmon Data Mobilization
  - Diack, G., T. Bird, S.A. Akenhead, J. Bayer, D. Brophy, C. Bull, E. de Eyto, B.T. Johnson, M.B. Jones, A. Knight, M. Nevoux, T. van der Stap, and A. Walker (2024). N. Pac. Anadr. Fish Comm. Bull. 7: 61–76
  - https://doi.org/10.23849/npafcb7/x3rlpo23a
  - Why it’s important: Provides a comprehensive strategy for salmon data mobilization across three spheres of agencies and practitioners, with practical guidance for the salmon research community
- Darwin Core: A Biodiversity Data Standard
  - TDWG (Biodiversity Information Standards)
  - https://dwc.tdwg.org/
  - Why it’s important: Darwin Core is one of the most widely used biological data standards and provides a concrete example of how controlled vocabularies work in practice
- Climate and Forecast (CF) Metadata Conventions
  - CF Conventions Committee
  - http://cfconventions.org/
  - Why it’s important: Shows how climate data is standardized, which is crucial for understanding environmental drivers of salmon populations
- Controlled Vocabularies: A Guide to Terminology and Usage
  - National Information Standards Organization (NISO)
  - https://www.niso.org/publications/controlled-vocabularies-guide
  - Why it’s important: Provides practical guidance on creating and using controlled vocabularies
- Data Standards: A Crash Course
  - Journal of eScience Librarianship
  - https://publishing.escholarship.umassmed.edu/jeslib/article/id/758/print/
  - Why it’s important: Accessible introduction to data standards and their importance for data sharing
- Linked Data Vocabulary Management
  - National Information Standards Organization (NISO)
  - https://www.niso.org/niso-io/2012/06/linked-data-vocabulary-management
  - Why it’s important: Explains how vocabularies are managed and versioned in practice
- Towards a Shared Framework: A Classificatory Matrix for Teaching Data Standards
  - Journal of eScience Librarianship
  - https://publishing.escholarship.umassmed.edu/jeslib/article/id/758/print/
  - Why it’s important: Provides a framework for understanding different types of data standards
Optional Readings (7)
For those who want to dive deeper into specific topics:
- Making Metadata Machine-Readable as the First Step to Providing Findable, Accessible, Interoperable, and Reusable Population Health Data
  - PMC (2024)
  - https://pmc.ncbi.nlm.nih.gov/articles/PMC11327634/
  - Focus: Machine-readable metadata and FAIR implementation
- A Guide to Developing Harmonized Research Workflows in a Team Science Context
  - PMC (2024)
  - https://pmc.ncbi.nlm.nih.gov/articles/PMC12233188/
  - Focus: Team science and metadata standards
- Building a Unified Medical Vocabulary Framework Aligned with OMOP CDM
  - Medium/SciForce (2024)
  - https://medium.com/sciforce/building-a-unified-medical-vocabulary-framework-aligned-with-omop-cdm-b7a577b2316c
  - Focus: Vocabulary frameworks and data models
- Ontology-Enriched Specifications Enabling Findable, Accessible, Interoperable, and Reusable Marine Metagenomic Datasets
  - Frontiers in Marine Science (2021)
  - https://www.frontiersin.org/articles/10.3389/fmars.2021.769629/full
  - Focus: Marine data and ontologies
- What is an Ontology?
  - Stanford Encyclopedia of Philosophy
  - https://plato.stanford.edu/entries/ontology/
  - Focus: Philosophical foundations of ontologies
- Principles of Data Interoperability
  - Research Data Alliance
  - https://www.rd-alliance.org/group/data-interoperability-principles-wg/outcomes/principles-data-interoperability
  - Focus: Data interoperability principles
- The Environment Ontology: Contextualising Biological and Biomedical Entities
  - Journal of Biomedical Semantics (2013)
  - https://jbiomedsem.biomedcentral.com/articles/10.1186/2041-1480-4-43
  - Focus: Development and applications of the Environment Ontology (ENVO)
Data Sets
Download the data zip file and unzip it to your Desktop.
Software Setup
This workshop will use several tools for data mobilization, controlled vocabularies, and knowledge modeling. Please install the following software before attending: