Summary and Schedule
This is a new lesson built with The Carpentries Workbench.
| Setup Instructions | Download files required for the lesson | |
| Duration: 00h 00m | 1. Introduction to Salmon Knowledge Modelling | What are controlled vocabularies and why are they important for data interoperability? |
| Duration: 00h 12m | 2. Reusing Terms — Search and Integrate Existing Vocabularies | Are the terms I need already defined somewhere else? How can I responsibly reuse existing terms and URIs? What are the benefits of aligning early rather than reinventing? |
| Duration: 01h 47m | 3. Documenting Terms — Write Clear, Useful Definitions | How can I make sure others understand and correctly use my terms? What makes a good definition or label? How should I record units, examples, and relationships between terms? |
| Duration: 03h 22m | 4. Concept Decomposition | What are the components that make up a concept? How do I tell when two terms are the same, related, or overlapping? What patterns or relationships exist among my documented terms? How can I show these relationships clearly? |
| Duration: 04h 54m | 5. From Concepts to Semantics — Introducing SKOS | How do we move from lists of terms and definitions to formal, machine-readable vocabularies? What does it mean to give a term a URI and define its relationships to others? How can SKOS help represent our concepts and mappings in a structured, shareable way? How do hierarchical relationships (“broader”, “narrower”, “related”) clarify meaning and enable interoperability? |
| Duration: 06h 14m | 6. From Terms to Meaning - Framing Knowledge with Competency Questions | What is a Competency Question (CQ) and how does it help in ontology development? |
| Duration: 07h 46m | 7. Bonus Session | What is an ontology, and how does it differ from a data dictionary? Why does salmon research data need clearer semantics? What challenges arise when different people organize the same vocabulary? |
| Duration: 09h 19m | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Pre-Workshop Reading List
To get the most out of this workshop, we recommend reviewing the following materials before attending. The readings are organized by priority and will help you understand the foundational concepts we’ll be building upon.
Required Readings (2)
These essential readings provide the core foundation for understanding data mobilization and standards:
- The FAIR Guiding Principles for scientific data management and stewardship
  - Wilkinson, M.D. et al. (2016). Scientific Data 3, 160018
  - https://www.nature.com/articles/sdata201618
  - Why it’s important: This is the foundational paper that introduced the FAIR principles (Findable, Accessible, Interoperable, Reusable), the cornerstone of modern data mobilization
- Practical Data Stewardship for Salmon Biologists–A Blueprint for Domain-Specific Best Practices in Fisheries
  - Johnson, B. et al. (2024). DRAFT manuscript
  - https://br-johnson.github.io/sdm-paper/
  - Why it’s important: This draft manuscript presents seven best practices for salmon data stewardship, with real-world examples and case studies from the salmon research community
Highly Recommended Readings (8)
These readings will deepen your understanding of key concepts:
- Data Mobilization Through the International Year of the Salmon Ocean Observing System
  - Johnson, B.T. and T.C.A. van der Stap (2024). N. Pac. Anadr. Fish Comm. Bull. 7: 51–60
  - https://doi.org/10.23849/npafcb7/6a4ddpde4
  - Why it’s important: Demonstrates large-scale, cross-jurisdictional data integration in salmon science through the International Year of the Salmon program
- Salmon Data Mobilization
  - Diack, G., T. Bird, S.A. Akenhead, J. Bayer, D. Brophy, C. Bull, E. de Eyto, B.T. Johnson, M.B. Jones, A. Knight, M. Nevoux, T. van der Stap, and A. Walker (2024). N. Pac. Anadr. Fish Comm. Bull. 7: 61–76
  - https://doi.org/10.23849/npafcb7/x3rlpo23a
  - Why it’s important: Provides a comprehensive strategy for salmon data mobilization across three spheres of agencies and practitioners, with practical guidance for the salmon research community
- Darwin Core: A Biodiversity Data Standard
  - TDWG (Biodiversity Information Standards)
  - https://dwc.tdwg.org/
  - Why it’s important: Darwin Core is one of the most widely used biological data standards and provides a concrete example of how controlled vocabularies work in practice
- Climate and Forecast (CF) Metadata Conventions
  - CF Conventions Committee
  - http://cfconventions.org/
  - Why it’s important: Shows how climate data are standardized, which is crucial for understanding environmental drivers of salmon populations
- Controlled Vocabularies: A Guide to Terminology and Usage
  - National Information Standards Organization (NISO)
  - https://www.niso.org/publications/controlled-vocabularies-guide
  - Why it’s important: Provides practical guidance on creating and using controlled vocabularies
- Data Standards: A Crash Course
  - Journal of eScience Librarianship
  - https://publishing.escholarship.umassmed.edu/jeslib/article/id/758/print/
  - Why it’s important: An accessible introduction to data standards and their importance for data sharing
- Linked Data Vocabulary Management
  - National Information Standards Organization (NISO)
  - https://www.niso.org/niso-io/2012/06/linked-data-vocabulary-management
  - Why it’s important: Explains how vocabularies are managed and versioned in practice
- Towards a Shared Framework: A Classificatory Matrix for Teaching Data Standards
  - Journal of eScience Librarianship
  - https://publishing.escholarship.umassmed.edu/jeslib/article/id/758/print/
  - Why it’s important: Provides a framework for understanding different types of data standards
Optional Readings (7)
For those who want to dive deeper into specific topics:
- Making Metadata Machine-Readable as the First Step to Providing Findable, Accessible, Interoperable, and Reusable Population Health Data
  - PMC (2024)
  - https://pmc.ncbi.nlm.nih.gov/articles/PMC11327634/
  - Focus: Machine-readable metadata and FAIR implementation
- A Guide to Developing Harmonized Research Workflows in a Team Science Context
  - PMC (2024)
  - https://pmc.ncbi.nlm.nih.gov/articles/PMC12233188/
  - Focus: Team science and metadata standards
- Building a Unified Medical Vocabulary Framework Aligned with OMOP CDM
  - Medium/SciForce (2024)
  - https://medium.com/sciforce/building-a-unified-medical-vocabulary-framework-aligned-with-omop-cdm-b7a577b2316c
  - Focus: Vocabulary frameworks and data models
- Ontology-Enriched Specifications Enabling Findable, Accessible, Interoperable, and Reusable Marine Metagenomic Datasets
  - Frontiers in Marine Science (2021)
  - https://www.frontiersin.org/articles/10.3389/fmars.2021.769629/full
  - Focus: Marine data and ontologies
- What is an Ontology?
  - Stanford Encyclopedia of Philosophy
  - https://plato.stanford.edu/entries/ontology/
  - Focus: Philosophical foundations of ontologies
- Principles of Data Interoperability
  - Research Data Alliance
  - https://www.rd-alliance.org/group/data-interoperability-principles-wg/outcomes/principles-data-interoperability
  - Focus: Data interoperability principles
- The Environment Ontology: Contextualising Biological and Biomedical Entities
  - Journal of Biomedical Semantics (2013)
  - https://jbiomedsem.biomedcentral.com/articles/10.1186/2041-1480-4-43
  - Focus: ENVO development and applications
Data Sets
Download the data zip file and unzip it to your Desktop.
Software Setup
This workshop will use several tools for data mobilization, controlled vocabularies, and knowledge modeling. Please install the following software before attending: