Materials-Data-Science-and-Informatics
diff --git a/docs/column_mapping.md b/docs/column_mapping.md
Lines changed: 161 additions & 0 deletions
diff --git a/docs/refactoring/README.md b/docs/refactoring/README.md
Lines changed: 117 additions & 28 deletions
@@ -0,0 +1,161 @@
+# HMC Community Survey 2021 - Column Mapping Guide
+
+This document provides a comprehensive mapping of the columns in the `responses_cleaned_mapped_to_publish.csv` dataset to their corresponding survey questions from the HMC Community Survey 2021.
+
+## Survey Overview
+
+The HMC Community Survey 2021 was conducted to understand research data management practices among researchers in the Helmholtz Association. The survey used a **dynamic questioning approach** in which follow-up questions were shown based on previous answers; this is one reason the number of columns varies from section to section.
+
+## Dataset Summary
+
+- **Actual columns in published CSV**: 263 columns
+- **Potential survey columns**: 305 columns (from survey design)
+- **Completed responses**: 631 responses
+- **Data file**: `responses_cleaned_mapped_to_publish.csv`
+
+## Survey Question Groups and Column Mappings
+
+### 1. **Personal Background (PERBG)** - 26 columns
+
+Characterizes survey respondents by their institutional affiliation, research field, scientific discipline, career level, and research experience.
+
+- `PERBG1/_` - Helmholtz center affiliation
+- `PERBG1/other` - Other center specification  
+- `PERBG2/_` - Helmholtz research field
+- `PERBG3/_` - Primary research area
+- `PERBG3/other` - Other research area
+- `PERBG3AGRI/_`, `PERBG3AGRI/other` - Agricultural sciences
+- `PERBG3BIO/_`, `PERBG3BIO/other` - Biological sciences
+- `PERBG3CHEM/_`, `PERBG3CHEM/other` - Chemistry
+- `PERBG3GEO/_`, `PERBG3GEO/other` - Earth sciences
+- `PERBG3ING/_`, `PERBG3ING/other` - Engineering sciences
+- `PERBG3LIFE/_`, `PERBG3LIFE/other` - Life sciences
+- `PERBG3MATH/_`, `PERBG3MATH/other` - Mathematics
+- `PERBG3MED/_`, `PERBG3MED/other` - Medical sciences
+- `PERBG3PHYS/_`, `PERBG3PHYS/other` - Physics
+- `PERBG3PSYCH/_`, `PERBG3PSYCH/other` - Psychology
+- `PERBG4/_` - Years working in research
+- `PERBG6/_`, `PERBG6/other` - Career level
+- `PERBG7/_` - ORCID ID availability
+- `PERBG8/_` - Familiarity with FAIR data guidelines
+
+### 2. **Research Data Properties (RSDP)** - 71 columns
+
+Characterizes the research data generated or used by respondents, including data sources, methods, tools, and formats.
+
+- `RSDP1/1A2`, `RSDP1/3A4` - Data origin (reused vs self-generated)
+- `RSDP1b/1` - Data origin (simulated vs experimental)
+- `RSDP1c/1` through `RSDP1c/11`, `RSDP1c/other` - Data generation methods (12 columns)
+- `RSDP2/1` through `RSDP2/6`, `RSDP2/other` - Data collection methods (7 columns)
+- `RSDP2b/1-1` through `RSDP2b/7-3` - Detailed data collection workflows (21 columns)
+- `RSDP3/1` through `RSDP3/15`, `RSDP3/other` - Data formats used (16 columns)
+- `RSDP4/_` - Data collection duration
+- `RSDP7/_` - Publication data volume estimation
+- `RSDP8/_` - Data processing time
+- `RSDP10/_` - Important software applications
+- `RSDP11/_` - Software application importance
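
The "through" ranges above can be expanded into concrete column names, for example to select all columns of one question from the CSV. A minimal sketch in Python; the helper names `expand_options` and `expand_grid` are hypothetical and not part of the dashboard code, and the sketch assumes that the option codes in a range are contiguous integers (which does not hold for every question, e.g. `DTPUB7`).

```python
def expand_options(question, first, last, extras=()):
    """Column names for options first..last (inclusive) plus extra codes like 'other'."""
    names = [f"{question}/{i}" for i in range(first, last + 1)]
    names += [f"{question}/{code}" for code in extras]
    return names


def expand_grid(question, rows, cols):
    """Column names of the form QUESTION/row-col, e.g. RSDP2b/1-1 ... RSDP2b/7-3."""
    return [
        f"{question}/{r}-{c}"
        for r in range(1, rows + 1)
        for c in range(1, cols + 1)
    ]


# Rebuild the RSDP ranges listed above and compare with the stated column counts
rsdp1c = expand_options("RSDP1c", 1, 11, extras=["other"])
rsdp2 = expand_options("RSDP2", 1, 6, extras=["other"])
rsdp2b = expand_grid("RSDP2b", rows=7, cols=3)
rsdp3 = expand_options("RSDP3", 1, 15, extras=["other"])

for name, cols in [("RSDP1c", rsdp1c), ("RSDP2", rsdp2), ("RSDP2b", rsdp2b), ("RSDP3", rsdp3)]:
    print(name, len(cols))  # 12, 7, 21 and 16 columns respectively
```

The resulting lists can be used to select one question's columns from the loaded data.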
+
+### 3. **Research Data Management Practices (RDMPR)** - 86 columns
+
+Focuses on research data storage routines, data annotation, documentation practices, and metadata handling.
+
+- `RDMPR1/1` through `RDMPR1/3`, `RDMPR1/0`, `RDMPR1/other` - Data storage locations (5 columns)
+- `RDMPR3/1` through `RDMPR3/3`, `RDMPR3/other`, `RDMPR3/0` - Documentation methods (5 columns)
+- `RDMPR4/_` - Structured documentation (yes/no)
+- `RDMPR5/_` - International standards usage
+- `RDMPR6/1` through `RDMPR6/26`, `RDMPR6/other` - Metadata categories collected (27 columns)
+- `RDMPR7/2` through `RDMPR7/9`, `RDMPR7/other` - Digital metadata documentation (9 columns)
+- `RDMPR8/2` through `RDMPR8/10` - Automated metadata collection (9 columns)
+- `RDMPR9/2` through `RDMPR9/10` - Manual metadata collection (9 columns)
+- `RDMPR10/1` through `RDMPR10/3` - Structured documentation motivations (3 columns)
+- `RDMPR11/0` through `RDMPR11/9`, `RDMPR11/other` - Metadata collection obstacles (11 columns)
+- `RDMPR12/1` through `RDMPR12/6`, `RDMPR12/0`, `RDMPR12/other` - International standards used (8 columns)
+
+### 4. **Data Publishing Practices (DTPUB)** - 62 columns
+
+Addresses respondents' experience in making research data publicly available, including motivations and challenges.
+
+- `DTPUB1b/1` through `DTPUB1b/3`, `DTPUB1b/other` - Data publishing methods (4 columns)
+- `DTPUB3/1` through `DTPUB3/7`, `DTPUB3/other` - Data publishing motivations (8 columns)
+- `DTPUB4a/0` through `DTPUB4a/7`, `DTPUB4a/other` - Data publishing obstacles (9 columns)
+- `DTPUB4b/0` through `DTPUB4b/7`, `DTPUB4b/other` - Barriers for non-publishers (9 columns)
+- `DTPUB5/1` through `DTPUB5/5` - Publishing percentage estimation (5 columns)
+- `DTPUB6/1` - Repository usage (1 column)
+- `DTPUB7/1`, `DTPUB7/21` through `DTPUB7/93`, `DTPUB7/other`, `DTPUB7/0` - Published metadata types (26 columns)
+
+### 5. **Services and Support Needs (SERVC)** - 12 columns
+
+Addresses respondents' perceived need for support across research data management topics and their preferred service formats.
+
+- `SERVC1/1` through `SERVC1/9`, `SERVC1/other`, `SERVC1/0` - Support needs areas (11 columns)
+- `SERVC2/1` through `SERVC2/6` - Service format preferences (6 columns)
+
+### 6. **Technical/Administrative Columns** - 8 columns
+
+System-generated fields for survey administration and analysis.
+
+- `id` - Response identifier
+- `interviewtime/_` - Interview duration
+- `lastpage/_` - Last page reached in survey
+- `submitdate/_` - Submission timestamp
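
The per-group column counts can be checked directly against the header of the published CSV; this is worthwhile because the group headings above add up to 265 columns, while the summary states 263. A minimal sketch using only the Python standard library. It assumes the file is comma-separated UTF-8 with a single header row, and it treats every column whose name does not start with an uppercase group code followed by a digit (e.g. `id`, `submitdate/_`) as technical.

```python
import csv
import re
from collections import Counter

# Uppercase group code followed by a question number, e.g. "RDMPR" in "RDMPR6/12"
GROUP_RE = re.compile(r"^([A-Z]+)\d")


def group_of(column):
    """Return the question-group prefix of a column name, or 'technical'."""
    match = GROUP_RE.match(column)
    return match.group(1) if match else "technical"


def count_columns_by_group(header):
    """Count how many columns belong to each question group."""
    return Counter(group_of(col) for col in header)


def read_header(path):
    """Return the first row (column names) of a CSV file."""
    with open(path, newline="", encoding="utf-8") as fh:
        return next(csv.reader(fh))


# Usage (path assumed relative to the working directory):
# header = read_header("responses_cleaned_mapped_to_publish.csv")
# print(len(header), count_columns_by_group(header))
```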
+
+## Survey Logic and Adaptive Questioning
+
+The survey implemented **conditional logic** where:
+- Questions were dynamically adapted to respondents' expertise levels
+- Follow-up questions appeared based on previous answers
+- Different paths were available for different experience levels
+- Not all respondents saw all questions
+
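+As a result, many cells are typically empty for respondents who were not shown a given question. Survey paths that no respondent followed can also leave whole columns unused, which is one reason the published dataset contains fewer columns (263) than the survey design (305); see **Data Processing and Column Reduction** below.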
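
Because of this routing, a blank cell can mean "not shown" as well as "not answered". A minimal sketch for estimating how many respondents have a value in a given follow-up column; the function names are hypothetical, and it assumes that skipped or hidden questions are stored as empty strings in the CSV, which should be checked against the actual file.

```python
import csv


def load_rows(path):
    """Read the survey CSV into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def answered_share(rows, column):
    """Fraction of rows with a non-blank value in `column`.

    Assumes that questions a respondent was not shown (or skipped)
    are stored as empty cells; missing keys count as empty too.
    """
    rows = list(rows)
    if not rows:
        return 0.0
    answered = sum(1 for row in rows if (row.get(column) or "").strip())
    return answered / len(rows)


# Usage (file path assumed):
# rows = load_rows("responses_cleaned_mapped_to_publish.csv")
# print(answered_share(rows, "RDMPR10/1"))
```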
+
+## Key Survey Focus Areas
+
+The survey particularly focused on understanding:
+
+1. **Current practices** in research data management
+2. **Metadata handling** and documentation approaches  
+3. **Data publishing behaviors** and motivations
+4. **Support needs** for FAIR data implementation
+5. **Barriers and obstacles** researchers face
+6. **Community-specific requirements** across six Helmholtz research fields
+
+## Research Fields Covered
+
+The survey covered all six Helmholtz research fields:
+- Aeronautics, Space, and Transport (AST)
+- Earth and Environment (E&E)
+- Energy
+- Health
+- Information
+- Matter
+
+## Data Collection Details
+
+- **Survey Period**: September to November 2021
+- **Total Responses**: 631 completed responses
+- **Implementation**: LimeSurvey platform
+- **Data Collection**: Fully anonymized
+- **Target Group**: Scientific staff across all Helmholtz research centers
+
+## Data Processing and Column Reduction
+
+The published dataset contains **263 columns** rather than the full 305 possible columns from the survey design. This reduction occurred during data processing for the following reasons:
+
+1. **Anonymization**: Institutional affiliation data and other identifying information was removed
+2. **Privacy protection**: Software names used by fewer than 4 respondents were anonymized
+3. **Data cleaning**: Empty or unused columns may have been filtered out
+4. **Conditional questions**: Some survey paths may not have generated responses, resulting in unused columns
+
+The HMC survey report states:
+
+> Before the data publication the following information was removed or anonymized from the survey data in order to prevent the identification of individuals:
+>
+> - Any information – including that might reveal a respondent's institutional affiliation
+> - Names of software that is used by less than 4 respondents
+> - Any information about institutional repositories
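
The software-name rule is a frequency threshold. The sketch below illustrates that idea in general terms only; it is not the procedure actually used to prepare the published file, it assumes one software name per entry, and the replacement label `"other (anonymized)"` is a hypothetical placeholder.

```python
from collections import Counter


def anonymize_rare(values, min_count=4, placeholder="other (anonymized)"):
    """Replace non-empty values that occur fewer than `min_count` times.

    Assumes one software name per entry; empty entries are kept as-is.
    """
    values = list(values)  # allow any iterable; we iterate twice
    counts = Counter(v for v in values if v)
    return [v if not v or counts[v] >= min_count else placeholder for v in values]
```

With the default `min_count=4`, a name given by exactly four respondents is kept and a name given by three is replaced, matching "less than 4 respondents".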
+
+## Usage Notes
+
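+- Column headers follow a `QUESTION/OPTION` pattern: the question code is the group prefix plus a question number, sometimes with a suffix (e.g. `RDMPR6`, `RSDP1b`, `PERBG3AGRI`), and the part after the slash identifies the answer option or sub-question (e.g. `RDMPR6/12`, `RSDP2b/1-1`)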
+- Multiple choice questions have separate columns for each option
+- Rating scales and slider questions have numeric values
+- Free text responses were cleaned and categorized where applicable
+- The `/_` suffix typically indicates single-choice or numeric responses
+- Numbered suffixes (e.g., `/1`, `/2`) indicate multiple choice options
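
The naming convention can be applied programmatically, e.g. to group columns by question before analysis. A minimal sketch with hypothetical helper names; the suffix categories follow the notes above, and treating `/other` as a free-text field is an assumption based on the "Other ... specification" entries in this guide.

```python
def parse_column(name):
    """Split 'QUESTION/OPTION' into (question, option); option is None if absent."""
    question, sep, option = name.partition("/")
    return question, (option if sep else None)


def option_kind(option):
    """Rough classification of an option suffix, following the usage notes."""
    if option is None:
        return "technical"                 # e.g. the bare `id` column
    if option == "_":
        return "single-choice or numeric"
    if option == "other":
        return "free text"                 # assumption: 'other' holds a text specification
    if option.replace("-", "").isdigit():
        return "numbered option"           # e.g. /1, /26, /1-1
    return "unknown"                       # e.g. codes such as 1A2
```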
+
+This mapping enables researchers and analysts to understand the structure and content of the survey data for further analysis and visualization.
@@ -88,7 +88,7 @@ This documentation repository contains comprehensive information about the major
 
 ---
 
-## 🏗️ New Architecture Summary
+## 🏗️ Current Architecture Summary
 
 ### Before Refactoring
 ```
@@ -99,15 +99,31 @@ main.py (1,198 lines)
 └── Monolithic structure
 ```
 
-### After Refactoring  
+### Current Refactored Structure  
 ```
 survey_dashboard/
-├── config.py (124 lines)           # Configuration & Constants
-├── data_processor.py (396 lines)   # Data Operations
-├── widgets.py (168 lines)          # UI Widget Creation
-├── visualizations.py (352 lines)   # Chart Management  
-├── layout_manager.py (192 lines)   # Layout & Templates
-└── main.py (113 lines)             # Clean Orchestration
+├── core/                           # Core Business Logic
+│   ├── config.py                   # Configuration & Constants + HMC Colors
+│   ├── data.py                     # Data Operations & Processing
+│   └── charts.py                   # Chart Creation & Management
+├── ui/                             # User Interface Layer
+│   ├── widgets.py                  # UI Widget Factory
+│   ├── layout.py                   # Layout Management
+│   └── callbacks.py                # Interactive Callbacks
+├── i18n/                           # Internationalization
+│   └── text_display.py             # Multilingual Text Content
+├── hmc_layout/                     # HMC-Specific Styling
+│   ├── hmc_colordicts.py           # Official HMC Color Palettes
+│   ├── hmc_custom_layout.py        # Custom CSS Styling
+│   ├── assets/                     # SVG Icons & Graphics
+│   └── static/                     # Static Web Assets
+├── data/                           # Data Storage & Configuration
+│   ├── hcs_clean_dictionaries.py   # Survey Data Mappings
+│   ├── *.csv                       # Survey Dataset Files
+│   └── *.json                      # Additional Data Files
+├── app.py                          # Main Application Entry Point
+├── analysis.py                     # Statistical Analysis Functions
+└── plots.py                        # Core Plotting Functions
 ```
 
 ---
@@ -145,41 +161,85 @@ survey_dashboard/
 
 ## 🛠️ Module Responsibilities
 
-### `config.py` - Configuration Hub
+### Core Business Logic (`core/`)
+
+#### `core/config.py` - Configuration Hub
 - Global constants and environment variables
-- Color schemes and styling configuration
+- **HMC color schemes** and styling configuration
 - File paths and data source management
 - Widget options and template configuration
 
-### `data_processor.py` - Data Operations
+#### `core/data.py` - Data Operations Engine
 - CSV loading and preprocessing
 - Question mapping and translation
 - Data filtering and aggregation
 - Statistical calculations and transformations
 
-### `widgets.py` - UI Factory
-- Interactive widget creation
+#### `core/charts.py` - Chart Creation Manager
+- Overview, exploration, and correlation chart creation
+- Word cloud generation and management
+- Chart type selection and configuration
+- Visualization data preparation
+
+### User Interface Layer (`ui/`)
+
+#### `ui/widgets.py` - UI Widget Factory
+- Interactive widget creation (selectors, filters, controls)
 - Widget configuration and organization  
 - Control group management
 - Panel component generation
 
-### `visualizations.py` - Chart Engine
-- Chart creation and management
-- Visualization updates and callbacks
-- Word cloud generation
-- Interactive plot handling
-
-### `layout_manager.py` - Layout Controller
-- Dashboard layout assembly
-- Template integration and configuration
-- Responsive design implementation
+#### `ui/layout.py` - Layout Manager
+- Dashboard layout assembly and template integration
+- Accordion structure and responsive design
 - Section organization and styling
+- Template variable management
+
+#### `ui/callbacks.py` - Interactive Callbacks
+- Widget event handling and chart updates
+- User interaction management
+- Dynamic content updates
+
+### Internationalization (`i18n/`)
+
+#### `i18n/text_display.py` - Multilingual Content
+- Translatable text content (English/German)
+- UI labels and descriptions
+- Question text and tooltip content
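
A common pattern for this kind of module is a dictionary keyed by language code with a fallback to English. The sketch below is hypothetical: the dictionary layout, keys, texts, and the `get_text` function are illustrative and not taken from `i18n/text_display.py`.

```python
# Hypothetical text table: language code -> {text key -> display text}
TEXTS = {
    "en": {"overview_title": "Overview", "filter_label": "Research field"},
    "de": {"overview_title": "Übersicht"},  # German entry for filter_label deliberately missing
}


def get_text(key, lang="en", default_lang="en"):
    """Look up `key` in `lang`, then in `default_lang`; return the key itself if both miss."""
    for code in (lang, default_lang):
        text = TEXTS.get(code, {}).get(key)
        if text is not None:
            return text
    return key
```

Falling back to the key rather than raising keeps the dashboard usable while a translation is still missing.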
+
+### HMC-Specific Styling (`hmc_layout/`)
+
+#### `hmc_layout/hmc_colordicts.py` - Official Color Palettes
+- **Helmholtz research hub colors** (Information, Health, Matter, etc.)
+- **HMC brand color palettes** for charts and visualizations
+- Color utility functions and matplotlib integration
+
+#### `hmc_layout/hmc_custom_layout.py` - Custom CSS Styling
+- Accordion and card styling
+- Responsive design CSS
+- Panel component customization
+
+### Data Layer (`data/`)
+
+#### `data/hcs_clean_dictionaries.py` - Survey Data Configuration
+- Survey question mappings and translations
+- Data type specifications and validation
+- Multiple choice question handling
 
-### `main.py` - Application Orchestrator
+### Application Entry Points
+
+#### `app.py` - Main Application Entry Point
 - Component initialization and dependency injection
 - Callback registration and event wiring  
-- Application startup flow
-- High-level coordination
+- Application startup flow and coordination
+
+#### `analysis.py` - Statistical Analysis Functions
+- Cross-tabulation and statistical calculations
+- Data aggregation and transformation utilities
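
As an illustration of cross-tabulation on the survey columns, here is a standard-library sketch (a hypothetical function, not the contents of `analysis.py`) that counts answer combinations for two single-choice columns and skips rows where either value is empty.

```python
from collections import Counter, defaultdict


def crosstab(rows, row_col, col_col):
    """Count answer pairs as {row_value: Counter({col_value: n})}.

    Rows where either column is empty (e.g. question not shown) are skipped.
    """
    table = defaultdict(Counter)
    for row in rows:
        a, b = row.get(row_col), row.get(col_col)
        if a and b:
            table[a][b] += 1
    return dict(table)
```

For example, `crosstab(rows, "PERBG2/_", "PERBG8/_")` would relate research field to FAIR familiarity; the answer values depend on the actual CSV. With pandas installed, `pandas.crosstab(df["PERBG2/_"], df["PERBG8/_"])` produces a comparable table as a DataFrame.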
+
+#### `plots.py` - Core Plotting Functions  
+- Bokeh-based chart creation utilities
+- Plot styling and configuration helpers
 
 ---
 
@@ -250,23 +310,52 @@ docs/refactoring/
 
 ---
 
+## 🎨 Recent Improvements & Features
+
+### HMC Branding Integration (September 2024)
+- ✅ **Official HMC Color Palettes** - Integrated Helmholtz research hub colors
+- ✅ **Chart Color Consistency** - All visualizations use official HMC branding
+- ✅ **Research Field Colors** - Specific colors for Information, Health, Matter, Energy, etc.
+- ✅ **Graceful Fallbacks** - Colors work with or without optional matplotlib dependencies
+
+### File Organization Improvements
+- ✅ **Internationalization Structure** - Moved `text_display.py` to dedicated `i18n/` directory
+- ✅ **HMC Layout Consolidation** - All styling components in `hmc_layout/` directory
+- ✅ **Data Structure Cleanup** - Survey mappings properly organized in `data/` directory
+- ✅ **Import Path Fixes** - Updated all import statements for new structure
+
+### Code Quality Enhancements
+- ✅ **Type Hints Added** - Improved static type checking with Pyright compatibility
+- ✅ **Error Handling** - Robust handling of optional dependencies
+- ✅ **Documentation Updates** - Comprehensive module documentation and examples
+
+### Developer Experience
+- ✅ **Column Mapping Guide** - Detailed mapping of the 263 CSV columns to survey questions ([column_mapping.md](../column_mapping.md))
+- ✅ **Data Verification** - Confirmed 631 responses match HMC report specifications
+- ✅ **Color Utility Functions** - Easy-to-use functions for getting HMC colors in charts
+
+---
+
 ## ❓ Frequently Asked Questions
 
 ### Q: Does the refactored version work exactly the same?
-**A:** Yes! All functionality is preserved. Users see no difference, but developers get a much better codebase.
+**A:** Yes! All functionality is preserved. Apart from the official HMC color palettes now used in the charts, users see no difference, while developers get a much better organized codebase.
 
 ### Q: Do I need to change deployment scripts?
 **A:** No changes needed. The same Panel serve command works exactly as before.
 
 ### Q: Can I still modify the dashboard?
-**A:** Yes, but it's now much easier! Check the [Developer Migration Guide](developer-migration-guide.md) for details.
+**A:** Yes, but it's now much easier! Check the [Developer Migration Guide](developer-migration-guide.md) for details. Plus you now have official HMC colors available.
 
 ### Q: What about performance?
 **A:** No performance impact is expected: the refactoring moves code into separate modules without changing the data processing or chart logic.
 
 ### Q: How do I add new features?
 **A:** Much easier now! Each type of change goes to its specific module. See the [Module Architecture](module-architecture.md) guide.
 
+### Q: How do I use the new HMC colors?
+**A:** Charts use the HMC colors automatically. For custom visualizations, import `get_hmc_colors` from `hmc_layout.hmc_colordicts` and call `get_hmc_colors(n)`.
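
For readers wondering how a helper like `get_hmc_colors(n)` can serve any number of categories: a common approach is to cycle a fixed palette. The sketch below is hypothetical; the hex values are placeholders, not the official HMC colors, `cycle_palette` is an illustrative name, and the real function in `hmc_colordicts.py` may behave differently. It also shows the optional-import pattern for matplotlib.

```python
from itertools import cycle, islice

try:  # optional dependency: only used to normalize color strings
    from matplotlib.colors import to_hex
except ImportError:  # fall back to returning palette entries unchanged
    to_hex = None

# Placeholder palette, NOT the official HMC colors
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c"]


def cycle_palette(n, palette=PALETTE):
    """Return n colors, repeating the palette when n exceeds its length."""
    if n < 0:
        raise ValueError("n must be non-negative")
    colors = list(islice(cycle(palette), n))
    if to_hex is not None:
        colors = [to_hex(c) for c in colors]  # '#rrggbb' in lowercase
    return colors
```

The function returns the same colors whether or not matplotlib is installed, which is the point of the fallback.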
+
 ---
 
 ## 📞 Support