feat: integrate LFQ and TMT workflows#27
Conversation
📝 WalkthroughWalkthroughThe PR introduces a configurable ChangesLFQ/TMT Dual-Mode Analysis
Sequence DiagramsequenceDiagram
actor User
participant WorkflowTest
participant load_abundance_data
participant results_helpers
participant ResultsPage
participant PathwayPage
participant MyGene
rect rgba(100, 149, 237, 0.5)
Note over User,WorkflowTest: Workflow configuration
User->>WorkflowTest: select analysis-mode (LFQ or TMT)
WorkflowTest->>WorkflowTest: render_lfq_tabs() or render_tmt_tabs()
WorkflowTest->>WorkflowTest: execution() → LFQ or TMT pipeline
end
rect rgba(144, 238, 144, 0.5)
Note over ResultsPage,results_helpers: Results rendering
ResultsPage->>load_abundance_data: load with analysis-mode
load_abundance_data->>results_helpers: LFQ path (mzML-group- map, Welch t-test, BH)
load_abundance_data->>results_helpers: TMT path (TMT-group- map, skip channels, BH)
results_helpers-->>ResultsPage: pivot_df, group_map, expr_matrix
ResultsPage->>ResultsPage: render volcano/PCA/heatmap per mode
end
rect rgba(255, 165, 0, 0.5)
Note over PathwayPage,MyGene: GO enrichment
PathwayPage->>load_abundance_data: load pivot_df
PathwayPage->>MyGene: query UniProt IDs → GO annotations
MyGene-->>PathwayPage: GO term sets (BP/CC/MF)
PathwayPage->>PathwayPage: Fisher exact test → top terms
PathwayPage->>PathwayPage: write go_results.json, render charts
end
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 16
🧹 Nitpick comments (7)
content/results_pathway_analysis.py (1)
68-69: 💤 Low valueUse pandas boolean negation instead of
!= Truecomparison.Per static analysis, comparing to
Truewith!=is not recommended. For pandas boolean columns, use the bitwise negation operator.♻️ Proposed fix
res_go = pd.DataFrame(res_list) if "notfound" in res_go.columns: - res_go = res_go[res_go["notfound"] != True] + res_go = res_go[~res_go["notfound"].fillna(False)]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@content/results_pathway_analysis.py` around lines 68 - 69, In the filtering operation on the res_go dataframe where the "notfound" column is checked, replace the `!= True` comparison with the bitwise negation operator `~` to properly filter pandas boolean columns. This change makes the code more idiomatic and follows pandas best practices for boolean indexing.src/common/results_helpers.py (5)
337-338: 💤 Low valueRedundant
Path()wrapping and duplicateParameterManager.
workflow_diris already aPath(from line 198). Also, aParameterManagerwas created at line 201 and can be reused here instead of creating another instance.Proposed fix
- parameter_manager = ParameterManager(Path(workflow_dir), "TOPP Workflow") - params = parameter_manager.get_parameters_from_json() + params = parameter_manager.get_parameters_from_json()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/common/results_helpers.py` around lines 337 - 338, The ParameterManager instantiation at line 337 is redundant because a ParameterManager instance was already created at line 201 and should be reused. Additionally, workflow_dir is already a Path object from line 198, so wrapping it with Path() again is unnecessary. Remove the Path() wrapper around workflow_dir and replace the new ParameterManager instantiation with a reference to the existing parameter_manager instance created earlier to obtain the parameters from JSON.
225-226: 💤 Low valueRedundant
ParameterManagerinstantiation.A
ParameterManagerwas already created at line 201 and assigned toparameter_manager. This creates a duplicate instance with a different workflow name argument (missing here).Proposed fix
- # Get group mapping from parameters - param_manager = ParameterManager(workflow_dir) - params = param_manager.get_parameters_from_json() + # Get group mapping from parameters (reuse existing parameter_manager) + params = parameter_manager.get_parameters_from_json()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/common/results_helpers.py` around lines 225 - 226, The ParameterManager is being instantiated twice - once as parameter_manager at line 201 and again as param_manager at lines 225-226. Remove the redundant ParameterManager instantiation and instead reuse the existing parameter_manager variable that was already created earlier in the code, then update the get_parameters_from_json() call to use the existing parameter_manager instance.
323-326: ⚡ Quick winCatch specific CSV parsing exceptions for TMT branch.
Same issue as the LFQ branch - bare
Exceptioncatch masks debugging.Proposed fix
try: df = pd.read_csv(csv_file, sep="\t", comment="#", engine="python") - except Exception: + except (pd.errors.ParserError, pd.errors.EmptyDataError, FileNotFoundError): return None🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/common/results_helpers.py` around lines 323 - 326, The exception handling in the pd.read_csv call within the TMT branch function is too broad by catching a bare Exception, which masks actual parsing errors and complicates debugging. Replace the generic Exception catch with specific exceptions that pd.read_csv can raise, such as FileNotFoundError for missing files and pd.errors.ParserError for parsing issues. This will allow proper error handling and logging of specific failure modes while still gracefully returning None when appropriate.Source: Linters/SAST tools
216-219: ⚡ Quick winCatch specific CSV parsing exceptions.
Catching bare
Exceptionmasks unexpected errors and makes debugging harder. For CSV parsing, consider catchingpd.errors.ParserErrorandpd.errors.EmptyDataError.Proposed fix
try: df = pd.read_csv(csv_file) - except Exception: + except (pd.errors.ParserError, pd.errors.EmptyDataError, FileNotFoundError): return None🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/common/results_helpers.py` around lines 216 - 219, The try-except block around pd.read_csv(csv_file) is catching a bare Exception which masks unexpected errors and makes debugging difficult. Replace the generic Exception catch with specific pandas exceptions that are likely to occur during CSV parsing. Catch pd.errors.ParserError and pd.errors.EmptyDataError specifically, either in a single except clause with both exceptions or using multiple except blocks, while allowing other unexpected exceptions to propagate and surface actual errors.Source: Linters/SAST tools
379-482: ⚖️ Poor tradeoffRedundant
group_mapcheck and unclear control flow.The check
if group_map:at line 379 is redundant since line 365-367 already returnsNonewhengroup_mapis empty. The nested structure and fallthrough to line 483 makes the control flow hard to follow.Consider flattening:
Proposed refactor
if not group_map: st.warning("⚠️ Group mapping information is missing. Please configure sample groups in the Setup page.") return None + + if len(set(group_map.values())) < 2: + st.warning("⚠️ At least two distinct groups are required for statistical analysis.") + return None if exclude_indices: # ... existing column dropping logic ... - - if group_map: - # ... all the nested logic ... - if group_map and len(set(group_map.values())) >= 2: - # ... stats computation ... - return pivot_df, expr_df, group_map - else: - st.warning("⚠️ At least two distinct groups are required for statistical analysis.") - else: - st.warning("⚠️ No group mapping information is set. Please check the Configure page.") - return None + # ... stats computation (now at top level) ... + return pivot_df, expr_df, group_map🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/common/results_helpers.py` around lines 379 - 482, The redundant check for group_map at line 379 combined with the nested if condition at line 410 creates unclear control flow. Since the code already returns None at lines 365-367 when group_map is empty, the outer if group_map check is unnecessary. Remove the entire outer if group_map block starting at line 379 and dedent all the contained code. Replace it with just the inner condition checking if group_map has at least two distinct groups, and remove the trailing else clause that warns about no group mapping since that case is already handled by the earlier return statement. This will flatten the control structure and eliminate the redundant check.content/results_volcano.py (1)
43-192: ⚖️ Poor tradeoffConsider extracting shared volcano plot logic.
The LFQ and non-LFQ branches duplicate ~50 lines of nearly identical code. The only meaningful differences are the protein column name, slider minimum, and layout style. Consider extracting the shared logic into a helper function.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@content/results_volcano.py` around lines 43 - 192, Extract the shared volcano plot creation logic into a helper function to eliminate code duplication between the LFQ and non-LFQ branches. The helper function should accept parameters for the differences: the protein column name (ProteinName vs protein), the fc_thresh minimum value (0.5 vs 0.1), and the layout style parameter (use_container_width vs width). Move the common code that creates volcano_df, calculates neg_log10_padj, sets Significance categories, creates the scatter plot with fig_volcano, adds threshold lines, sets x-axis range symmetry, updates layout, displays counts, and shows navigation links into this helper function. Then replace both the if and else branches with calls to this helper function, passing the appropriate parameter values for each analysis mode.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@content/results_abundance.py`:
- Around line 74-76: The code in the lambda function within the apply method on
pivot_df uses np.log2() but the numpy module is not imported at the top of the
file, causing a NameError when this code executes. Add import numpy as np to the
import statements at the top of the results_abundance.py file to resolve the
undefined np reference.
In `@content/results_heatmap.py`:
- Line 122: The st.plotly_chart function call contains an invalid width
parameter set to "stretch". Remove the width="stretch" parameter from the
st.plotly_chart function call, as this is not a supported parameter value for
this function.
- Around line 41-132: The code contains nearly complete duplication between the
if analysis_mode == "LFQ" and else branches, with all the logic for slider
creation with key "heatmap_top_n", variance calculation using var(), top
proteins selection, heatmap normalization and z-score calculation, hierarchical
clustering with linkage() and leaves_list(), and figure creation with
px.imshow() being identical. Extract all this common code outside the
conditional statement to execute once before the if-else block, and only keep
the different st.plotly_chart() parameter call (use_container_width=True versus
width="stretch") within the conditional, or consolidate that parameter as well
if they achieve the same result.
In `@content/results_pathway_analysis.py`:
- Line 18: The page title in the st.title() function call does not reflect the
actual purpose of this page, which is Pathway Analysis. Change the title from
"ProteomicsLFQ Results" to a more appropriate title that clearly indicates this
is the Pathway Analysis page, such as "Pathway Analysis Results" or similar, to
align the page title with its actual functionality and avoid confusing users.
- Around line 191-197: The _run_go_enrichment() function is called
unconditionally on every page load, which causes unnecessary external API calls
to MyGene.info even when cached results already exist in go_results.json. Add a
check before calling _run_go_enrichment() to verify whether the go_json_file
already exists and contains valid cached results. Only execute
_run_go_enrichment() when the cached results file does not exist or is invalid,
allowing subsequent page loads to skip the expensive computation and reuse the
previously saved results.
In `@content/results_pca.py`:
- Line 110: The st.plotly_chart function call in the results_pca.py file
contains an invalid width parameter with value "stretch". Remove the
width="stretch" argument from the st.plotly_chart call entirely, or if you need
the chart to span the full container width, replace it with the valid parameter
use_container_width=True instead.
- Around line 112-113: There is a duplicate st.markdown call displaying the
"Proteins used" message in the results_pca.py file. Remove one of the two
identical st.markdown statements that display the proteins count and top N
filter information. Keep only one instance of the line that constructs the
message with expr_df_pca.shape[0] and top_n variables.
In `@content/results_volcano.py`:
- Around line 31-33: The st.info() and st.stop() function calls have excessive
indentation that incorrectly nests them deeper than the if pivot_df.empty check.
Remove one level of indentation from both the st.info("No data available for
volcano plot.") and st.stop() lines so they align with the if statement at the
same indentation level, ensuring these statements execute as the direct body of
the if condition.
- Line 180: The st.plotly_chart() call with the fig_volcano argument is using an
invalid width="stretch" parameter that is not supported in Streamlit 1.43.0.
Replace the width="stretch" parameter with use_container_width=True to properly
stretch the chart to fill the container width.
In `@src/workflow/ParameterManager.py`:
- Line 133: In the get_merged_params method, update the type annotation for the
ini_params parameter to explicitly indicate it can be None by changing it from
dict = None to dict | None = None. This makes the optional nature of the
parameter explicit to satisfy type checking requirements.
In `@src/workflow/StreamlitUI.py`:
- Around line 836-840: The issue is that widget keys now contain the
tool_instance_name (e.g., CometAdapter-TMT:1:...), and when
ParameterManager.save_parameters() extracts the tool name from these keys and
calls create_ini(tool), it passes the instance name instead of the base tool
name, causing it to look for non-existent files like CometAdapter-TMT.ini and
skip saving parameter overrides. In the ParameterManager.save_parameters()
method, before calling create_ini(tool), resolve the instance name back to its
base tool name by extracting just the tool portion (the part before the :1: or
instance identifier) so that it correctly loads and saves to the actual ini file
for that tool.
- Around line 872-877: In the bool branch where isinstance(p["value"], bool) is
already confirmed on line 872, the subsequent type check for string
(type(p["value"]) == str) is unreachable and will never be true. Simplify the
bool_value assignment by directly using p["value"] since it is guaranteed to be
a boolean at that point, removing the unnecessary conditional logic that checks
if it's a string.
- Around line 616-621: In the input_TOPP function signature, replace the mutable
default arguments with None and add explicit Optional type annotations. Change
flag_parameters from List[str] = [] to Optional[List[str]] = None, change
custom_defaults from dict = {} to Optional[dict] = None, and ensure
tool_instance_name has the proper Optional[str] type annotation. Then, inside
the function body, add initialization logic that checks if each parameter is
None and initializes them to their intended empty values (empty list for
flag_parameters, empty dict for custom_defaults) only when needed.
In `@src/WorkflowTest.py`:
- Around line 675-690: The WorkflowTest.py file contains unresolved Git merge
conflict markers that prevent the code from being parsed. You need to manually
resolve all merge conflicts in the execution() method by examining each conflict
region (marked by <<<<<<< HEAD, =======, and >>>>>>> upstream/main) and choosing
the correct code from either branch or merging the logic appropriately. Delete
all conflict marker lines after resolving each section, ensuring the final code
maintains proper syntax and logic flow. The conflicts occur in multiple
locations throughout the file including directory setup, CometAdapter execution,
ProteomicsLFQ input handling, report generation, and the return statement.
- Around line 390-391: Remove the debug statement
st.write(Path(self.workflow_dir, "results")) from the TMT tab configuration UI
code. This line is displaying raw path information to users and should be
deleted entirely as it serves no functional purpose in production.
- Around line 425-436: The custom_defaults dictionary contains a duplicate key
"PeptideIndexing:unmatched_action" that appears on both line 425 and line 436,
where the second occurrence silently overwrites the first. Remove one of the
duplicate "PeptideIndexing:unmatched_action": "warn" entries from the dictionary
to eliminate the copy-paste error.
---
Nitpick comments:
In `@content/results_pathway_analysis.py`:
- Around line 68-69: In the filtering operation on the res_go dataframe where
the "notfound" column is checked, replace the `!= True` comparison with the
bitwise negation operator `~` to properly filter pandas boolean columns. This
change makes the code more idiomatic and follows pandas best practices for
boolean indexing.
In `@content/results_volcano.py`:
- Around line 43-192: Extract the shared volcano plot creation logic into a
helper function to eliminate code duplication between the LFQ and non-LFQ
branches. The helper function should accept parameters for the differences: the
protein column name (ProteinName vs protein), the fc_thresh minimum value (0.5
vs 0.1), and the layout style parameter (use_container_width vs width). Move the
common code that creates volcano_df, calculates neg_log10_padj, sets
Significance categories, creates the scatter plot with fig_volcano, adds
threshold lines, sets x-axis range symmetry, updates layout, displays counts,
and shows navigation links into this helper function. Then replace both the if
and else branches with calls to this helper function, passing the appropriate
parameter values for each analysis mode.
In `@src/common/results_helpers.py`:
- Around line 337-338: The ParameterManager instantiation at line 337 is
redundant because a ParameterManager instance was already created at line 201
and should be reused. Additionally, workflow_dir is already a Path object from
line 198, so wrapping it with Path() again is unnecessary. Remove the Path()
wrapper around workflow_dir and replace the new ParameterManager instantiation
with a reference to the existing parameter_manager instance created earlier to
obtain the parameters from JSON.
- Around line 225-226: The ParameterManager is being instantiated twice - once
as parameter_manager at line 201 and again as param_manager at lines 225-226.
Remove the redundant ParameterManager instantiation and instead reuse the
existing parameter_manager variable that was already created earlier in the
code, then update the get_parameters_from_json() call to use the existing
parameter_manager instance.
- Around line 323-326: The exception handling in the pd.read_csv call within the
TMT branch function is too broad by catching a bare Exception, which masks
actual parsing errors and complicates debugging. Replace the generic Exception
catch with specific exceptions that pd.read_csv can raise, such as
FileNotFoundError for missing files and pd.errors.ParserError for parsing
issues. This will allow proper error handling and logging of specific failure
modes while still gracefully returning None when appropriate.
- Around line 216-219: The try-except block around pd.read_csv(csv_file) is
catching a bare Exception which masks unexpected errors and makes debugging
difficult. Replace the generic Exception catch with specific pandas exceptions
that are likely to occur during CSV parsing. Catch pd.errors.ParserError and
pd.errors.EmptyDataError specifically, either in a single except clause with
both exceptions or using multiple except blocks, while allowing other unexpected
exceptions to propagate and surface actual errors.
- Around line 379-482: The redundant check for group_map at line 379 combined
with the nested if condition at line 410 creates unclear control flow. Since the
code already returns None at lines 365-367 when group_map is empty, the outer if
group_map check is unnecessary. Remove the entire outer if group_map block
starting at line 379 and dedent all the contained code. Replace it with just the
inner condition checking if group_map has at least two distinct groups, and
remove the trailing else clause that warns about no group mapping since that
case is already handled by the earlier return statement. This will flatten the
control structure and eliminate the redundant check.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 08db369f-2cbf-4c06-981c-b3365985aa6a
📒 Files selected for processing (10)
app.pycontent/results_abundance.pycontent/results_heatmap.pycontent/results_pathway_analysis.pycontent/results_pca.pycontent/results_volcano.pysrc/WorkflowTest.pysrc/common/results_helpers.pysrc/workflow/ParameterManager.pysrc/workflow/StreamlitUI.py
| pivot_df["Intensity"] = pivot_df[sample_cols].apply( | ||
| lambda row: [np.log2(v + 1) for v in row], axis=1 | ||
| ) |
There was a problem hiding this comment.
np is undefined - missing numpy import.
The non-LFQ branch uses np.log2() but numpy is not imported in this file, causing a NameError at runtime when TMT mode is used.
Proposed fix - add import at top of file
+import numpy as np
from src.workflow.ParameterManager import ParameterManager🧰 Tools
🪛 GitHub Actions: Pylint / 0_build.txt
[error] 75-75: pylint E0602: Undefined variable 'np' (undefined-variable)
🪛 GitHub Actions: Pylint / build
[error] 75-75: pylint E0602: Undefined variable 'np' (undefined-variable)
🪛 Ruff (0.15.17)
[error] 75-75: Undefined name np
(F821)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@content/results_abundance.py` around lines 74 - 76, The code in the lambda
function within the apply method on pivot_df uses np.log2() but the numpy module
is not imported at the top of the file, causing a NameError when this code
executes. Add import numpy as np to the import statements at the top of the
results_abundance.py file to resolve the undefined np reference.
Source: Linters/SAST tools
| if analysis_mode == "LFQ": | ||
| top_n = st.slider("Number of proteins", 20, 200, 50, key="heatmap_top_n") | ||
|
|
||
| heatmap_clustered = heatmap_z.iloc[row_order, col_order] | ||
| var_series = expr_df.var(axis=1) | ||
| top_proteins = var_series.sort_values(ascending=False).head(top_n).index | ||
| heatmap_df = expr_df.loc[top_proteins] | ||
| heatmap_z = heatmap_df.sub(heatmap_df.mean(axis=1), axis=0).div(heatmap_df.std(axis=1), axis=0) | ||
| heatmap_z = heatmap_z.replace([np.inf, -np.inf], np.nan).dropna() | ||
|
|
||
| fig_heatmap = px.imshow( | ||
| heatmap_clustered, | ||
| labels=dict(x="Sample", y="Protein", color="Z-score"), | ||
| aspect="auto", | ||
| color_continuous_scale=[[0.0, "#3b6fb6"], [0.5, "white"], [1.0, "#b40426"]], | ||
| zmin=-3, zmax=3 | ||
| ) | ||
| if not heatmap_z.empty: | ||
| row_linkage = linkage(pdist(heatmap_z.values), method="average") | ||
| row_order = leaves_list(row_linkage) | ||
|
|
||
| fig_heatmap.update_layout( | ||
| height=700, | ||
| xaxis={'side': 'bottom'}, | ||
| yaxis={'side': 'left'} | ||
| ) | ||
| col_linkage = linkage(pdist(heatmap_z.T.values), method="average") | ||
| col_order = leaves_list(col_linkage) | ||
|
|
||
| fig_heatmap.update_xaxes(tickfont=dict(size=10)) | ||
| fig_heatmap.update_yaxes(tickfont=dict(size=8)) | ||
| heatmap_clustered = heatmap_z.iloc[row_order, col_order] | ||
|
|
||
| st.plotly_chart(fig_heatmap, use_container_width=True) | ||
| fig_heatmap = px.imshow( | ||
| heatmap_clustered, | ||
| labels=dict(x="Sample", y="Protein", color="Z-score"), | ||
| aspect="auto", | ||
| color_continuous_scale=[[0.0, "#3b6fb6"], [0.5, "white"], [1.0, "#b40426"]], | ||
| zmin=-3, zmax=3 | ||
| ) | ||
|
|
||
| fig_heatmap.update_layout( | ||
| height=700, | ||
| xaxis={'side': 'bottom'}, | ||
| yaxis={'side': 'left'} | ||
| ) | ||
|
|
||
| fig_heatmap.update_xaxes(tickfont=dict(size=10)) | ||
| fig_heatmap.update_yaxes(tickfont=dict(size=8)) | ||
|
|
||
| st.plotly_chart(fig_heatmap, use_container_width=True) | ||
| else: | ||
| st.warning("Insufficient data to generate the heatmap.") | ||
|
|
||
| st.markdown("---") | ||
| st.markdown("**Other visualizations:**") | ||
| col1, col2 = st.columns(2) | ||
| with col1: | ||
| st.page_link("content/results_volcano.py", label="Volcano Plot", icon="🌋") | ||
| with col2: | ||
| st.page_link("content/results_pca.py", label="PCA", icon="📊") | ||
| else: | ||
| st.warning("Insufficient data to generate the heatmap.") | ||
|
|
||
| st.markdown("---") | ||
| st.markdown("**Other visualizations:**") | ||
| col1, col2 = st.columns(2) | ||
| with col1: | ||
| st.page_link("content/results_volcano.py", label="Volcano Plot", icon="🌋") | ||
| with col2: | ||
| st.page_link("content/results_pca.py", label="PCA", icon="📊") | ||
| top_n = st.slider("Number of proteins", 20, 200, 50, key="heatmap_top_n") | ||
|
|
||
| var_series = expr_df.var(axis=1) | ||
| top_proteins = var_series.sort_values(ascending=False).head(top_n).index | ||
| heatmap_df = expr_df.loc[top_proteins] | ||
| heatmap_z = heatmap_df.sub(heatmap_df.mean(axis=1), axis=0).div(heatmap_df.std(axis=1), axis=0) | ||
| heatmap_z = heatmap_z.replace([np.inf, -np.inf], np.nan).dropna() | ||
|
|
||
| if not heatmap_z.empty: | ||
| row_linkage = linkage(pdist(heatmap_z.values), method="average") | ||
| row_order = leaves_list(row_linkage) | ||
|
|
||
| col_linkage = linkage(pdist(heatmap_z.T.values), method="average") | ||
| col_order = leaves_list(col_linkage) | ||
|
|
||
| heatmap_clustered = heatmap_z.iloc[row_order, col_order] | ||
|
|
||
| fig_heatmap = px.imshow( | ||
| heatmap_clustered, | ||
| labels=dict(x="Sample", y="Protein", color="Z-score"), | ||
| aspect="auto", | ||
| color_continuous_scale=[[0.0, "#3b6fb6"], [0.5, "white"], [1.0, "#b40426"]], | ||
| zmin=-3, zmax=3 | ||
| ) | ||
|
|
||
| fig_heatmap.update_layout( | ||
| height=700, | ||
| xaxis={'side': 'bottom'}, | ||
| yaxis={'side': 'left'} | ||
| ) | ||
|
|
||
| fig_heatmap.update_xaxes(tickfont=dict(size=10)) | ||
| fig_heatmap.update_yaxes(tickfont=dict(size=8)) | ||
|
|
||
| st.plotly_chart(fig_heatmap, width="stretch") | ||
| else: | ||
| st.warning("Insufficient data to generate the heatmap.") | ||
|
|
||
| st.markdown("---") | ||
| st.markdown("**Other visualizations:**") | ||
| col1, col2 = st.columns(2) | ||
| with col1: | ||
| st.page_link("content/results_volcano.py", label="Volcano Plot", icon="🌋") | ||
| with col2: | ||
| st.page_link("content/results_pca.py", label="PCA", icon="📊") |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Eliminate near-complete code duplication.
The LFQ and non-LFQ branches (lines 41-86 vs 87-132) are nearly identical - the only difference is the chart rendering call. This should be a single code path.
Proposed refactor
-if analysis_mode == "LFQ":
- top_n = st.slider("Number of proteins", 20, 200, 50, key="heatmap_top_n")
- # ... 40+ lines of identical code ...
- st.plotly_chart(fig_heatmap, use_container_width=True)
- # ... footer links ...
-else:
- top_n = st.slider("Number of proteins", 20, 200, 50, key="heatmap_top_n")
- # ... 40+ lines of identical code ...
- st.plotly_chart(fig_heatmap, width="stretch")
- # ... footer links ...
+top_n = st.slider("Number of proteins", 20, 200, 50, key="heatmap_top_n")
+
+var_series = expr_df.var(axis=1)
+top_proteins = var_series.sort_values(ascending=False).head(top_n).index
+heatmap_df = expr_df.loc[top_proteins]
+heatmap_z = heatmap_df.sub(heatmap_df.mean(axis=1), axis=0).div(heatmap_df.std(axis=1), axis=0)
+heatmap_z = heatmap_z.replace([np.inf, -np.inf], np.nan).dropna()
+
+if not heatmap_z.empty:
+ # ... clustering and figure creation ...
+ st.plotly_chart(fig_heatmap, use_container_width=True)
+else:
+ st.warning("Insufficient data to generate the heatmap.")
+
+st.markdown("---")
+st.markdown("**Other visualizations:**")
+# ... links ...🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@content/results_heatmap.py` around lines 41 - 132, The code contains nearly
complete duplication between the if analysis_mode == "LFQ" and else branches,
with all the logic for slider creation with key "heatmap_top_n", variance
calculation using var(), top proteins selection, heatmap normalization and
z-score calculation, hierarchical clustering with linkage() and leaves_list(),
and figure creation with px.imshow() being identical. Extract all this common
code outside the conditional statement to execute once before the if-else block,
and only keep the different st.plotly_chart() parameter call
(use_container_width=True versus width="stretch") within the conditional, or
consolidate that parameter as well if they achieve the same result.
| fig_heatmap.update_xaxes(tickfont=dict(size=10)) | ||
| fig_heatmap.update_yaxes(tickfont=dict(size=8)) | ||
|
|
||
| st.plotly_chart(fig_heatmap, width="stretch") |
There was a problem hiding this comment.
Invalid width parameter for st.plotly_chart.
Same issue as other pages - width="stretch" is not valid.
Proposed fix
- st.plotly_chart(fig_heatmap, width="stretch")
+ st.plotly_chart(fig_heatmap, use_container_width=True)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| st.plotly_chart(fig_heatmap, width="stretch") | |
| st.plotly_chart(fig_heatmap, use_container_width=True) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@content/results_heatmap.py` at line 122, The st.plotly_chart function call
contains an invalid width parameter set to "stretch". Remove the width="stretch"
parameter from the st.plotly_chart function call, as this is not a supported
parameter value for this function.
| # Page setup | ||
| # ================================ | ||
| params = page_setup() | ||
| st.title("ProteomicsLFQ Results") |
There was a problem hiding this comment.
Page title does not match page purpose.
The page title is "ProteomicsLFQ Results" but this is the Pathway Analysis page. This creates user confusion.
🧹 Proposed fix
params = page_setup()
-st.title("ProteomicsLFQ Results")
+st.title("Pathway Analysis")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@content/results_pathway_analysis.py` at line 18, The page title in the
st.title() function call does not reflect the actual purpose of this page, which
is Pathway Analysis. Change the title from "ProteomicsLFQ Results" to a more
appropriate title that clearly indicates this is the Pathway Analysis page, such
as "Pathway Analysis Results" or similar, to align the page title with its
actual functionality and avoid confusing users.
| go_json_file = results_dir / "go-terms" / "go_results.json" | ||
|
|
||
| go_input_df = pivot_df.copy() | ||
| if "ProteinName" in go_input_df.columns: | ||
| go_input_df = go_input_df.rename(columns={"ProteinName": "protein"}) | ||
|
|
||
| _run_go_enrichment(go_input_df, results_dir) |
There was a problem hiding this comment.
GO enrichment runs on every page load, even when results already exist.
_run_go_enrichment() is called unconditionally every time the page loads (line 197), which triggers an external API call to MyGene.info and re-computes the entire analysis. Since results are saved to go_results.json, the function should skip execution when valid cached results exist.
⚡ Proposed fix
go_json_file = results_dir / "go-terms" / "go_results.json"
+# Only run GO enrichment if results don't already exist
+if not go_json_file.exists():
go_input_df = pivot_df.copy()
if "ProteinName" in go_input_df.columns:
go_input_df = go_input_df.rename(columns={"ProteinName": "protein"})
_run_go_enrichment(go_input_df, results_dir)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@content/results_pathway_analysis.py` around lines 191 - 197, The
_run_go_enrichment() function is called unconditionally on every page load,
which causes unnecessary external API calls to MyGene.info even when cached
results already exist in go_results.json. Add a check before calling
_run_go_enrichment() to verify whether the go_json_file already exists and
contains valid cached results. Only execute _run_go_enrichment() when the cached
results file does not exist or is invalid, allowing subsequent page loads to
skip the expensive computation and reuse the previously saved results.
| # get key and name – use tool_instance_name in session state key | ||
| key_str = p['key'].decode() | ||
| if tool_instance_name != topp_tool_name: | ||
| key_str = key_str.replace(f"{topp_tool_name}:1:", f"{tool_instance_name}:1:", 1) | ||
| key = f"{self.parameter_manager.topp_param_prefix}{key_str}" |
There was a problem hiding this comment.
Resolve instance names before saving TOPP parameter overrides.
These widget keys now persist as ...{tool_instance_name}:1:...; ParameterManager.save_parameters() derives the tool name from that key and calls create_ini(tool). For instances such as CometAdapter-TMT, it attempts to create/load a non-existent CometAdapter-TMT.ini and skips saving user overrides, so TMT parameter edits are lost.
🐛 Proposed fix in `ParameterManager.save_parameters()`
- # for each TOPP tool, open the ini file
- for tool in current_topp_tools:
- if not self.create_ini(tool):
+ # for each TOPP tool instance, open the real executable's ini file
+ tool_instance_map = st.session_state.get("_topp_tool_instance_map", {})
+ for tool_instance in current_topp_tools:
+ tool = tool_instance_map.get(tool_instance, tool_instance)
+ if not self.create_ini(tool):
# Could not create ini file - skip this tool
continue
ini_path = Path(self.ini_dir, f"{tool}.ini")
- if tool not in json_params:
- json_params[tool] = {}
+ if tool_instance not in json_params:
+ json_params[tool_instance] = {}
@@
- if key.startswith(f"{self.topp_param_prefix}{tool}:1:"):
+ if key.startswith(f"{self.topp_param_prefix}{tool_instance}:1:"):
@@
- ini_key = key.replace(self.topp_param_prefix, "").encode()
+ ini_key_str = key.replace(self.topp_param_prefix, "")
+ if tool_instance != tool:
+ ini_key_str = ini_key_str.replace(
+ f"{tool_instance}:1:",
+ f"{tool}:1:",
+ 1,
+ )
+ ini_key = ini_key_str.encode()
@@
- or (key.split(":1:")[1] in json_params[tool])
+ or (key.split(":1:")[1] in json_params[tool_instance])
or (is_list_param and not value) # Always save empty list params
):
# store non-default value
- json_params[tool][key.split(":1:")[1]] = value
+ json_params[tool_instance][key.split(":1:")[1]] = value🧰 Tools
🪛 Ruff (0.15.17)
[warning] 836-836: Comment contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF003)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/workflow/StreamlitUI.py` around lines 836 - 840, The issue is that widget
keys now contain the tool_instance_name (e.g., CometAdapter-TMT:1:...), and when
ParameterManager.save_parameters() extracts the tool name from these keys and
calls create_ini(tool), it passes the instance name instead of the base tool
name, causing it to look for non-existent files like CometAdapter-TMT.ini and
skip saving parameter overrides. In the ParameterManager.save_parameters()
method, before calling create_ini(tool), resolve the instance name back to its
base tool name by extracting just the tool portion (the part before the :1: or
instance identifier) so that it correctly loads and saves to the actual ini file
for that tool.
| elif isinstance(p["value"], bool): | ||
| bool_value = ( | ||
| (p["value"] == "true") | ||
| if type(p["value"]) == str | ||
| else p["value"] | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Verify direct type comparisons flagged by Ruff E721 are removed.
# Expect: no matches after switching to `isinstance()` or removing unreachable checks.
rg -nP 'type\(.+\)\s*==\s*str' src/workflow/StreamlitUI.pyRepository: OpenMS/quantms-web
Length of output: 121
🏁 Script executed:
sed -n '865,885p' src/workflow/StreamlitUI.py | cat -nRepository: OpenMS/quantms-web
Length of output: 1193
Remove the unreachable string type check in the bool branch.
The isinstance(p["value"], bool) check on line 872 guarantees that p["value"] is a bool at that point. The subsequent type(p["value"]) == str check is unreachable and will never be true, making the conditional unnecessary.
🔧 Proposed fix
# bools
elif isinstance(p["value"], bool):
- bool_value = (
- (p["value"] == "true")
- if type(p["value"]) == str
- else p["value"]
- )
+ bool_value = p["value"]
cols[i].selectbox(🧰 Tools
🪛 Ruff (0.15.17)
[error] 875-875: Use is and is not for type comparisons, or isinstance() for isinstance checks
(E721)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/workflow/StreamlitUI.py` around lines 872 - 877, In the bool branch where
isinstance(p["value"], bool) is already confirmed on line 872, the subsequent
type check for string (type(p["value"]) == str) is unreachable and will never be
true. Simplify the bool_value assignment by directly using p["value"] since it
is guaranteed to be a boolean at that point, removing the unnecessary
conditional logic that checks if it's a string.
Source: Linters/SAST tools
| st.write(Path(self.workflow_dir, "results")) | ||
|
|
There was a problem hiding this comment.
Remove debug statement.
st.write(Path(self.workflow_dir, "results")) appears to be a debug statement that was left in the TMT tab configuration UI. This will display raw path information to users in production.
🧹 Proposed fix
st.info(comet_info)
- st.write(Path(self.workflow_dir, "results"))
-
comet_include = [":enzyme", "missed_cleavages", "fixed_modifications", "variable_modifications",🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/WorkflowTest.py` around lines 390 - 391, Remove the debug statement
st.write(Path(self.workflow_dir, "results")) from the TMT tab configuration UI
code. This line is displaying raw path information to users and should be
deleted entirely as it serves no functional purpose in production.
| "PeptideIndexing:unmatched_action": "warn", | ||
| "max_variable_mods_in_peptide": 3, | ||
| "precursor_mass_tolerance": 4.5, | ||
| "isotope_error": "0/1", | ||
| "precursor_error_units": "ppm", | ||
| "num_hits": 1, | ||
| "num_enzyme_termini": "fully", | ||
| "fragment_bin_offset": 0.0, | ||
| "minimum_peaks": 10, | ||
| "precursor_charge": "2:4", | ||
| "fragment_mass_tolerance": 0.015, | ||
| "PeptideIndexing:unmatched_action": "warn", |
There was a problem hiding this comment.
Duplicate dictionary key overwrites earlier value.
"PeptideIndexing:unmatched_action": "warn" is specified twice in the custom_defaults dict (lines 425 and 436). The second occurrence silently overwrites the first. While both have the same value here, this indicates copy-paste error and could cause subtle bugs if values diverge.
🧹 Proposed fix
"precursor_error_units": "ppm",
"num_hits": 1,
"num_enzyme_termini": "fully",
"fragment_bin_offset": 0.0,
"minimum_peaks": 10,
"precursor_charge": "2:4",
"fragment_mass_tolerance": 0.015,
- "PeptideIndexing:unmatched_action": "warn",
"variable_modifications": "Oxidation (M)\nAcetyl (Protein N-term)\nTMT6plex (K)\nTMT6plex (N-term)",
"debug": 0,
"force": True,
},📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "PeptideIndexing:unmatched_action": "warn", | |
| "max_variable_mods_in_peptide": 3, | |
| "precursor_mass_tolerance": 4.5, | |
| "isotope_error": "0/1", | |
| "precursor_error_units": "ppm", | |
| "num_hits": 1, | |
| "num_enzyme_termini": "fully", | |
| "fragment_bin_offset": 0.0, | |
| "minimum_peaks": 10, | |
| "precursor_charge": "2:4", | |
| "fragment_mass_tolerance": 0.015, | |
| "PeptideIndexing:unmatched_action": "warn", | |
| "PeptideIndexing:unmatched_action": "warn", | |
| "max_variable_mods_in_peptide": 3, | |
| "precursor_mass_tolerance": 4.5, | |
| "isotope_error": "0/1", | |
| "precursor_error_units": "ppm", | |
| "num_hits": 1, | |
| "num_enzyme_termini": "fully", | |
| "fragment_bin_offset": 0.0, | |
| "minimum_peaks": 10, | |
| "precursor_charge": "2:4", | |
| "fragment_mass_tolerance": 0.015, | |
| "variable_modifications": "Oxidation (M)\nAcetyl (Protein N-term)\nTMT6plex (K)\nTMT6plex (N-term)", | |
| "debug": 0, | |
| "force": True, | |
| }, |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/WorkflowTest.py` around lines 425 - 436, The custom_defaults dictionary
contains a duplicate key "PeptideIndexing:unmatched_action" that appears on both
line 425 and line 436, where the second occurrence silently overwrites the
first. Remove one of the duplicate "PeptideIndexing:unmatched_action": "warn"
entries from the dictionary to eliminate the copy-paste error.
| <<<<<<< HEAD | ||
| if current_mode == "LFQ": | ||
| self.logger.log("⚙️ Running LFQ workflow") | ||
|
|
||
| # ================================ | ||
| # 1️⃣ Directory setup | ||
| # ================================ | ||
| results_dir = Path(self.workflow_dir, "results") | ||
| comet_dir = results_dir / "comet_results" | ||
| perc_dir = results_dir / "percolator_results" | ||
| filter_dir = results_dir / "psm_filter" | ||
| quant_dir = results_dir / "quant_results" | ||
| ======= | ||
| for d in [comet_dir, perc_dir, filter_dir, quant_dir]: | ||
| d.mkdir(parents=True, exist_ok=True) | ||
| >>>>>>> upstream/main |
There was a problem hiding this comment.
Unresolved merge conflict markers break the entire file.
The execution() method contains multiple unresolved Git merge conflict markers (<<<<<<< HEAD, =======, >>>>>>> upstream/main) that cause syntax errors and prevent the code from being parsed. The pipeline is failing with syntax-error at line 675.
All conflict markers must be resolved before this PR can be merged. Key conflict regions:
- Lines 675-704: Directory setup and initial processing
- Lines 730-738: CometAdapter execution
- Lines 1122-1135: ProteomicsLFQ input handling
- Lines 1549-1578: Final report and TMT workflow branching
- Lines 2027-2180: Return statement and GO enrichment method
Also applies to: 694-704, 730-738, 1122-1135, 1549-1561, 1570-1578, 2027-2027, 2180-2180
🧰 Tools
🪛 GitHub Actions: Pylint / 0_build.txt
[error] 675-675: pylint E0001: Parsing failed: 'invalid syntax (WorkflowTest, line 675)' (syntax-error)
🪛 GitHub Actions: Pylint / build
[error] 675-675: pylint E0001: Parsing failed: 'invalid syntax (WorkflowTest, line 675)' (syntax-error)
🪛 Ruff (0.15.17)
[warning] 675-675: Expected a statement
(invalid-syntax)
[warning] 675-675: Expected a statement
(invalid-syntax)
[warning] 675-675: Expected a statement
(invalid-syntax)
[warning] 675-675: Expected a statement
(invalid-syntax)
[warning] 676-676: Unexpected indentation
(invalid-syntax)
[warning] 687-687: Expected a statement
(invalid-syntax)
[warning] 687-687: Expected a statement
(invalid-syntax)
[warning] 687-687: Expected a statement
(invalid-syntax)
[warning] 687-687: Expected a statement
(invalid-syntax)
[warning] 687-688: Expected a statement
(invalid-syntax)
[warning] 688-688: Unexpected indentation
(invalid-syntax)
[warning] 690-690: Expected a statement
(invalid-syntax)
[warning] 690-690: Expected a statement
(invalid-syntax)
[warning] 690-690: Expected a statement
(invalid-syntax)
[warning] 690-690: Expected a statement
(invalid-syntax)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/WorkflowTest.py` around lines 675 - 690, The WorkflowTest.py file
contains unresolved Git merge conflict markers that prevent the code from being
parsed. You need to manually resolve all merge conflicts in the execution()
method by examining each conflict region (marked by <<<<<<< HEAD, =======, and
>>>>>>> upstream/main) and choosing the correct code from either branch or
merging the logic appropriately. Delete all conflict marker lines after
resolving each section, ensuring the final code maintains proper syntax and
logic flow. The conflicts occur in multiple locations throughout the file
including directory setup, CometAdapter execution, ProteomicsLFQ input handling,
report generation, and the return statement.
Source: Linters/SAST tools
Changes
Unified LFQ/TMT Workflow (
feat)configure()andexecution()to dynamically render parameters and execute the appropriate workflow based on the selected analysis mode.Result Visualization Integration (
feat)Added interactive downstream analysis pages for:
Connected visualization modules to both LFQ and TMT outputs for streamlined result exploration.
Related Issues
Testing Checklist
Summary by CodeRabbit
New Features
Bug Fixes