
Solution for #71 #72

Open
CalvinFang-code wants to merge 3 commits into BrentLab:dev from CalvinFang-code:dev

Conversation

@CalvinFang-code
Collaborator

I've made the following improvements, which I hope will be useful:

  1. Added a _join_comparative_analyses function to _build_metadata_table to incorporate comparative datasets. _join_comparative_analyses queries the comparative dataset with SQL and prepares the results for matching, and _parse_composite_identifier parses the IDs in the comparative dataset so they can be matched.
  2. This worked with Harbison but failed with Hackett because of a case mismatch (the repo ID begins with an uppercase H). I therefore added code to _join_comparative_analyses that tries both an uppercase and a lowercase first letter for the repo ID.
  3. Added a query_dto function that specifically handles the intersection of the specified binding and perturbation datasets.
  4. I also found some inconsistencies between datasets.
     Some use semicolons as separators, while others use slashes:
       BrentLab/harbison_2004;harbison_2004;3
       BrentLab/rossi_2021/rossi_2021_af_combined
     Some use uppercase, while others use lowercase:
       BrentLab/Hackett_2020;hackett_2020;34
       BrentLab/harbison_2004;harbison_2004;3
     Should we unify them, or handle each format separately with dedicated functions?
  5. I was unable to read the calling cards data: the program crashed several times, and since I haven't found the cause yet, I haven't continued with that analysis.
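Items 1, 2 and 4 could perhaps be handled by one parser. The sketch below is an illustration under stated assumptions, not the code in this PR: it assumes the semicolon form is owner/repo;config;sample and the slash form is owner/repo/config. Those field meanings are inferred from the example IDs above. DatasetId, parse_dataset_id and match_configured_repo are hypothetical names. Rather than trying an uppercase and a lowercase first letter, it compares case-folded repo IDs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetId:
    repo_id: str              # e.g. "BrentLab/harbison_2004"
    config_name: str          # e.g. "harbison_2004"
    sample_id: Optional[str]  # e.g. "3"; None when the ID carries no sample field


def parse_dataset_id(raw: str) -> DatasetId:
    """Parse either 'owner/repo;config;sample' or 'owner/repo/config'.

    Both layouts are assumptions inferred from example IDs, not from a spec.
    """
    raw = raw.strip()
    if ";" in raw:
        parts = raw.split(";")
        if len(parts) != 3:
            raise ValueError(f"expected 'owner/repo;config;sample', got {raw!r}")
        repo_id, config_name, sample_id = parts
        return DatasetId(repo_id, config_name, sample_id)
    parts = raw.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected 'owner/repo/config', got {raw!r}")
    owner, repo, config_name = parts
    return DatasetId(f"{owner}/{repo}", config_name, None)


def match_configured_repo(repo_id: str, configured: List[str]) -> Optional[str]:
    """Return the configured repo ID equal to repo_id ignoring case, or None.

    casefold() compares the whole string, so any capitalization difference
    is tolerated, not only one in the first letter.
    """
    key = repo_id.casefold()
    for candidate in configured:
        if candidate.casefold() == key:
            return candidate
    return None
```

Suppose a repo is configured in lowercase as BrentLab/hackett_2020. Parsing BrentLab/Hackett_2020;hackett_2020;34 gives a repo_id of "BrentLab/Hackett_2020", and match_configured_repo maps that back to the configured spelling. Keeping a single matching rule also means new capitalization variants don't require new special cases.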

@CalvinFang-code
Collaborator Author

This is strange; the problem didn't occur in my local testing. I'll investigate the cause later.

Member

@cmatKhan cmatKhan left a comment


My inclination is that this isn't worth working on further. It makes very large changes, and it's hard for me to follow why some of them are being made.

Rather than continuing with this, I'd suggest taking that issue and building reproducible examples of how the current parsing method fails.
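A reproducible example along these lines could be a small table of real IDs paired with the parse each one should produce, plus a harness that reports every mismatch. The actual parser isn't shown in this thread, so naive_parser below is a hypothetical stand-in that assumes every ID is repo;config;sample; swap in the real function to get a real report. The expected (repo_id, config_name, sample_id) tuples are likewise an assumed reading of the IDs. CASES, find_failures and naive_parser are illustrative names only.

```python
from typing import Callable, Dict, List, Optional, Tuple

Parsed = Tuple[str, str, Optional[str]]  # (repo_id, config_name, sample_id)

# Raw IDs copied from the datasets, mapped to the parse we would expect.
# The field split in each expected tuple is an assumption.
CASES: Dict[str, Parsed] = {
    "BrentLab/harbison_2004;harbison_2004;3": ("BrentLab/harbison_2004", "harbison_2004", "3"),
    "BrentLab/Hackett_2020;hackett_2020;34": ("BrentLab/Hackett_2020", "hackett_2020", "34"),
    "BrentLab/rossi_2021/rossi_2021_af_combined": ("BrentLab/rossi_2021", "rossi_2021_af_combined", None),
}


def find_failures(parser: Callable[[str], object], cases: Dict[str, Parsed]) -> List[tuple]:
    """Run parser on every case and collect (raw, expected, actual) for mismatches.

    Exceptions are recorded as the actual result, so a crash is reported
    instead of stopping the run.
    """
    failures = []
    for raw, expected in cases.items():
        try:
            actual = parser(raw)
        except Exception as exc:
            actual = f"raised {type(exc).__name__}: {exc}"
        if actual != expected:
            failures.append((raw, expected, actual))
    return failures


def naive_parser(raw: str) -> Parsed:
    """Hypothetical stand-in for the real parser: assumes 'repo;config;sample'."""
    repo_id, config_name, sample_id = raw.split(";")
    return (repo_id, config_name, sample_id)


if __name__ == "__main__":
    for raw, expected, actual in find_failures(naive_parser, CASES):
        print(f"{raw}\n  expected: {expected}\n  actual:   {actual}")
```

With the stand-in, only the slash-separated rossi_2021 ID fails, because splitting it on ";" yields one field instead of three. The harness expects the parser to return a plain tuple, so a parser with a different return type would need a small adapter.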

```python
# Concatenate results, filling NaN for missing columns
return pd.concat(results, ignore_index=True, sort=False)

def query_dto(
```
Member


Functions in virtualDB shouldn't be this specific. From the point of view of how the data is stored, DTO isn't meaningfully different from Spearman correlation.
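To illustrate the review point, a single function parameterized by the analysis name could serve DTO, Spearman correlation, or any other comparative result stored the same way. This is a minimal sketch, not VirtualDB's API. It assumes comparative results sit in one long-format table with hypothetical columns analysis, binding_id and perturbation_id. query_comparative is an illustrative name.

```python
from typing import Iterable

import pandas as pd


def query_comparative(
    df: pd.DataFrame,
    analysis: str,
    binding_ids: Iterable[str],
    perturbation_ids: Iterable[str],
) -> pd.DataFrame:
    """Rows of one comparative analysis restricted to the given dataset pairs.

    The column names 'analysis', 'binding_id' and 'perturbation_id' are
    assumptions about the storage layout.
    """
    mask = (
        (df["analysis"] == analysis)
        & df["binding_id"].isin(list(binding_ids))
        & df["perturbation_id"].isin(list(perturbation_ids))
    )
    # reset_index so callers get a clean 0..n-1 index on the filtered rows
    return df.loc[mask].reset_index(drop=True)
```

With this shape, a DTO-specific query_dto would reduce to calling query_comparative(df, "dto", ...), and adding another analysis would require no new function.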

@CalvinFang-code
Collaborator Author

Understood. So for now we should focus on identifying the cause of the bug and collecting specific examples of it, rather than making these changes.

Also, is VirtualDB meant to handle only basic functionality? Is it necessary to encapsulate operations such as retrieving DTO data in dedicated functions?

@cmatKhan
Member

cmatKhan commented Feb 4, 2026

Yes. I also think one of the problems is the way the comparative analysis dataset is configured. I'm experimenting with moving it out to the same level as the other repos and adding a "links to" field that lists the other configured datasets it relates to.
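One way the proposed layout might look is sketched below as a Python dict, with a check that every "links to" entry names a configured dataset. Everything here is hypothetical: the key names dataset_type and links_to, the repo name BrentLab/comparative_analyses, and the function unresolved_links are illustrative only, not the project's actual configuration schema.

```python
from typing import Any, Dict, List

# Hypothetical config: the comparative dataset sits at the same level as the
# other repos and declares which configured datasets it links to.
CONFIG: Dict[str, Dict[str, Any]] = {
    "BrentLab/harbison_2004": {"dataset_type": "binding"},
    "BrentLab/rossi_2021": {"dataset_type": "binding"},
    "BrentLab/comparative_analyses": {
        "dataset_type": "comparative",
        "links_to": ["BrentLab/harbison_2004", "BrentLab/rossi_2021"],
    },
}


def unresolved_links(config: Dict[str, Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each repo to the links_to entries that are not themselves configured.

    An empty result means every link points at a configured dataset.
    """
    known = set(config)
    problems: Dict[str, List[str]] = {}
    for repo, entry in config.items():
        missing = [link for link in entry.get("links_to", []) if link not in known]
        if missing:
            problems[repo] = missing
    return problems
```

Validating links when the config loads would surface a mistyped or missing dataset immediately, instead of as a failed match during a query.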


Labels

None yet

Projects

None yet


2 participants