{
"cells": [
{
"cell_type": "markdown",
"id": "edcb3b82",
"metadata": {},
"source": [
"## Analyse Binette results\n",
"\n",
"Let's visualize the results from Binette and compare them to the initial bin sets used as input. \n",
"\n",
"To explore these results interactively, you can open the Jupyter notebook via Binder by following this link: [](https://mybinder.org/v2/gh/genotoul-bioinfo/Binette/binder_tutorial_env?urlpath=git-pull%3Frepo%3Dhttps%253A%252F%252Fgithub.com%252Fgenotoul-bioinfo%252FBinette%26urlpath%3Dtree%252FBinette%252Fdocs%252Ftutorial%252Fanalyse_binette_result.ipynb%26branch%3Dmain)"
]
},
{
"cell_type": "markdown",
"id": "dbe1d73b",
"metadata": {},
"source": [
"### Import Necessary Libraries\n",
"\n",
"First, we'll need to import the necessary libraries for our analysis and plotting:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "9e9153ef",
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-14T08:42:20.824074Z",
"iopub.status.busy": "2025-10-14T08:42:20.823897Z",
"iopub.status.idle": "2025-10-14T08:42:22.884995Z",
"shell.execute_reply": "2025-10-14T08:42:22.884454Z"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"from pathlib import Path\n",
"import plotly.express as px\n",
"\n",
"# The following two lines are needed to properly display Plotly graphs in the documentation\n",
"# However you may need to remove these lines and restart the kernel to visualise the graph in another context\n",
"import plotly.io as pio\n",
"pio.renderers.default = \"sphinx_gallery\""
]
},
{
"cell_type": "markdown",
"id": "b93e8a0e",
"metadata": {},
"source": [
"### Load Binette Results\n",
"\n",
"Now, let's load the final Binette quality report into a Pandas DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d95ad45c",
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-14T08:42:22.887040Z",
"iopub.status.busy": "2025-10-14T08:42:22.886826Z",
"iopub.status.idle": "2025-10-14T08:42:22.922631Z",
"shell.execute_reply": "2025-10-14T08:42:22.922125Z"
},
"lines_to_next_cell": 0
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
" origin | \n",
" is_original | \n",
" original_name | \n",
" completeness | \n",
" contamination | \n",
" score | \n",
" checkm2_model | \n",
" size | \n",
" N50 | \n",
" coding_density | \n",
" contig_count | \n",
" tool | \n",
" index | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" binette_bin1 | \n",
" binette | \n",
" False | \n",
" binette_bin1 | \n",
" 100.00 | \n",
" 0.10 | \n",
" 99.80 | \n",
" Neural Network (Specific Model) | \n",
" 4658605 | \n",
" 82084 | \n",
" 0.8803 | \n",
" 91 | \n",
" binette | \n",
" 0 | \n",
"
\n",
" \n",
" | 1 | \n",
" binette_bin2 | \n",
" binette | \n",
" False | \n",
" binette_bin2 | \n",
" 99.94 | \n",
" 0.23 | \n",
" 99.48 | \n",
" Neural Network (Specific Model) | \n",
" 2796059 | \n",
" 41151 | \n",
" 0.8882 | \n",
" 98 | \n",
" binette | \n",
" 1 | \n",
"
\n",
" \n",
" | 2 | \n",
" binette_bin3 | \n",
" binette | \n",
" False | \n",
" binette_bin3 | \n",
" 96.10 | \n",
" 0.27 | \n",
" 95.56 | \n",
" Gradient Boost (General Model) | \n",
" 2559714 | \n",
" 11656 | \n",
" 0.8990 | \n",
" 315 | \n",
" binette | \n",
" 2 | \n",
"
\n",
" \n",
" | 3 | \n",
" binette_bin4 | \n",
" binette | \n",
" False | \n",
" binette_bin4 | \n",
" 93.43 | \n",
" 0.12 | \n",
" 93.19 | \n",
" Neural Network (Specific Model) | \n",
" 4229623 | \n",
" 40395 | \n",
" 0.9031 | \n",
" 148 | \n",
" binette | \n",
" 3 | \n",
"
\n",
" \n",
" | 4 | \n",
" binette_bin5 | \n",
" binette | \n",
" False | \n",
" binette_bin5 | \n",
" 95.15 | \n",
" 2.36 | \n",
" 90.43 | \n",
" Gradient Boost (General Model) | \n",
" 1843697 | \n",
" 10106 | \n",
" 0.8835 | \n",
" 266 | \n",
" binette | \n",
" 4 | \n",
"
\n",
" \n",
" | 5 | \n",
" binette_bin6 | \n",
" binette | \n",
" False | \n",
" binette_bin6 | \n",
" 91.50 | \n",
" 2.21 | \n",
" 87.08 | \n",
" Gradient Boost (General Model) | \n",
" 3543663 | \n",
" 5964 | \n",
" 0.8542 | \n",
" 786 | \n",
" binette | \n",
" 5 | \n",
"
\n",
" \n",
" | 6 | \n",
" binette_bin7 | \n",
" semibin2_output/output_bins | \n",
" True | \n",
" SemiBin_23.fa | \n",
" 84.06 | \n",
" 1.66 | \n",
" 80.74 | \n",
" Neural Network (Specific Model) | \n",
" 1689331 | \n",
" 8389 | \n",
" 0.8678 | \n",
" 246 | \n",
" binette | \n",
" 6 | \n",
"
\n",
" \n",
" | 7 | \n",
" binette_bin8 | \n",
" binette | \n",
" False | \n",
" binette_bin8 | \n",
" 74.32 | \n",
" 2.17 | \n",
" 69.98 | \n",
" Gradient Boost (General Model) | \n",
" 1257085 | \n",
" 5017 | \n",
" 0.8946 | \n",
" 257 | \n",
" binette | \n",
" 7 | \n",
"
\n",
" \n",
" | 8 | \n",
" binette_bin9 | \n",
" binette | \n",
" False | \n",
" binette_bin9 | \n",
" 74.08 | \n",
" 3.82 | \n",
" 66.44 | \n",
" Neural Network (Specific Model) | \n",
" 3492747 | \n",
" 3005 | \n",
" 0.9218 | \n",
" 1308 | \n",
" binette | \n",
" 8 | \n",
"
\n",
" \n",
" | 9 | \n",
" binette_bin10 | \n",
" binette | \n",
" False | \n",
" binette_bin10 | \n",
" 64.49 | \n",
" 1.79 | \n",
" 60.91 | \n",
" Gradient Boost (General Model) | \n",
" 1266713 | \n",
" 3796 | \n",
" 0.9064 | \n",
" 415 | \n",
" binette | \n",
" 9 | \n",
"
\n",
" \n",
" | 10 | \n",
" binette_bin11 | \n",
" binette | \n",
" False | \n",
" binette_bin11 | \n",
" 60.27 | \n",
" 1.85 | \n",
" 56.57 | \n",
" Neural Network (Specific Model) | \n",
" 2080860 | \n",
" 4612 | \n",
" 0.9044 | \n",
" 519 | \n",
" binette | \n",
" 10 | \n",
"
\n",
" \n",
" | 11 | \n",
" binette_bin12 | \n",
" binette | \n",
" False | \n",
" binette_bin12 | \n",
" 52.00 | \n",
" 1.07 | \n",
" 49.86 | \n",
" Neural Network (Specific Model) | \n",
" 2516999 | \n",
" 5503 | \n",
" 0.9092 | \n",
" 482 | \n",
" binette | \n",
" 11 | \n",
"
\n",
" \n",
" | 12 | \n",
" binette_bin13 | \n",
" binette | \n",
" False | \n",
" binette_bin13 | \n",
" 48.86 | \n",
" 4.50 | \n",
" 39.86 | \n",
" Gradient Boost (General Model) | \n",
" 1119471 | \n",
" 1517 | \n",
" 0.8945 | \n",
" 729 | \n",
" binette | \n",
" 12 | \n",
"
\n",
" \n",
" | 13 | \n",
" binette_bin14 | \n",
" binette | \n",
" False | \n",
" binette_bin14 | \n",
" 43.66 | \n",
" 5.11 | \n",
" 33.44 | \n",
" Neural Network (Specific Model) | \n",
" 2087483 | \n",
" 4593 | \n",
" 0.9248 | \n",
" 476 | \n",
" binette | \n",
" 13 | \n",
"
\n",
" \n",
" | 14 | \n",
" binette_bin15 | \n",
" binette | \n",
" False | \n",
" binette_bin15 | \n",
" 43.93 | \n",
" 9.52 | \n",
" 24.89 | \n",
" Neural Network (Specific Model) | \n",
" 2451217 | \n",
" 1480 | \n",
" 0.8544 | \n",
" 1627 | \n",
" binette | \n",
" 14 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name origin is_original original_name \\\n",
"0 binette_bin1 binette False binette_bin1 \n",
"1 binette_bin2 binette False binette_bin2 \n",
"2 binette_bin3 binette False binette_bin3 \n",
"3 binette_bin4 binette False binette_bin4 \n",
"4 binette_bin5 binette False binette_bin5 \n",
"5 binette_bin6 binette False binette_bin6 \n",
"6 binette_bin7 semibin2_output/output_bins True SemiBin_23.fa \n",
"7 binette_bin8 binette False binette_bin8 \n",
"8 binette_bin9 binette False binette_bin9 \n",
"9 binette_bin10 binette False binette_bin10 \n",
"10 binette_bin11 binette False binette_bin11 \n",
"11 binette_bin12 binette False binette_bin12 \n",
"12 binette_bin13 binette False binette_bin13 \n",
"13 binette_bin14 binette False binette_bin14 \n",
"14 binette_bin15 binette False binette_bin15 \n",
"\n",
" completeness contamination score checkm2_model \\\n",
"0 100.00 0.10 99.80 Neural Network (Specific Model) \n",
"1 99.94 0.23 99.48 Neural Network (Specific Model) \n",
"2 96.10 0.27 95.56 Gradient Boost (General Model) \n",
"3 93.43 0.12 93.19 Neural Network (Specific Model) \n",
"4 95.15 2.36 90.43 Gradient Boost (General Model) \n",
"5 91.50 2.21 87.08 Gradient Boost (General Model) \n",
"6 84.06 1.66 80.74 Neural Network (Specific Model) \n",
"7 74.32 2.17 69.98 Gradient Boost (General Model) \n",
"8 74.08 3.82 66.44 Neural Network (Specific Model) \n",
"9 64.49 1.79 60.91 Gradient Boost (General Model) \n",
"10 60.27 1.85 56.57 Neural Network (Specific Model) \n",
"11 52.00 1.07 49.86 Neural Network (Specific Model) \n",
"12 48.86 4.50 39.86 Gradient Boost (General Model) \n",
"13 43.66 5.11 33.44 Neural Network (Specific Model) \n",
"14 43.93 9.52 24.89 Neural Network (Specific Model) \n",
"\n",
" size N50 coding_density contig_count tool index \n",
"0 4658605 82084 0.8803 91 binette 0 \n",
"1 2796059 41151 0.8882 98 binette 1 \n",
"2 2559714 11656 0.8990 315 binette 2 \n",
"3 4229623 40395 0.9031 148 binette 3 \n",
"4 1843697 10106 0.8835 266 binette 4 \n",
"5 3543663 5964 0.8542 786 binette 5 \n",
"6 1689331 8389 0.8678 246 binette 6 \n",
"7 1257085 5017 0.8946 257 binette 7 \n",
"8 3492747 3005 0.9218 1308 binette 8 \n",
"9 1266713 3796 0.9064 415 binette 9 \n",
"10 2080860 4612 0.9044 519 binette 10 \n",
"11 2516999 5503 0.9092 482 binette 11 \n",
"12 1119471 1517 0.8945 729 binette 12 \n",
"13 2087483 4593 0.9248 476 binette 13 \n",
"14 2451217 1480 0.8544 1627 binette 14 "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"binette_result_file = \"./binette_output/final_bins_quality_reports.tsv\"\n",
"df_binette = pd.read_csv(binette_result_file, sep='\\t')\n",
"df_binette['tool'] = \"binette\" # Add a column to label the tool\n",
"df_binette['index'] = df_binette.index # Add an index column\n",
"df_binette"
]
},
{
"cell_type": "markdown",
"id": "c1372a73",
"metadata": {},
"source": [
"### Load and Combine Input Bin Quality Reports\n",
"\n",
"Next, we will load the quality reports of the input bin sets, computed by various tools and saved by Binette. We’ll combine these into a single DataFrame and add a column to indicate high-quality bins. We define a high-quality bin as one with contamination ≤ 5% and completeness ≥ 90%."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "fcb016f2",
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-14T08:42:22.924325Z",
"iopub.status.busy": "2025-10-14T08:42:22.924161Z",
"iopub.status.idle": "2025-10-14T08:42:22.954467Z",
"shell.execute_reply": "2025-10-14T08:42:22.954109Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" tool | \n",
" completeness | \n",
" contamination | \n",
" size | \n",
" N50 | \n",
" contig_count | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" binette | \n",
" 100.00 | \n",
" 0.10 | \n",
" 4658605 | \n",
" 82084 | \n",
" 91 | \n",
"
\n",
" \n",
" | 1 | \n",
" binette | \n",
" 99.94 | \n",
" 0.23 | \n",
" 2796059 | \n",
" 41151 | \n",
" 98 | \n",
"
\n",
" \n",
" | 2 | \n",
" binette | \n",
" 96.10 | \n",
" 0.27 | \n",
" 2559714 | \n",
" 11656 | \n",
" 315 | \n",
"
\n",
" \n",
" | 3 | \n",
" binette | \n",
" 93.43 | \n",
" 0.12 | \n",
" 4229623 | \n",
" 40395 | \n",
" 148 | \n",
"
\n",
" \n",
" | 4 | \n",
" binette | \n",
" 95.15 | \n",
" 2.36 | \n",
" 1843697 | \n",
" 10106 | \n",
" 266 | \n",
"
\n",
" \n",
" | ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" | 9 | \n",
" metabat2 | \n",
" 44.85 | \n",
" 0.79 | \n",
" 987990 | \n",
" 4743 | \n",
" 220 | \n",
"
\n",
" \n",
" | 10 | \n",
" metabat2 | \n",
" 44.38 | \n",
" 0.58 | \n",
" 1745116 | \n",
" 4265 | \n",
" 420 | \n",
"
\n",
" \n",
" | 11 | \n",
" metabat2 | \n",
" 25.47 | \n",
" 0.03 | \n",
" 1077467 | \n",
" 91995 | \n",
" 14 | \n",
"
\n",
" \n",
" | 12 | \n",
" metabat2 | \n",
" 94.21 | \n",
" 37.06 | \n",
" 8631886 | \n",
" 4347 | \n",
" 1994 | \n",
"
\n",
" \n",
" | 13 | \n",
" metabat2 | \n",
" 7.06 | \n",
" 0.03 | \n",
" 252404 | \n",
" 64012 | \n",
" 6 | \n",
"
\n",
" \n",
"
\n",
"
139 rows × 6 columns
\n",
"
"
],
"text/plain": [
" tool completeness contamination size N50 contig_count\n",
"0 binette 100.00 0.10 4658605 82084 91\n",
"1 binette 99.94 0.23 2796059 41151 98\n",
"2 binette 96.10 0.27 2559714 11656 315\n",
"3 binette 93.43 0.12 4229623 40395 148\n",
"4 binette 95.15 2.36 1843697 10106 266\n",
".. ... ... ... ... ... ...\n",
"9 metabat2 44.85 0.79 987990 4743 220\n",
"10 metabat2 44.38 0.58 1745116 4265 420\n",
"11 metabat2 25.47 0.03 1077467 91995 14\n",
"12 metabat2 94.21 37.06 8631886 4347 1994\n",
"13 metabat2 7.06 0.03 252404 64012 6\n",
"\n",
"[139 rows x 6 columns]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from pathlib import Path\n",
"\n",
"input_bins_quality_reports_dir = Path(\"binette_output/input_bins_quality_reports/\")\n",
"\n",
"# Initialize the list with Binette results\n",
"df_input_bin_list = [df_binette]\n",
"\n",
"# Load each input bin quality report\n",
"for input_bin_metric_file in input_bins_quality_reports_dir.glob(\"*tsv\"):\n",
" tool = input_bin_metric_file.name.split('.')[1].split('_')[0] # Extract tool name from file name\n",
" df_input = pd.read_csv(input_bin_metric_file, sep='\\t')\n",
" df_input['index'] = df_input.index\n",
" df_input['tool'] = tool\n",
" df_input_bin_list.append(df_input)\n",
"\n",
"# Combine all DataFrames into one\n",
"df_bins = pd.concat(df_input_bin_list)\n",
"\n",
"# Add a column to indicate high-quality bins\n",
"df_bins[\"High quality bin\"] = (df_bins['completeness'] >= 90) & (df_bins['contamination'] <= 5)\n",
"\n",
"# Display relevant columns\n",
"df_bins[[ \"tool\", \"completeness\", \"contamination\", \"size\", \"N50\", \"contig_count\"]]\n"
]
},
{
"cell_type": "markdown",
"id": "80ef2544",
"metadata": {},
"source": [
"### Plot bin completeness and contamination\n",
"With the DataFrame containing both Binette’s final bins and the input bins, we can now create a scatter plot to visualize the results:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "277cb781",
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-14T08:42:22.956317Z",
"iopub.status.busy": "2025-10-14T08:42:22.956154Z",
"iopub.status.idle": "2025-10-14T08:42:23.439658Z",
"shell.execute_reply": "2025-10-14T08:42:23.439082Z"
}
},
"outputs": [
{
"data": {
"text/html": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import plotly.express as px\n",
"\n",
"# Create a scatter plot to visualize completeness and contamination\n",
"fig = px.scatter(df_bins, \n",
" x=\"completeness\", \n",
" y=\"contamination\", \n",
" color=\"High quality bin\", \n",
" size=\"size\", \n",
" facet_row=\"tool\",\n",
" title=\"Bin Quality Comparison\",\n",
" )\n",
"\n",
"# Update layout for better visibility\n",
"fig.update_layout(\n",
" width=600,\n",
" height=800,\n",
" legend_title=\"High Quality Bin\",\n",
" title=\"Comparison of Bin Quality Metrics\"\n",
")\n",
"\n",
"# Show the plot\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"id": "06a14412",
"metadata": {},
"source": [
"We can see that binette bins are the one displaying the most high quality bins (completeness ≥ 90% and contamination ≤ 5%).\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "323f5637",
"metadata": {},
"source": [
"### Comparing Binning Tools Using Bin Score Curves\n",
"\n",
"A common way to compare bin sets is by sorting the bins based on their scores and plotting them against their index.\n",
"\n",
"Here’s how we can create such a plot:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "79faaa3a",
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-14T08:42:23.441369Z",
"iopub.status.busy": "2025-10-14T08:42:23.441165Z",
"iopub.status.idle": "2025-10-14T08:42:23.482193Z",
"shell.execute_reply": "2025-10-14T08:42:23.481769Z"
}
},
"outputs": [
{
"data": {
"text/html": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Calculate the score for each bin\n",
"df_bins['completeness - 2*contamination'] = df_bins['completeness'] - 2 * df_bins['contamination']\n",
"\n",
"# Plot the score against the bin index\n",
"fig = px.line(df_bins, x=\"index\", y='completeness - 2*contamination', color=\"tool\", markers=True)\n",
"fig.update_layout(width=600, height=500)\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"id": "97aee4d0",
"metadata": {},
"source": [
"From the plot, you might notice that Concoct has a lot of bins with lower quality scores. Let’s zoom in to get a better look:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "063974f6",
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-14T08:42:23.483831Z",
"iopub.status.busy": "2025-10-14T08:42:23.483667Z",
"iopub.status.idle": "2025-10-14T08:42:23.490478Z",
"shell.execute_reply": "2025-10-14T08:42:23.490114Z"
}
},
"outputs": [
{
"data": {
"text/html": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Adjust the plot view to zoom in\n",
"fig.update_layout(\n",
" xaxis_range=[-1, 20], # Zoom on x-axis\n",
" yaxis_range=[0, 100], # Zoom on y-axis\n",
" width=600,\n",
" height=500\n",
")\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"id": "136b17e4",
"metadata": {},
"source": [
"Binette line consistently appears above the other binning tools. This indicates that Binette produce higher-quality bins compared to the initial bin sets."
]
},
{
"cell_type": "markdown",
"id": "46f1b3d0",
"metadata": {},
"source": [
"### Plot Number of High-Quality Bins per Bin Set\n",
"\n",
"Let's plot the number of bins falling into different quality categories. We’ll focus on bins with a maximum of 10% contamination and classify them into three completeness categories:\n",
"\n",
"- **`> 50% and ≤ 70%`**\n",
"- **`> 70% and ≤ 90%`**\n",
"- **`> 90%`**\n",
"\n",
"First, let’s group and count the bins in each category:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "943f88b4",
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-14T08:42:23.492146Z",
"iopub.status.busy": "2025-10-14T08:42:23.492003Z",
"iopub.status.idle": "2025-10-14T08:42:23.506310Z",
"shell.execute_reply": "2025-10-14T08:42:23.505928Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Contamination ≤ 10 and<br>Completeness | \n",
" tool | \n",
" bin_count | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" > 50% and ≤ 70% | \n",
" binette | \n",
" 3 | \n",
"
\n",
" \n",
" | 1 | \n",
" > 50% and ≤ 70% | \n",
" maxbin2 | \n",
" 2 | \n",
"
\n",
" \n",
" | 2 | \n",
" > 50% and ≤ 70% | \n",
" metabat2 | \n",
" 1 | \n",
"
\n",
" \n",
" | 3 | \n",
" > 50% and ≤ 70% | \n",
" semibin2 | \n",
" 2 | \n",
"
\n",
" \n",
" | 4 | \n",
" > 70% and ≤ 90% | \n",
" binette | \n",
" 3 | \n",
"
\n",
" \n",
" | 5 | \n",
" > 70% and ≤ 90% | \n",
" concoct | \n",
" 2 | \n",
"
\n",
" \n",
" | 6 | \n",
" > 70% and ≤ 90% | \n",
" metabat2 | \n",
" 5 | \n",
"
\n",
" \n",
" | 7 | \n",
" > 70% and ≤ 90% | \n",
" semibin2 | \n",
" 4 | \n",
"
\n",
" \n",
" | 8 | \n",
" > 90% | \n",
" binette | \n",
" 6 | \n",
"
\n",
" \n",
" | 9 | \n",
" > 90% | \n",
" concoct | \n",
" 4 | \n",
"
\n",
" \n",
" | 10 | \n",
" > 90% | \n",
" maxbin2 | \n",
" 2 | \n",
"
\n",
" \n",
" | 11 | \n",
" > 90% | \n",
" metabat2 | \n",
" 3 | \n",
"
\n",
" \n",
" | 12 | \n",
" > 90% | \n",
" semibin2 | \n",
" 4 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Contamination ≤ 10 and
Completeness tool bin_count\n",
"0 > 50% and ≤ 70% binette 3\n",
"1 > 50% and ≤ 70% maxbin2 2\n",
"2 > 50% and ≤ 70% metabat2 1\n",
"3 > 50% and ≤ 70% semibin2 2\n",
"4 > 70% and ≤ 90% binette 3\n",
"5 > 70% and ≤ 90% concoct 2\n",
"6 > 70% and ≤ 90% metabat2 5\n",
"7 > 70% and ≤ 90% semibin2 4\n",
"8 > 90% binette 6\n",
"9 > 90% concoct 4\n",
"10 > 90% maxbin2 2\n",
"11 > 90% metabat2 3\n",
"12 > 90% semibin2 4"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Define the contamination cutoff\n",
"contamination_cutoff = 10\n",
"\n",
"# Create filters for completeness categories\n",
"low_contamination_filt = df_bins['contamination'] <= contamination_cutoff\n",
"high_completeness_filt = df_bins['completeness'] > 90\n",
"medium_completeness_filt = df_bins['completeness'] > 70\n",
"low_completeness_filt = df_bins['completeness'] > 50\n",
"\n",
"# Define quality categories\n",
"quality = f'Contamination ≤ {contamination_cutoff} and
Completeness'\n",
"df_bins.loc[low_contamination_filt & low_completeness_filt, quality] = '> 50% and ≤ 70%'\n",
"df_bins.loc[low_contamination_filt & medium_completeness_filt, quality] = '> 70% and ≤ 90%'\n",
"df_bins.loc[low_contamination_filt & high_completeness_filt, quality] = '> 90%'\n",
"\n",
"# Group and count bins by quality category and tool\n",
"df_bins_quality_grouped = df_bins.groupby([quality, 'tool']).agg(bin_count=('index', 'count')).reset_index()\n",
"df_bins_quality_grouped"
]
},
{
"cell_type": "markdown",
"id": "6eec391a",
"metadata": {},
"source": [
"Now, let’s create a bar plot to visualize the number of bins in each quality category for each bin sets:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "36ce51ac",
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-14T08:42:23.507763Z",
"iopub.status.busy": "2025-10-14T08:42:23.507583Z",
"iopub.status.idle": "2025-10-14T08:42:23.620352Z",
"shell.execute_reply": "2025-10-14T08:42:23.619829Z"
}
},
"outputs": [
{
"data": {
"text/html": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Define colors for each completeness category\n",
"color_discrete_map = {\n",
" \"> 90%\": px.colors.qualitative.Prism[4],\n",
" \"> 70% and ≤ 90%\": px.colors.qualitative.Prism[2],\n",
" \"> 50% and ≤ 70%\": px.colors.qualitative.Prism[6]\n",
"}\n",
"\n",
"# Create the bar plot\n",
"fig = px.bar(\n",
" df_bins_quality_grouped, \n",
" x='tool', \n",
" y=\"bin_count\", \n",
" color=quality,\n",
" barmode='stack', \n",
" color_discrete_map=color_discrete_map, \n",
" text=\"bin_count\",\n",
" category_orders={\"tool\": [\"binette\", \"semibin2\", \"concoct\", \"metabat2\", \"maxbin2\"]},\n",
" opacity=0.9\n",
")\n",
"\n",
"# Update layout for better appearance\n",
"fig.update_layout(\n",
" width=600,\n",
" height=500,\n",
" legend=dict(traceorder=\"reversed\")\n",
")\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"id": "f78d0f29",
"metadata": {},
"source": [
"From the plot, you can see that Binette produces more high-quality bins compared to the initial bin sets! 🎉"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}