.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/figure_1_ijk.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_figure_1_ijk.py>` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_figure_1_ijk.py:

Figure 1 IJK
============

Recreate Figure 1 I, J, K from the paper.

.. GENERATED FROM PYTHON SOURCE LINES 8-11

This notebook performs a statistical analysis of the learning rates of two groups of mice.
It fits a model to each individual mouse, compares the fitted parameters, and runs statistical
tests between the groups to identify differences and points of divergence in learning.

.. GENERATED FROM PYTHON SOURCE LINES 11-19

.. code-block:: default

    # Commented out IPython magic to ensure Python compatibility.
    # run this on Colab
    # !rm -rf APE_paper/
    # !git clone https://github.com/HernandoMV/APE_paper.git
    # %pip install mouse-behavior-analysis-tools
    # %cd APE_paper/docs/figures_notebooks

.. GENERATED FROM PYTHON SOURCE LINES 20-22

1. Import libraries
~~~~~~~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 22-42

.. code-block:: default

    # Commented out IPython magic to ensure Python compatibility.
    # %load_ext autoreload
    # %autoreload 2

    import matplotlib.pylab as plt
    import numpy as np
    import seaborn as sns
    import pandas as pd
    from scipy import stats
    from IPython.display import clear_output
    from itertools import chain
    from os.path import exists
    import urllib.request

    from mouse_behavior_analysis_tools.utils import custom_functions as cuf
    from mouse_behavior_analysis_tools.plot import make_figures
    from mouse_behavior_analysis_tools.model import behavior_model as bm

    clear_output()

.. GENERATED FROM PYTHON SOURCE LINES 43-46

This dataset has been pre-processed, but conditions have not been selected.
The preprocessing includes removal of disengaged trials and removal of the
first 5 trials of each session.

.. GENERATED FROM PYTHON SOURCE LINES 46-61

.. code-block:: default

    # 2. Download data
    # ~~~~~~~~~~~~~~~~~
    dataset_name = 'Chronic-lesions_dataframe.csv'
    url = "https://zenodo.org/record/7261639/files/" + dataset_name
    dataset_path = '../data/' + dataset_name

    # download only if the data is not there already
    if not exists(dataset_path):
        print('Downloading data...')
        urllib.request.urlretrieve(url, dataset_path)
    else:
        print('Data already in directory')

    # load
    df_to_plot = pd.read_csv(dataset_path, index_col=0)

.. GENERATED FROM PYTHON SOURCE LINES 62-63

parameters for the plotting

.. GENERATED FROM PYTHON SOURCE LINES 63-68

.. code-block:: default

    hue_order = ['Control', 'Lesion']
    color_palette = [(0.24715576, 0.49918708, 0.57655991),
                     (160/255, 11/255, 11/255)]
    sns.set_palette(color_palette)

.. GENERATED FROM PYTHON SOURCE LINES 69-70

maximum number of trials performed per mouse in the dataset:

.. GENERATED FROM PYTHON SOURCE LINES 70-72

.. code-block:: default

    df_to_plot.groupby(['AnimalID', 'ExperimentalGroup', 'Protocol']).max()['CumulativeTrialNumberByProtocol']

.. GENERATED FROM PYTHON SOURCE LINES 73-76

2. Plot performance
~~~~~~~~~~~~~~~~~~~~~

bin trials every 200, assigning each trial to the center of its bin

.. GENERATED FROM PYTHON SOURCE LINES 76-78

.. code-block:: default

    df_to_plot["TrialIndexBinned200"] = (df_to_plot.CumulativeTrialNumberByProtocol // 200) * 200 + 100
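For instance, the integer division above maps every trial index to the center of its
200-trial bin. A quick standalone check of the arithmetic (toy trial indices, not
dataset values):

.. code-block:: default

    # trials 1-199 fall in the bin centered at 100, trials 200-399 at 300, etc.
    trials = np.array([1, 150, 199, 200, 399, 401])
    print((trials // 200) * 200 + 100)  # [100 100 100 300 300 500]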
.. GENERATED FROM PYTHON SOURCE LINES 79-80

Sanity check on the data to see that it looks good

.. GENERATED FROM PYTHON SOURCE LINES 80-92

.. code-block:: default

    fig = make_figures.make_figure_performance_trials_animals_bin(df_to_plot)
    clear_output()
    plt.show(fig)

    # uncomment here to save the plot
    # data_directory = ''
    # plt.savefig(data_directory + 'Performance_by_session_individual_animals.pdf',
    #             transparent=True, bbox_inches='tight')

    """Colored dots show the performance over the past 100 trials using a running-average window.
    Each color represents a distinct session.
    The black line shows the performance over the past 200 trials using trial binning.
    """

.. GENERATED FROM PYTHON SOURCE LINES 93-96

3. Model fitting
~~~~~~~~~~~~~~~~~~~~~

Fit a sigmoid to every mouse to calculate and compare learning stages and learning rates.

.. GENERATED FROM PYTHON SOURCE LINES 96-114

.. code-block:: default

    """
    The sigmoid function is (perf_end - 0.5) / (1 + np.exp(-slope * (x - bias))) + 0.5
    Data is scaled before the fitting and parameters are rescaled afterwards.
    The maximum possible performance is defined as the maximum of the median
    of the trials binned every 200.
    """

    # calculate the maximum performance for every mouse based on the trials binned every 200
    df_bin200tr = df_to_plot.groupby(['AnimalID', 'ExperimentalGroup', 'TrialIndexBinned200', 'Protocol']).median().reset_index()
    mouse_max_perf = df_bin200tr.groupby('AnimalID').max().reset_index()[['AnimalID', 'CurrentPastPerformance100']]

    # fit the model
    fit_df = bm.get_df_of_behavior_model_by_animal(df_to_plot, mouse_max_perf)

    # find the steepest point of each slope (max of derivative)
    der_max_dir = bm.get_steepest_point_of_slope(fit_df)

.. GENERATED FROM PYTHON SOURCE LINES 115-117

plot the curves again, pointing to the maximum:
a sanity check to see that these scaled values recreate the curves

.. GENERATED FROM PYTHON SOURCE LINES 117-122

.. code-block:: default

    fig = make_figures.make_figure_performance_trials_animals_model(df_to_plot, fit_df, der_max_dir)
    clear_output()
    plt.show(fig)
    # plt.savefig(data_directory + 'Sigmoid_fitting.pdf', transparent=True, bbox_inches='tight')

.. GENERATED FROM PYTHON SOURCE LINES 123-131

.. code-block:: default

    """As a learning-speed measure I use the point of the maximum learning rate,
    which is the maximum of the derivative of the fit, and which is also the middle
    point of the curve (halfway between 50% and the maximum performance).
    The bias indicates where this point lies on the x axis:
    the trial number at which this point is reached.
    """
    # add the maximum value of the derivative to the fit_df dataframe
    for key, value in der_max_dir.items():
        fit_df.loc[fit_df.index[fit_df['AnimalID'] == key].tolist()[0], 'max_of_der'] = value[1]
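To make the parameterization concrete, here is a minimal standalone sketch of the
sigmoid described above (the real fitting is done by ``behavior_model``; the function
below and the parameter values are illustrative only). Its derivative peaks at
``x = bias``, where it equals ``slope * (perf_end - 0.5) / 4``:

.. code-block:: default

    # illustrative sketch of the fitted curve, not the library's fitting code
    def sigmoid(x, perf_end, slope, bias):
        # rises from 0.5 (chance) to perf_end, centered at trial `bias`
        return (perf_end - 0.5) / (1 + np.exp(-slope * (x - bias))) + 0.5

    # example parameter values (assumed, for illustration)
    perf_end, slope, bias = 0.9, 0.01, 1500
    x = np.arange(3001)
    dydx = np.gradient(sigmoid(x, perf_end, slope, bias), x)
    print(x[np.argmax(dydx)])                        # 1500: the trial of fastest learning
    print(dydx.max(), slope * (perf_end - 0.5) / 4)  # numerical vs. analytical maximum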
.. GENERATED FROM PYTHON SOURCE LINES 132-134

4. Statistics
~~~~~~~~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 134-147

.. code-block:: default

    """### Differences of parameters between groups"""

    # calculate significance
    parameters_to_show = ['maximum_performance', 'max_of_der']
    # define three levels of significance for the plotting
    sig_levels = [0.05, 0.01, 0.001]
    pvals = []
    print('Kruskal-Wallis tests on the parameters')
    for var in parameters_to_show:
        kt = stats.kruskal(fit_df[fit_df.ExperimentalGroup == hue_order[0]][var].dropna(),
                           fit_df[fit_df.ExperimentalGroup == hue_order[1]][var].dropna())
        print(var + ':\t\tpvalue: ' + str(kt.pvalue))
        pvals.append(kt.pvalue)

.. GENERATED FROM PYTHON SOURCE LINES 148-150

5. Plot figure
~~~~~~~~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 150-168

.. code-block:: default

    """##### The following cell creates **Figures 1 J, K** from the paper"""

    # compare the parameters between groups
    titles = ['maximum\nperformance', 'maximum\nlearning rate']
    ylabs = ['task performance (%)', '\u0394performance \u00D7 trials$\mathregular{^{-1}}$']
    yvals = [70, 0.001]  # for the statistics

    fig = make_figures.make_figure_learning_parameters_between_groups(
        fit_df, parameters_to_show, titles, ylabs, pvals,
        sig_levels, color_palette, hue_order, yvals)

    # plt.savefig(data_directory + 'Parameters_group_comparison.pdf', transparent=True, bbox_inches='tight')
    plt.show(fig)

    for i, s in enumerate(sig_levels):
        print(i + 1, 'asterisks: pval <', s)

.. GENERATED FROM PYTHON SOURCE LINES 169-171

6. Learning curves
~~~~~~~~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 171-181

.. code-block:: default

    """### Calculate the statistical differences of performance between groups at different points in learning

    ##### Do not consider trials in which the animal is too biased to calculate performance,
    as the antibias makes the mice perform worse than chance, which can obscure effects of learning
    """
    # define the bias thresholds as 2 std above and below the mean of the total dataset
    bias_top_threshold = np.nanmean(df_to_plot.RightBias) + 2 * np.nanstd(df_to_plot.RightBias)
    bias_bottom_threshold = np.nanmean(df_to_plot.RightBias) - 2 * np.nanstd(df_to_plot.RightBias)
    # create a mask
    bias_mask = np.logical_and(df_to_plot.RightBias < bias_top_threshold,
                               df_to_plot.RightBias > bias_bottom_threshold)

.. GENERATED FROM PYTHON SOURCE LINES 182-183

Sanity check on the data to see that it looks good

.. GENERATED FROM PYTHON SOURCE LINES 183-188

.. code-block:: default

    fig = make_figures.make_figure_performance_trials_animals_biased_trials(df_to_plot, bias_mask)
    clear_output()
    plt.show(fig)
    # plt.savefig(data_directory + 'Sigmoid_fitting.pdf', transparent=True, bbox_inches='tight')

.. GENERATED FROM PYTHON SOURCE LINES 189-199

.. code-block:: default

    """#### Recalculate performance"""

    PAST_WINDOW = 100
    CumPerList = []
    for Sid in pd.unique(df_to_plot['SessionID']):
        CumPerList.append(cuf.perf_window_calculator(
            df_to_plot[np.logical_and(bias_mask, df_to_plot['SessionID'] == Sid)],
            PAST_WINDOW))
    # flatten the list of lists
    df_to_plot.loc[bias_mask, 'CurrentPastPerformance100biasremoved'] = np.array(list(chain(*CumPerList)))
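``perf_window_calculator`` is provided by ``mouse_behavior_analysis_tools``;
conceptually it returns, for each trial of a session, the performance over the
preceding window of trials. A rough pandas equivalent of that idea (an assumption
about its behavior, not the library's actual implementation):

.. code-block:: default

    # conceptual stand-in for cuf.perf_window_calculator (assumed behavior,
    # not the library's code): running percentage of correct first pokes
    # over the past `window` trials of a single session
    def running_performance(session_df, window=100):
        correct = session_df['FirstPokeCorrect'].astype(float)
        return correct.rolling(window, min_periods=1).mean() * 100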
.. GENERATED FROM PYTHON SOURCE LINES 200-212

.. code-block:: default

    """##### The following cell creates the first part of **Figure 1 I** from the paper"""

    # Performance differences between groups across learning
    col_to_plot = 'CurrentPastPerformance100biasremoved'

    fig1 = make_figures.make_figure_differences_performance_between_groups(
        df_to_plot, col_to_plot, hue_order, color_palette)

    # plt.savefig(data_directory + 'Performance_between_groups.pdf', transparent=True, bbox_inches='tight')
    fig1.show()

    print('Shaded area indicates std, and performance is calculated using', col_to_plot)

.. GENERATED FROM PYTHON SOURCE LINES 213-216

7. Separate performance on high and low tones
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

create a mask for the TrialHighPerc column

.. GENERATED FROM PYTHON SOURCE LINES 216-233

.. code-block:: default

    high_mask = df_to_plot['TrialHighPerc'] == 98.0
    low_mask = df_to_plot['TrialHighPerc'] == 2.0

    # calculate the performance taking into account only these trials
    CumPerListHigh = []
    CumPerListLow = []
    for Sid in pd.unique(df_to_plot['SessionID']):
        CumPerListHigh.append(cuf.perf_window_calculator(
            df_to_plot[np.logical_and(high_mask, df_to_plot['SessionID'] == Sid)], PAST_WINDOW))
        CumPerListLow.append(cuf.perf_window_calculator(
            df_to_plot[np.logical_and(low_mask, df_to_plot['SessionID'] == Sid)], PAST_WINDOW))

    # flatten the lists of lists
    df_to_plot.loc[high_mask, 'CurrentPastPerformance100biasremovedHigh'] = np.array(list(chain(*CumPerListHigh)))
    df_to_plot.loc[low_mask, 'CurrentPastPerformance100biasremovedLow'] = np.array(list(chain(*CumPerListLow)))

.. GENERATED FROM PYTHON SOURCE LINES 234-236

.. code-block:: default

    df_to_plot[['CurrentPastPerformance100biasremovedHigh', 'CurrentPastPerformance100biasremovedLow']].tail(100)

.. GENERATED FROM PYTHON SOURCE LINES 237-238

Plot performance on high and low tones

.. GENERATED FROM PYTHON SOURCE LINES 238-250

.. code-block:: default

    col_to_plot = 'CurrentPastPerformance100biasremovedLow'

    figX = make_figures.make_figure_differences_performance_between_groups(
        df_to_plot, col_to_plot, hue_order, color_palette)

    # plt.savefig(data_directory + 'Performance_between_groups.pdf', transparent=True, bbox_inches='tight')
    figX.show()

    """
    This approach doesn't seem to work well. It might be because I am using both
    trial types to calculate performance. Let's eliminate those trials from the dataset.
    """

.. GENERATED FROM PYTHON SOURCE LINES 251-290

.. code-block:: default

    df_high = df_to_plot[high_mask].copy()
    df_low = df_to_plot[low_mask].copy()

    # reset the index and the cumulative trial number by protocol
    df_high.reset_index(inplace=True)
    df_low.reset_index(inplace=True)
    df_high.drop('index', axis=1, inplace=True)
    df_low.drop('index', axis=1, inplace=True)

    # Restart the count of CumulativeTrialNumber for every protocol
    df_high['CumulativeTrialNumberByProtocol'] = np.nan
    df_low['CumulativeTrialNumberByProtocol'] = np.nan
    for Aid in pd.unique(df_high['AnimalID']):
        for Prot in pd.unique(df_high['Protocol']):
            conditions = np.logical_and(df_high['AnimalID'] == Aid, df_high['Protocol'] == Prot)
            df_high.loc[df_high[conditions].index, 'CumulativeTrialNumberByProtocol'] = \
                np.arange(len(df_high[conditions])) + 1
    for Aid in pd.unique(df_low['AnimalID']):
        for Prot in pd.unique(df_low['Protocol']):
            conditions = np.logical_and(df_low['AnimalID'] == Aid, df_low['Protocol'] == Prot)
            df_low.loc[df_low[conditions].index, 'CumulativeTrialNumberByProtocol'] = \
                np.arange(len(df_low[conditions])) + 1

    # calculate performance
    PAST_WINDOW = 50
    CumPerListHigh = []
    CumPerListLow = []
    for Sid in pd.unique(df_to_plot['SessionID']):
        CumPerListHigh.append(cuf.perf_window_calculator(df_high[df_high['SessionID'] == Sid], PAST_WINDOW))
        CumPerListLow.append(cuf.perf_window_calculator(df_low[df_low['SessionID'] == Sid], PAST_WINDOW))

    # flatten the lists of lists
    df_high['CurrentPastPerformance100biasremoved'] = np.array(list(chain(*CumPerListHigh)))
    df_low['CurrentPastPerformance100biasremoved'] = np.array(list(chain(*CumPerListLow)))
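The nested loops above renumber trials within each animal and protocol. Assuming the
rows are already sorted in trial order, an equivalent and more idiomatic pandas version
of the renumbering would be:

.. code-block:: default

    # vectorized equivalent of the renumbering loops above
    # (assumes rows are already sorted in trial order)
    df_high['CumulativeTrialNumberByProtocol'] = \
        df_high.groupby(['AnimalID', 'Protocol']).cumcount() + 1
    df_low['CumulativeTrialNumberByProtocol'] = \
        df_low.groupby(['AnimalID', 'Protocol']).cumcount() + 1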
.. GENERATED FROM PYTHON SOURCE LINES 291-292

Plot performance on high and low tones

.. GENERATED FROM PYTHON SOURCE LINES 292-301

.. code-block:: default

    col_to_plot = 'CurrentPastPerformance100biasremoved'

    figX = make_figures.make_figure_differences_performance_between_groups(
        df_low, col_to_plot, hue_order, color_palette)

    data_directory = ''
    plt.savefig(data_directory + 'Chronic_lesions_Performance_between_groups_low_tones.pdf',
                transparent=True, bbox_inches='tight')
    figX.show()

.. GENERATED FROM PYTHON SOURCE LINES 302-365

.. code-block:: default

    """#### Calculate the significance by resampling:
    shuffle the group labels multiple times and calculate the likelihood of observing this data.

    The shuffling respects the proportion of mice in every group.

    References:
    Supplementary Figure 4 here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2562676/
    See also the methods here: https://www.biorxiv.org/content/10.1101/716274v3.full
    """

    # define a 100-trial window to bin the data
    xbin = 100
    df_to_plot["TrialIndexBinned"] = (df_to_plot.CumulativeTrialNumberByProtocol // xbin) * xbin + xbin / 2
    print('Trials are binned in groups of', xbin)

    # group so each animal has a mean of the performance in each bin
    df_bintr = df_to_plot.groupby(['AnimalID', 'ExperimentalGroup', 'TrialIndexBinned', 'Protocol']).mean().reset_index()
    # create a scaled version of the performance
    df_bintr['Performance'] = df_bintr.FirstPokeCorrect * 100

    # calculate the differences of the real means using the binned data
    real_data_pd = df_bintr[df_bintr.ExperimentalGroup == hue_order[1]].groupby('TrialIndexBinned').mean()['Performance'] - \
                   df_bintr[df_bintr.ExperimentalGroup == hue_order[0]].groupby('TrialIndexBinned').mean()['Performance']

    # Select the number of times to shuffle. Originally this is done 10,000 times;
    # use a smaller number to speed things up.
    nsh = 10000
    print('Data is shuffled', nsh, 'times')

    # select the important columns to get the shuffled data
    df_colsel = df_bintr[['AnimalID', 'ExperimentalGroup', 'TrialIndexBinned', 'Performance']].copy()
    # get the shuffled data
    shrdf = cuf.get_shuffled_means_difference_df(df_colsel, hue_order, nsh)

    # calculate the confidence intervals for the shuffled data
    pos_ci = shrdf.groupby('TrialIndexBinned').quantile(.95)
    neg_ci = shrdf.groupby('TrialIndexBinned').quantile(.05)

    """##### The following cell creates the second part of **Figure 1 I** from the paper

    This shows how likely it is that each time point crosses the 'random' line
    (point-based significance).
    """
    fig2 = make_figures.make_figure_differences_performance_significance(real_data_pd, pos_ci, neg_ci)
    data_directory = '../../data/'
    plt.savefig(data_directory + 'Differences_of_means_significance_by_trial_bins.pdf',
                transparent=True, bbox_inches='tight')
    fig2.show()

    """#### Substitute for the mixed ANOVA: find the likelihood of any point being significant.
    Shuffle more data and quantify the percentage of times there is a crossing.
    Generate global confidence bands."""

    quants_to_test = [0.99, 0.995, 0.996, 0.997, 0.998, 0.999, 0.9999]
    nsh = 1000
    global_sig = cuf.get_shuffled_means_difference_global_significance(
        df_colsel, shrdf, quants_to_test, nsh, hue_order)

    # plot the confidence intervals and print their global p-values:
    fig = make_figures.make_figure_differences_performance_significance_global(
        real_data_pd, quants_to_test, shrdf, global_sig, nsh)
    fig.show()
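The shuffling itself is handled by the library; conceptually, each iteration permutes
the group labels across animals (keeping the group sizes fixed) and recomputes the
between-group difference per trial bin, building a null distribution for the observed
differences. A minimal sketch of that idea, assuming the column names used above
(not the library's actual code):

.. code-block:: default

    # conceptual sketch of one shuffling iteration behind
    # cuf.get_shuffled_means_difference_df (assumed logic, for illustration)
    rng = np.random.default_rng(0)
    animals = df_colsel[['AnimalID', 'ExperimentalGroup']].drop_duplicates()

    def one_shuffle(df):
        # permute group labels across animals, keeping group sizes fixed
        relabeled = animals.assign(
            ExperimentalGroup=rng.permutation(animals['ExperimentalGroup'].values))
        df = df.drop(columns='ExperimentalGroup').merge(relabeled, on='AnimalID')
        means = df.groupby(['ExperimentalGroup', 'TrialIndexBinned'])['Performance'].mean()
        # difference of shuffled group means per trial bin
        return means[hue_order[1]] - means[hue_order[0]]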
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.000 seconds)

.. _sphx_glr_download_auto_examples_figure_1_ijk.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: figure_1_ijk.py <figure_1_ijk.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: figure_1_ijk.ipynb <figure_1_ijk.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_