This is the lab wiki. For more info about our team, research, blog and publications see https://anneurai.net.
This assumes that you’ve worked through the general Python setup, and that you understand GitHub, conda and some basic command line tools.
Use conda install to install seaborn and statsmodels.
Install pingouin with conda install -c conda-forge pingouin.
Use pip install nma-ibl to get access to the IBL data via DataJoint. Run dj.conn() to check that everything works. Then run dj.config.save_local() to create a local config file, so you won’t have to enter your DataJoint credentials again in the future. Note that this local file, dj_local_conf.json, lives only inside the current folder, so it won’t be recognized if you launch Python from somewhere else.
Add dj_local_conf.json to the .gitignore file in your repo, so that your credentials don’t end up on your public-facing repository on GitHub. You can also copy this .gitignore file.
Now create a script that will load and save some data.
Import things
# datajoint-specific stuff
import datajoint as dj
from nma_ibl import reference, subject, acquisition, behavior, behavior_analyses
Then, query some basic information about all sessions that were run.
# which subjects (i.e. mice) are in the database?
subjects = (subject.Subject * subject.SubjectLab * reference.Lab)
# this contains a lot of information that we don't really need
# (and will increase the size of the data we want to download).
# so let's get only the columns that we're interested in
subjects = subjects.proj('subject_nickname', 'sex', 'subject_birth_date', 'time_zone')
# note that this is not yet data - it's only a query to the database. fetch will actually get those data
df_subjects = subjects.fetch(format='frame').sort_values(by=['lab_name', 'subject_nickname']).reset_index()
# same for sessions - only take training sessions here
sessions = behavior.TrialSet * behavior_analyses.PsychResults * behavior_analyses.ReactionTime \
    * behavior_analyses.SessionTrainingStatus \
    * (acquisition.Session & 'task_protocol LIKE "%training%"') * acquisition.SessionUser \
    & subjects
# only save some fields that we really care about for now (otherwise, the dataframe will explode)
sessions = sessions.proj('n_trials', 'performance_easy', 'threshold', 'bias', 'lapse_low', 'lapse_high',
                         'training_status', 'user_name',
                         session_duration='TIMEDIFF(session_end_time,session_start_time)')
df_sessions = sessions.fetch(format='frame').reset_index()
# note: the two dataframes containing subject info and sessions info share
# the column subject_uuid, which is called the 'primary key' that uniquely
# identifies each mouse. use pandas' join to combine the two dataframes -
# but beware the size of the data you're working with.
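The combination step mentioned in the comments above could look roughly like the sketch below. It runs on small toy DataFrames with made-up values rather than on the real df_subjects and df_sessions, and it uses pandas' merge (which matches on a named column) instead of DataFrame.join (which matches on the index). The toy column values and the names df_subjects_toy, df_sessions_toy and df_merged are hypothetical.

```python
import pandas as pd

# Toy stand-ins for df_subjects and df_sessions (hypothetical values, not real IBL data)
df_subjects_toy = pd.DataFrame({
    "subject_uuid": ["a1", "b2"],
    "subject_nickname": ["mouse_A", "mouse_B"],
    "lab_name": ["lab_X", "lab_Y"],
})
df_sessions_toy = pd.DataFrame({
    "subject_uuid": ["a1", "a1", "b2"],
    "n_trials": [412, 530, 298],
    "performance_easy": [0.81, 0.92, 0.67],
})

# Many sessions belong to one subject, so ask pandas to check that
# subject_uuid is unique on the subjects side (many-to-one)
df_merged = df_sessions_toy.merge(
    df_subjects_toy, on="subject_uuid", how="left", validate="many_to_one"
)
print(df_merged[["subject_nickname", "lab_name", "n_trials"]])
```

Using how="left" keeps every session row and attaches the subject info to it, so the merged frame has as many rows as the sessions frame.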
Now explore the DataFrame, for instance in ‘scientific mode’ in PyCharm or simply by printing different parts and groups to your command line. To better understand what the columns mean, see this and this list by Leon Hommerich as well as the official list of IBL dataset types (these don’t match the DataJoint names one-to-one).
Exercise 1: write some code to save this newly created Pandas DataFrame as a csv file. Make sure this (large) data file doesn’t get pushed to GitHub, for instance by creating a /data folder that is listed in your .gitignore. Then, for any analysis you want to run, load in this local file: you’ll now be able to get data without connecting to the DataJoint database. Of course, you may need different data files for different purposes (at the level of animals, sessions, or trials).
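As a starting point for Exercise 1, here is a minimal sketch of the save-then-reload pattern. It uses a toy DataFrame and a temporary directory so that it runs anywhere. The file name sessions.csv and the variable names are hypothetical. In your own repo, the data folder would be the one listed in your .gitignore.

```python
from pathlib import Path
import tempfile

import pandas as pd

# Toy stand-in for df_sessions (hypothetical values)
df_toy = pd.DataFrame({
    "subject_uuid": ["a1", "a1", "b2"],
    "n_trials": [412, 530, 298],
})

# A temporary directory stands in for the root of your repo here
base_dir = Path(tempfile.mkdtemp())
data_dir = base_dir / "data"
data_dir.mkdir(parents=True, exist_ok=True)

# index=False avoids writing the row index as an extra column
csv_path = data_dir / "sessions.csv"  # hypothetical file name
df_toy.to_csv(csv_path, index=False)

# Later, in an analysis script: load the local copy instead of querying DataJoint
df_loaded = pd.read_csv(csv_path)
print(df_loaded.head())
```

Note that a csv file does not store column types, so timestamps come back as plain strings unless you pass parse_dates to pd.read_csv.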
Exercise 2: plot some basic information about all the sessions. When (at what time of day) were they collected? How many sessions were collected per lab, user, and animal? How does performance change as a function of each animal’s progression in training?
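Before plotting, it can help to compute the counts you want to show. The sketch below does this on toy data with pandas only. It assumes your sessions DataFrame has a session_start_time column and that lab_name is available (for instance after combining it with the subjects table). Check your own column names, since the values and the start_hour column here are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the sessions table (hypothetical values and lab names)
df_toy = pd.DataFrame({
    "lab_name": ["lab_X", "lab_X", "lab_Y"],
    "subject_uuid": ["a1", "a1", "b2"],
    "session_start_time": pd.to_datetime(
        ["2019-05-01 09:15:00", "2019-05-02 14:40:00", "2019-05-01 09:55:00"]
    ),
})

# Time of day: extract the hour from the session start timestamp
df_toy["start_hour"] = df_toy["session_start_time"].dt.hour

# Number of sessions per lab, per animal, and per hour of the day
sessions_per_lab = df_toy.groupby("lab_name").size()
sessions_per_animal = df_toy.groupby("subject_uuid").size()
sessions_per_hour = df_toy["start_hour"].value_counts().sort_index()

print(sessions_per_lab)
print(sessions_per_hour)
```

Each result is a pandas Series, so it can be handed to a bar plot directly, for example with its .plot(kind="bar") method.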
Exercise 3: get more detailed info not at the session level (overall accuracy on easy stimuli), but at the individual trial level. You can use sessions * behavior.TrialSet.Trial to get this, but be warned that this will become huge/slow quickly. Better to first restrict to a subset of sessions (e.g. from one mouse), or to use .proj to select only those attributes of the TrialSet that you really need. See here for an example.
Exercise 4: recreate a figure from the preprint, for instance figure 2a. Then compare your solution against the code here.
The IBL’s neural recordings will be fully released once completed. Until then, you can find some example data and check out analyses here.