Applied Python Programming

This seminar was developed to teach Python to life scientists. Applied Python Programming (APP) has been offered in recent years in the spring and fall semesters by Boas Pucker and Bernd Weisshaar. It provides the basic Python skills needed to address biological questions. Much of each session is spent solving small challenges related to that session's topic. At the end, all participants demonstrate their new skills by solving a 'real' problem in Python. The following project descriptions were provided by participants. They illustrate some of the skills acquired and should encourage interested people to attend the seminar in coming semesters.


Python project examples

1) tBLASTn to identify transposases in the genome sequence of Arabidopsis thaliana (2016/2017)

(by Katharina Frey)

This script uses tBLASTn to find sequences in the Arabidopsis thaliana reference genome sequence that are similar to Mutator transposases. Peptide sequences of the transposases serve as queries and the A. thaliana reference genome sequence serves as the subject. The positions of tBLASTn hits with an e-value smaller than 1e-10 are then plotted onto an axis that represents one of the five chromosomes of A. thaliana. The resulting graphic shows the locations of sequences similar to the transposase MUDRA of A. thaliana (red triangles) and the transposase MUDRB of Zea mays (yellow triangles).


Fig. 1: Part of Python script for detection of transposase genes in A. thaliana Col-0 genome sequence.
Fig. 2: Results of transposase gene search in A. thaliana Col-0 genome sequence.
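
The filtering and plotting steps described above could be sketched roughly as follows. This is not the participant's script. It assumes that tBLASTn was run with tabular output (-outfmt 6) in the default column order; the function names, the example rows and the use of matplotlib are assumptions made for illustration.

```python
from collections import defaultdict

EVALUE_CUTOFF = 1e-10  # threshold named in the project description


def significant_hits(blast_lines, cutoff=EVALUE_CUTOFF):
    """Return {subject sequence: [(query, hit position), ...]} for hits below the cutoff.

    Assumes BLAST tabular output (-outfmt 6) with the default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = defaultdict(list)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        evalue = float(fields[10])
        if evalue < cutoff:
            # hits on the reverse strand are reported with sstart > send
            hits[subject].append((query, min(sstart, send)))
    return dict(hits)


def plot_hits(hits_on_chromosome, chromosome_length, out_file):
    """Draw hit positions as triangles above a line representing one chromosome."""
    import matplotlib  # third-party; imported here so the parsing works without it
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    colours = {"MUDRA": "red", "MUDRB": "yellow"}  # colours as in the description
    fig, ax = plt.subplots(figsize=(10, 1.5))
    ax.plot([0, chromosome_length], [0, 0], color="black")
    for query, position in hits_on_chromosome:
        ax.scatter(position, 0.1, marker="v", color=colours.get(query, "grey"))
    ax.set_yticks([])
    ax.set_xlabel("position [bp]")
    fig.savefig(out_file)
    plt.close(fig)


# Made-up example rows (not real results)
example = [
    "MUDRA\tChr1\t85.0\t200\t30\t0\t1\t200\t1500\t2099\t1e-50\t180",
    "MUDRA\tChr1\t30.0\t50\t35\t0\t1\t50\t9000\t9149\t0.5\t25",
    "MUDRB\tChr2\t70.0\t150\t45\t0\t1\t150\t7449\t7000\t3e-20\t90",
]
hits = significant_hits(example)
print(hits)
```

Calling plot_hits(hits["Chr1"], chromosome_length, "chr1_hits.png") with the length of the chromosome of interest would then write one figure per chromosome.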

2) Python Project on Population Genetics

(by Nicole Walker)

B-chromosomes are supernumerary selfish elements that do not follow Mendel’s laws and are passed on to the next generation with a probability higher than 50%. Because they are harmful to the organism that carries them, selection must act against their accumulation. Based on M. Shaw’s paper "The population genetics of the B-chromosome polymorphism of Myrmeleotettix maculatus" (1983), a theoretical model was implemented to calculate the frequencies of B-chromosomes over the generations in a virtual population. To make input easier for people who are not familiar with programming, and to avoid the need to change the code, an easyGUI dialog is used to collect the values of the different variables.
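
A form-based query of this kind could look roughly like the sketch below. It assumes the third-party easygui package and its multenterbox form. The field names, default values and validation rules are invented for illustration and are not the parameters of the original script.

```python
FIELDS = ["Number of generations", "Starting frequency z0", "Starting frequency z1"]
DEFAULTS = ["50", "0.7", "0.2"]  # illustrative defaults only


def parse_answers(answers):
    """Convert the strings returned by the form into numbers, with basic checks."""
    if answers is None:
        raise ValueError("input dialog was cancelled")
    generations = int(answers[0])
    z0, z1 = float(answers[1]), float(answers[2])
    if generations < 1:
        raise ValueError("number of generations must be at least 1")
    if not (0 <= z0 <= 1 and 0 <= z1 <= 1 and z0 + z1 <= 1):
        raise ValueError("z0 and z1 must be frequencies with z0 + z1 <= 1")
    return generations, z0, z1


def ask_user():
    """Open the form and return the validated values."""
    import easygui  # third-party; imported here so parse_answers works without a GUI

    answers = easygui.multenterbox(
        msg="Starting conditions",
        title="B-chromosome model",
        fields=FIELDS,
        values=DEFAULTS,
    )
    return parse_answers(answers)


print(parse_answers(DEFAULTS))
```

Keeping the conversion in its own function means the checks can be tried out without opening a window; ask_user() is only needed for the interactive run.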

Two functions taken from Shaw’s model equations were then defined to calculate the B-chromosome frequencies (z0 and z1) of the following generation from the given starting conditions. To obtain the values for all generations, another function creates a matrix of zeros with two rows and as many columns as there are generations. The two entries in the first column are set to the starting values for z0 (proportion of individuals without a B-chromosome) and z1 (proportion of individuals with one B-chromosome). Then, for each generation, the matrix is filled with the values computed by Shaw’s functions. The function returns a two-row matrix with the values for every generation, which can now be plotted.

Fig. 1: Calculated z-frequencies (z0 and z1).
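
The bookkeeping described above can be sketched as follows. Shaw's equations are not reproduced on this page, so next_z0 and next_z1 below are deliberately simple placeholders and not the published model; only the pattern of filling the matrix is the point. Plain nested lists stand in for whatever array type the original script used.

```python
def next_z0(z0, z1, s):
    """Placeholder update rule (NOT Shaw's equation)."""
    return z0 + 0.1 * s * z0 * z1


def next_z1(z0, z1, s):
    """Placeholder update rule (NOT Shaw's equation)."""
    return z1 - 0.1 * s * z0 * z1


def simulate(z0_start, z1_start, generations, s):
    """Return a matrix with two rows (z0, z1) and one column per generation."""
    # start with a matrix of zeros
    matrix = [[0.0] * generations for _ in range(2)]
    # the first column holds the starting values
    matrix[0][0], matrix[1][0] = z0_start, z1_start
    # each further column is computed from the previous one
    for g in range(1, generations):
        prev_z0, prev_z1 = matrix[0][g - 1], matrix[1][g - 1]
        matrix[0][g] = next_z0(prev_z0, prev_z1, s)
        matrix[1][g] = next_z1(prev_z0, prev_z1, s)
    return matrix


result = simulate(z0_start=0.7, z1_start=0.2, generations=5, s=1.0)
for row in result:
    print([round(value, 4) for value in row])
```

Replacing the two placeholder functions with the real model equations would leave simulate() unchanged.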

For plotting, the matrix is split into its two rows so that the values for z0 and z1 can be plotted separately. To show the influence of selection pressure on the B-chromosome frequencies, a loop plots the z-values for selection pressures from -1 (B-chromosome advantage) to 1 (B-chromosome disadvantage). The z0 and z1 frequencies are plotted as relative values because individuals with two B-chromosomes (2B) are possible but not considered, which means z0 + z1 != 1. Finally, the axes are labelled and scaled, and the figure is saved and opened for viewing.
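
A loop over selection pressures of this kind might look like the sketch below. The function toy_model is a stand-in and not Shaw's model. The text does not say how the relative values were obtained, so dividing each value by z0 + z1 is an assumption, as are the chosen selection pressures and the use of matplotlib.

```python
def toy_model(s, generations, z0=0.7, z1=0.2):
    """Stand-in for the real simulation (NOT Shaw's model): returns [z0 row, z1 row]."""
    rows = [[z0], [z1]]
    for _ in range(generations - 1):
        change = 0.1 * s * z0 * z1
        z0, z1 = z0 + change, z1 - change
        rows[0].append(z0)
        rows[1].append(z1)
    return rows


def relative_curves(selection_pressures, generations):
    """Map each selection pressure to (relative z0 values, relative z1 values)."""
    curves = {}
    for s in selection_pressures:
        z0_row, z1_row = toy_model(s, generations)  # split the matrix into its two rows
        totals = [a + b for a, b in zip(z0_row, z1_row)]
        curves[s] = (
            [a / t for a, t in zip(z0_row, totals)],
            [b / t for b, t in zip(z1_row, totals)],
        )
    return curves


def plot_curves(curves, out_file):
    """Plot all curves into one labelled figure and save it."""
    import matplotlib  # third-party; imported here so the calculation works without it
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for s, (rel_z0, rel_z1) in curves.items():
        ax.plot(rel_z0, label=f"z0, s = {s}")
        ax.plot(rel_z1, linestyle="--", label=f"z1, s = {s}")
    ax.set_xlabel("generation")
    ax.set_ylabel("relative frequency")
    ax.set_ylim(0, 1)
    ax.legend()
    fig.savefig(out_file)
    plt.close(fig)


curves = relative_curves([-1, -0.5, 0, 0.5, 1], generations=20)
```

Calling plot_curves(curves, "frequencies.png") would then write the figure to disk.
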
Fig. 2: Python code with easyGUI query for the starting conditions.

3) Automatic Downloading of BioBrick sequences

(by Olga Schmidt)

One of the largest open-source databases for synthetic biology is the Registry of Standard Biological Parts, which is associated with the iGEM (international Genetically Engineered Machine) competition. Extracting sequences from this repository by hand is labor-intensive and error-prone. Automatic retrieval of sequences by name could therefore facilitate synthetic biology projects. My Python script uses established libraries to extract content from websites and automatically collects a number of sequences whose names are listed in a simple input text file (Fig. 1). The BioBrick names are read from this file, and each name is appended to a string containing the general URL for all parts in the repository. The combined URLs are then used to retrieve the sequences.
Fig. 1: Python code of script for automatic download of BioBrick sequences.
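
The URL-building step described above can be sketched as follows. This is not the participant's script: it uses only the standard library module urllib, BASE_URL is a hypothetical placeholder rather than the real Registry address, and the part names are examples. Extracting the sequence from a downloaded page depends on the page layout and is left out.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical placeholder, NOT the real address of the Registry
BASE_URL = "https://registry.example.org/part/"


def read_part_names(lines):
    """Return the non-empty part names from an iterable of text lines."""
    return [line.strip() for line in lines if line.strip()]


def build_urls(names, base_url=BASE_URL):
    """Append each part name to the general URL."""
    return {name: base_url + quote(name) for name in names}


def download_page(url, timeout=30):
    """Fetch one page and return its text (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Lines as they might appear in the input text file (example names)
names = read_part_names(["BBa_B0034\n", "\n", "  BBa_R0010  \n"])
urls = build_urls(names)
for name, url in urls.items():
    print(name, url)
```

With a real input file, read_part_names(open("parts.txt")) would supply the names, and download_page(url) would be called once per entry in urls.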