In [1]:
from IPython.display import HTML

def _set_css_style(css_file_path):
   """
   Read the custom CSS file and load it into Jupyter.
   Pass the file path to the CSS file.
   """

   # Read the stylesheet and close the file when done
   with open(css_file_path, "r") as css_file:
      styles = css_file.read()
   s = '<style>%s</style>' % styles
   return HTML(s)

_set_css_style('rise.css')

## Exercises from last class

How many data points are in `Spellman.csv`?  

In [2]:
!wc ../files/Spellman.csv

    4382    4382  609183 ../files/Spellman.csv


The first three letters of each ORF name are 'Y' for yeast, a letter for the chromosome number (A is chromosome I), then the chromosome arm (L or R). How many ORFs from chromosome A are there?

In [3]:
!grep ^YA ../files/Spellman.csv | wc

      27      27    3733


How many data points start with a positive expression value?   

In [4]:
!awk -F, 'NR > 1 && $2 > 0 {print $0}' ../files/Spellman.csv | wc

    2354    2354  326497


What are the 10 data points with the highest initial expression values?  

In [5]:
!awk -F, 'NR > 1  {print $1,$2}' ../files/Spellman.csv  | sort -k2,2 -n | tail

YGR151C 1.5
YPL256C 1.5699999
YLR121C 1.605
YGR189C 1.625
YLR183C 1.63
YOL007C 1.655
YML027W 1.74
YLR194C 1.825
YGL055W 1.845
YDL003W 1.93


## Introducing Python


- Variables and their types  
- Basic arithmetic operations  
- Accessing elements in containers
- Function definitions
- Basic file i/o  

## Types and variables

```
value     → your data
type      → what kind of data it is
variable  → the "name" of your data, how you access it
```

## Types

Built-in:
* Numerical: integers, floating point, complex
* Boolean (True, False)
* None
* Containers: strings, tuples, lists (sequences), sets, dictionaries
* Callable (functions)

### type

In [6]:
type(3)

int

In [7]:
type(3.0)

float

In [8]:
type("Hello")

str

In [9]:
type(min)

builtin_function_or_method

## Arithmetic operations

`+`, `-`, `*`, `/`  standard addition, subtraction, multiplication, and division

`%` modulus (remainder after division)

`**` exponentiation  $x^y$ = `x**y`

`//` integer (floor) division  

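**Division is different in Python 2 vs 3:** in Python 3, `/` always performs true division and returns a float; in Python 2, `/` on two integers performs floor division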

### Some examples

In [10]:
2+2

4

In [13]:
50 - (5 * 2)

40

In [12]:
(50 - 5) * 2

90

In [14]:
3**2

9

### Types of division

In [15]:
8 / 5

1.6

In [16]:
8 // 5

1

In [17]:
8 % 5

3
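The three division operators are related: `//` gives the quotient rounded down, and `%` gives whatever is left over. A minimal sketch of how that plays out with a negative number, where rounding down and rounding toward zero differ (the variable names are illustrative only):

```python
# Floor division rounds toward negative infinity, not toward zero.
quotient = -8 // 5       # -8 / 5 is -1.6, which rounds down to -2
remainder = -8 % 5       # chosen so that quotient * 5 + remainder == -8

print(quotient, remainder)            # -2 2
print(quotient * 5 + remainder)       # -8

# The built-in divmod returns both values at once.
print(divmod(-8, 5))                  # (-2, 2)

# Truncating toward zero is a different operation.
truncated = int(-8 / 5)
print(truncated)                      # -1
```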

## Assignment operators

Can perform an operation while assigning

a *op*= b

is

a = a *op* b

In [21]:
x = 10
x /= 2
x

5.0
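The same pattern works for the other arithmetic operators. A short sketch, using a hypothetical variable `count`, with each step written next to its long form:

```python
count = 7
count += 3     # same as count = count + 3   -> 10
count *= 2     # same as count = count * 2   -> 20
count //= 3    # same as count = count // 3  -> 6
count **= 2    # same as count = count ** 2  -> 36
print(count)   # 36
```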

## Strings

A string is a sequence of characters and can be defined with
 *  "double quotes"
 * 'single quotes' (more common python style)
 * "the difference is how easy it is to include a ' character" 'or a " character'
 * special characters are _escaped_ with a backslash, so a literal backslash must itself always be escaped
    * `\n` newline
    * `\\` backslash
    * `\t` tab
    * `\'` single quote
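An escape sequence is typed as two characters but stored as one. The sketch below checks that with `len`; the example strings are made up for illustration:

```python
newline = '\n'
print(len(newline))            # 1: backslash-n is a single newline character

path = 'C:\\temp\\new'         # each \\ is stored as one backslash
print(path)                    # C:\temp\new
print(len(path))               # 11

# Escaping a quote and switching quote styles give the same string.
escaped = 'it\'s'
switched = "it's"
print(escaped == switched)     # True
```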

## Multiline string literals

In [22]:
"you can end a line with a backslash\
and it will continue on the next line"

'you can end a line with a backslashand it will continue on the next line'

Adjacent string literals are automatically concatenated

In [23]:
'hi ' 'bye'

'hi bye'

Triple quoted strings - for large blocks of text with newlines, commonly used as documentation:

In [24]:
'''There are three
quotes at the
start and
end'''

'There are three\nquotes at the\nstart and\nend'

In [25]:
print('"\\t'"'")

"\t'


## Variables

Data values are accessed through _references_ to the value.

A reference is a name for the memory location of the value.
Every value exists somewhere in memory and has an address.

A variable is created when it is _bound_ to a value. Python has no uninitialized variables: a name does not exist until it is bound, although it can be bound to `None`.
 
The type of a variable is the type of the value it is bound to, which can change.  

In [26]:
x = 3
y = x
y = y + 1
print(x,y)

3 4


In [27]:
x = 'zero'
x

'zero'
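The `is` operator tests whether two names refer to the same object, which makes binding and rebinding visible. A small sketch of the two cells above, with the intermediate results stored in illustrative variables:

```python
x = 3
y = x                          # y is bound to the same object as x
same_before = x is y
print(same_before)             # True

y = y + 1                      # y is rebound to a new value; x is untouched
same_after = x is y
print(x, y, same_after)        # 3 4 False

x = 'zero'                     # x is rebound to a value of a different type
print(type(x))                 # <class 'str'>
```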

## Objects

Everything is an object!

An object is a value with a set of _attributes_

Attributes that are callable are called _methods_ and can work on the object data

Attributes are accessed with the `.` operator

The attributes of an object can be listed with `dir`
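As a quick sketch of these points, the built-ins `hasattr` and `callable` can be used to poke at an object's attributes (the string `word` is just an example value):

```python
word = 'yeast'

print(hasattr(word, 'upper'))    # True: str objects have an upper attribute
shout = word.upper               # attribute access with . (no call yet)
print(callable(shout))           # True: this attribute is a method
print(shout())                   # YEAST
print('upper' in dir(word))      # True: dir lists attribute names as strings
```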

## Strings are objects

A string is an object that has several methods for manipulating the string data

In [30]:
s = 'Hello World'
print(s.upper())
print(s.split())

y = s.split()
y

HELLO WORLD
['Hello', 'World']


['Hello', 'World']

<a href="https://docs.python.org/3/library/stdtypes.html#string-methods">String Methods Documentation</a>

In [29]:
print(dir(s))

['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'removeprefix', 'removesuffix', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']


## Numbers are objects

Since everything is an object...

In [31]:
x = 3.0

In [32]:
x.is_integer()

True

In [33]:
print(dir(x))

['__abs__', '__add__', '__bool__', '__ceil__', '__class__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getformat__', '__getnewargs__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__int__', '__le__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__pos__', '__pow__', '__radd__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rmod__', '__rmul__', '__round__', '__rpow__', '__rsub__', '__rtruediv__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', 'as_integer_ratio', 'conjugate', 'fromhex', 'hex', 'imag', 'is_integer', 'real']


## Container objects

A container object has _items_ that are accessed with the `[]` operator

They hold an arbitrary number of item objects

The `len` function returns the number of items

Strings are an example of a container object

In [34]:
s = "Hello"
len(s)

5

In [36]:
s = "Hello"
s[0]

'H'

## Lists

A list is an ordered container of arbitrary objects.

A list is defined by comma separated items in square brackets `[]`.

In [37]:
mylist = [1, 3.0, "cat", 9+2]
print(mylist)

[1, 3.0, 'cat', 11]


In [40]:
type(mylist)

list

Lists are objects and have a number of built-in methods:

In [39]:
print(dir(mylist))

['__add__', '__class__', '__class_getitem__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getstate__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']


In [41]:
l = []
l.append(5)
l.append(1)
l.append(3)
print(l)
l.sort()

[5, 1, 3]


In [42]:
l

[1, 3, 5]

## More indexing

In [43]:
x = [1, 2, 3]

In [44]:
x[0] = 0
x

[0, 2, 3]

In [48]:
x[-4]

IndexError: list index out of range
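Negative indices count from the end of the list, so for a list of length 3 the valid indices run from `-3` to `2`; that is why `x[-4]` fails. A minimal sketch:

```python
x = [0, 2, 3]

print(x[-1])             # 3: the last item
print(x[-3])             # 0: the same item as x[0]
print(x[-3] == x[0])     # True

# Anything outside -len(x) .. len(x) - 1 raises IndexError.
try:
    x[-4]
    out_of_range = False
except IndexError as error:
    out_of_range = True
    print('IndexError:', error)
```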

## Math with lists?

In [49]:
x = [1, 2, 3]
x + x

[1, 2, 3, 1, 2, 3]
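So `+` concatenates lists rather than adding their items. One way to get element-wise arithmetic with plain lists, sketched here with a list comprehension (a construct not covered in this lecture), is:

```python
x = [1, 2, 3]

print(x + x)     # [1, 2, 3, 1, 2, 3]  concatenation
print(x * 2)     # [1, 2, 3, 1, 2, 3]  repetition, not multiplication

# Element-wise versions build a new list one item at a time.
doubled = [value * 2 for value in x]
summed = [a + b for a, b in zip(x, x)]
print(doubled)   # [2, 4, 6]
print(summed)    # [2, 4, 6]
```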

## Functions (callables)

A function is an encapsulation of code: a set of statements that can be executed on request and that returns a value

Functions are objects  

A method is a function that is an attribute of an object  

A function takes _arguments_ and returns a result (maybe None)

The value of a callable type is the address of executable code  

In [50]:
print(len("This is a sentence."))  # this function takes one argument
print(divmod(13, 4))  # this function takes two arguments

19
(3, 1)


## Defining a function

In [51]:
def square(x):
    return x * x

 - `def` starts definition
 - The function name is an identifier like a variable name
 - Good function names are a critical part of good coding style
      - bad: foo, dostuff, process_data  
      - also bad: ReadInFromFileComputeCorrelationAndOutput  
      - good: ReadExpressionData, ComputeCorrelation, OutputCorrelationMatrix   
 - _Parameters_ define what arguments the function takes  
      - Parameters are _bound_ to the values passed to the function  
 - Statements are the body of the function; **must be indented**  
 - Return statement exits function and returns specified value   
      - If omitted, `None` is returned  
 - Function definition ends when no more indentation (**whitespace matters!**)  
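The point about an omitted `return` is easy to miss, because a function that prints its answer looks like it works. A small sketch with two hypothetical functions:

```python
def print_square(x):
    print(x * x)           # shows the value, but there is no return statement

def square(x):
    return x * x           # hands the value back to the caller

printed = print_square(3)  # prints 9
returned = square(3)

print(printed)             # None
print(returned)            # 9
```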

In [52]:
def twice(x):
    return x*2

dbl = twice  # functions are objects
print(dbl(4))

8


## Function scope

A function's parameters are bound to the passed values. That is, it's the same as if each parameter were set equal to the passed value (e.g., `x = 4`)

Parameters and variables bound (assigned to) in the function have _local scope_

_global_ variables defined outside the function can be read inside it, but assigning to the same name inside the function creates a new local variable (unless the name is declared with `global`)


In [53]:
x = 4 
y = 3 
def incr(x):
    x = x + 1 
    return x

print(x, incr(x), x)

4 5 4


In [55]:
x = 4 
y = 3 
def incr(y):
    y = y + 1 
    return y

print(y, incr(y))

3 4
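The sketch below separates three cases: reading a global, assigning to a name inside a function, and using Python's `global` statement to rebind the global itself. The function names are hypothetical.

```python
counter = 0

def read_counter():
    return counter + 1       # reading a global works without any declaration

def shadow_counter():
    counter = 10             # assignment creates a separate local variable
    return counter

def bump_counter():
    global counter           # ask for the global name explicitly
    counter = counter + 1
    return counter

print(read_counter(), counter)     # 1 0
print(shadow_counter(), counter)   # 10 0
print(bump_counter(), counter)     # 1 1
```

Rebinding globals from inside functions makes code harder to follow, so passing values in as arguments and returning results is usually the better habit.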


## Default parameters

Default values for parameters can be given. This makes it easy to have optional arguments that take reasonable defaults if not specified.

In [56]:
def foo(x, y=0, z=1):
    return (x+y)*z

foo(2)

2

In [58]:
foo(2, z=3)

6

In [59]:
foo(2, 3, 4)

20

## Calling functions

Functions are called using parentheses `()`  

It is an error to call a function with an incompatible number of arguments  

_Named_ arguments allow you to specify arguments in a different order than defined  

Unnamed arguments (passed in the order defined) must all be specified before any named arguments  

In [60]:
foo(z=2, y=1, x=3)

8

In [62]:
foo(y=1, 3)

SyntaxError: positional argument follows keyword argument (3278951492.py, line 1)

In [67]:
def foo(x, y=0, z=1):
    return (x+y)*z

In [68]:
foo(4, z=2)

8
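Unlike the `SyntaxError` above, passing the wrong number of arguments is only detected when the call runs, and it raises a `TypeError`. A minimal sketch using the same `foo`, with `try`/`except` (not covered in this lecture) so the example keeps running:

```python
def foo(x, y=0, z=1):
    return (x + y) * z

# x has no default value, so at least one argument is required.
try:
    foo()
    too_few_failed = False
except TypeError as error:
    too_few_failed = True
    print('TypeError:', error)

# foo accepts at most three arguments.
try:
    foo(1, 2, 3, 4)
    too_many_failed = False
except TypeError as error:
    too_many_failed = True
    print('TypeError:', error)
```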

## Built-in Functions

There are a huge number of functions built into the python language and
even more are included in standard modules.  A few examples:

```
abs     → absolute value  
len     → length of a sequence (string, list, etc)  
min     → return the smallest item in a container  
max     → return the largest item in a container  
type    → return the type of an object
map     → applies a function to every element of a sequence and returns an iterator over the results (wrap in list() to get a list)
filter  → like map, but yields only the elements for which the function evaluates to True
```

In [69]:
list(map(ord, "hello")) # ord returns the Unicode code point (the ASCII code for ASCII characters) of a string of length 1

[104, 101, 108, 108, 111]
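A companion sketch for `filter` and a few of the other built-ins listed above; `values` and `is_positive` are made-up names for illustration:

```python
values = [-2, 5, 0, -7, 3]

def is_positive(value):
    return value > 0

magnitudes = list(map(abs, values))            # apply abs to every item
positives = list(filter(is_positive, values))  # keep items where the test is True

print(magnitudes)                              # [2, 5, 0, 7, 3]
print(positives)                               # [5, 3]
print(min(values), max(values), len(values))   # -7 5 5
```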

## True and false values

Every object has a Boolean (true/false) value

In [70]:
bool(None), bool(False), bool(True)

(False, False, True)

For numerical types 0 is false

In [71]:
bool(0), bool(0.0), bool(-100)

(False, False, True)

Empty collections are false

In [72]:
bool([]), bool(''), bool([False]), bool([0])

(False, False, True, True)

In [73]:
bool('0')

True

## Comparison operators

Return a boolean value

 `<`, `>`, `!=`, `==`, `<=`, `>=`

In [74]:
1 < 3

True

In [75]:
"hello" != "hi"

True

In [76]:
[1,2,3] == [1,2,3], [1,2,3] == [1,2,3.14]

(True, False)

In [77]:
x = 3; y = 4;
x >= y

False

## Beware exact numerical comparisons

In [78]:
1 - 1 == 0

True

In [79]:
0.1 + 0.1 + 0.1 == 0.3

False
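The usual workaround is to test whether two floats are close rather than equal. A sketch using `math.isclose` from the standard library, alongside a hand-written tolerance check (the tolerance `1e-9` is an arbitrary choice for this example):

```python
import math

total = 0.1 + 0.1 + 0.1

exactly_equal = total == 0.3
close_enough = math.isclose(total, 0.3)       # relative tolerance, default 1e-09
within_tolerance = abs(total - 0.3) < 1e-9    # manual absolute tolerance

print(exactly_equal)       # False
print(close_enough)        # True
print(within_tolerance)    # True
```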

## File objects

A file object provides an interface for reading and writing files  

Files, unlike memory, are **accessed sequentially** (like reading a book)  

To create a file object use the `open` function:  

`fileobject = open(filename, mode)`  

Where filename is a string that is either a relative path from the current working directory (e.g., `file.txt` if `file.txt` is in the current directory) or an absolute path (e.g., `/home/user/jpb156/tmp/file.txt`)  

### File mode

`mode` is a string that specifies what you are going to do with the file  
 * 'r' - file must already exist and will only be read from (default)
 * 'w' - file is created or truncated (delete what's already there) and can only be written to
 * 'a' - file is opened for appending (existing contents are kept) and can only be written to

It is also possible to open files for both read/write access ('r+') but this is tricky and generally not necessary  

### Manipulating file objects (Methods)

 * `close` - closes the file when you are done with it
 * `read` - return the rest of the file, from the current position, as a string (can also specify optional size argument)
 * `readline` - return a single line from the file, returned string includes '\n'
 * `readlines` - return a list of all remaining lines
 * `write` - writes a passed string to the file
 * `seek` - set current position of the file; seek(0) starts back at beginning 
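To see how the modes and methods fit together, here is a minimal sketch that writes, overwrites, appends to, and then reads a scratch file. The file name `notes.txt` and the temporary directory are assumptions made so the example does not touch any real files.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'notes.txt')

f = open(path, 'w')      # 'w' creates the file
f.write('first\n')
f.close()

f = open(path, 'w')      # 'w' again truncates what was there
f.write('second\n')
f.close()

f = open(path, 'a')      # 'a' keeps existing contents and adds to the end
f.write('third\n')
f.close()

f = open(path)           # mode defaults to 'r'
contents = f.read()
f.close()

print(repr(contents))    # 'second\nthird\n'
```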

In [80]:
f = open('../files/brca1.fasta')
f.readline()

'>lcl|NC_000017.10_cdsid_NP_009225.1 [gene=BRCA1] [protein=breast cancer type 1 susceptibility protein isoform 1] [protein_id=NP_009225.1] [location=complement(join(41197695..41197819,41199660..41199720,41201138..41201211,41203080..41203134,41209069..41209152,41215350..41215390,41215891..41215968,41219625..41219712,41222945..41223255,41226348..41226538,41228505..41228631,41234421..41234592,41242961..41243049,41243452..41246877,41247863..41247939,41249261..41249306,41251792..41251897,41256139..41256278,41256885..41256973,41258473..41258550,41267743..41267796,41276034..41276113))]\n'

### What do you expect the following code to print?

In [81]:
f = open('../files/brca1.fasta')
f.read()
f.readline()

''
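The empty string appears because `read` leaves the file position at the end, so nothing remains for `readline`. A sketch of the same behavior on a small made-up file (`two_lines.txt` is a hypothetical name), followed by `seek(0)` to start over:

```python
import os
import tempfile

# Hypothetical two-line file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'two_lines.txt')
f = open(path, 'w')
f.write('line one\nline two\n')
f.close()

f = open(path)
whole = f.read()             # reads everything; position is now at the end
after_read = f.readline()    # nothing left to read
f.seek(0)                    # move back to the beginning
first = f.readline()
rest = f.readlines()         # the lines that remain after the first
f.close()

print(repr(after_read))      # ''
print(repr(first))           # 'line one\n'
print(rest)                  # ['line two\n']
```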

## Activity

What percent of this string consists of g or c?

`atattaggtttttacctacccaggaaaagccaaccaacctcgatctcttgtagatctgttctctaaacgaactttaaaatctgtgtagctgtcgctcggctgcatgcctagtgcacctac`

In [None]:
!wget https://MSCBIO2025-2025.github.io/files/brca1.fasta

How can you extract the gene name (second column) from the first line of `brca1.fasta`?

How many As, Ts, Cs, and Gs are there on the _second_ line of `brca1.fasta`?

Write a function that takes a file name as an argument and prints out the gene name and percentage of G's and C's in the first line of the sequence.

**Hint**: Check out the `split`, `count`, and `strip` methods of `str`