In [ ]:
from IPython.display import HTML

def _set_css_style(css_file_path):
    """
    Read the custom CSS file and load it into Jupyter.
    Pass the file path to the CSS file.
    """
    # Read the stylesheet; the with block closes the file afterwards
    with open(css_file_path, "r") as css_file:
        styles = css_file.read()
    return HTML(f"<style>{styles}</style>")

_set_css_style('rise.css')

Exercises from last class¶

How many data points are in Spellman.csv?

In [ ]:
!wc ../files/Spellman.csv

The first three letters of the ORFs are 'Y' for yeast, a letter identifying the chromosome (A for chromosome I, B for chromosome II, and so on), then the chromosome arm (L or R). How many ORFs from chromosome A are there?

In [ ]:
!grep ^YA ../files/Spellman.csv | wc

How many data points start with a positive expression value?

In [ ]:
!awk -F, 'NR > 1 && $2 > 0 {print $0}' ../files/Spellman.csv | wc

What are the 10 data points with the highest initial expression values?

In [ ]:
!awk -F, 'NR > 1  {print $1,$2}' ../files/Spellman.csv  | sort -k2,2 -n | tail

Introducing Python¶


  • Variables and their types
  • Basic arithmetic operations
  • Accessing elements in containers
  • Function definitions
  • Basic file I/O

Types and variables¶

value     → your data
type      → what kind of data it is
variable  → the "name" of your data, how you access it

Types¶

Built-in:

  • Numerical: integers, floating point, complex
  • Boolean (True, False)
  • None
  • Containers: sequences (strings, tuples, lists), sets, and dictionaries
  • Callable (functions)

type¶

In [ ]:
type(3)
In [ ]:
type(3.0)
In [ ]:
type("Hello")
In [ ]:
type(min)
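The same check works for the other built-in types listed above. A minimal sketch; the sample values are arbitrary:

```python
# Each literal creates a value of a different built-in type
samples = [True, None, 3 + 4j, (1, 2), [1, 2], {1, 2}, {"a": 1}]

for value in samples:
    # type(value).__name__ is the short name of the value's type
    print(repr(value), "->", type(value).__name__)
```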

Arithmetic operations¶

+, -, *, / standard addition, subtraction, multiplication, and division

% modulus (remainder after division)

** exponentiation: $x^y$ is written x**y

// integer (floor) division

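Division differs between Python 2 and 3: in Python 2, / between two integers performs floor division (8 / 5 is 1), while in Python 3, / always returns a float (8 / 5 is 1.6)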

Some examples¶

In [ ]:
2+2
In [ ]:
50 - 5 * 2
In [ ]:
(50 - 5) * 2
In [ ]:
3**2

Types of division¶

In [ ]:
8 / 5
In [ ]:
8 // 5
In [ ]:
8 % 5
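Floor division rounds down (toward negative infinity), not toward zero, which matters for negative operands. The built-in divmod returns both results at once, and they always satisfy (a // b) * b + a % b == a. A small sketch with arbitrary values:

```python
a, b = -8, 5

q = a // b          # -2, because -1.6 rounds down to -2
r = a % b           # 2: the remainder takes the sign of the divisor
print(q, r, divmod(a, b))

# The identity that links // and %
assert q * b + r == a
```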

Assignment operators¶

An operation can be combined with assignment:

a op= b

is equivalent to

a = a op b

In [ ]:
x = 10
x += 1
x
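The shorthand exists for the other arithmetic operators as well, and += also works on strings. A short trace, with arbitrary starting values:

```python
x = 10
x -= 4      # x = x - 4  -> 6
x *= 3      # x = x * 3  -> 18
x //= 4     # x = x // 4 -> 4
x **= 2     # x = x ** 2 -> 16
x %= 5      # x = x % 5  -> 1

s = "AT"
s += "GC"   # string concatenation: "ATGC"
print(x, s)
```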

Strings¶

Strings are sequences of characters and can be defined with

  • "double quotes"
  • 'single quotes' (the more common Python style)
  • the choice only affects which quote is easy to include: "a ' character" or 'a " character'
  • special characters are escaped with a backslash, so a literal backslash must itself be escaped
    • \n newline
    • \\ backslash
    • \t tab
    • \' single quote
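Each escape sequence stands for a single character in the resulting string, which len confirms. A minimal sketch:

```python
newline = "\n"
backslash = "\\"
tab = "\t"
quote = 'it\'s'

# Each escape sequence produces exactly one character
print(len(newline), len(backslash), len(tab))  # 1 1 1
print(quote)                                   # it's
print("col1\tcol2\nrow2")                      # a tab between columns, then a new line
```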

Multiline string literals¶

In [ ]:
"you can end a line with a backslash\
and it will continue on the next line"

Adjacent string literals are automatically concatenated

In [ ]:
'hi ' 'bye'
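Because adjacent literals are joined into one string, wrapping them in parentheses is a common way to split a long string across lines without a trailing backslash. A small sketch:

```python
# The parentheses let the expression span several lines;
# the adjacent literals inside are joined into one string
message = (
    "Adjacent literals inside parentheses "
    "are joined into a single string, "
    "with no backslash needed."
)
print(message)
```

This only works for literals; strings held in variables must be joined with +.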

Triple quoted strings - for large blocks of text with newlines, commonly used as documentation:

In [ ]:
'''There are three
quotes at the
start and
end'''
In [ ]:
print('"\\t'"'")

Variables¶

Data values are accessed through references to the value.

A reference is a name that refers to the memory location of a value. Every value exists somewhere in memory and has an address.

A variable is created when it is bound to a value. It is impossible to have an uninitialized variable in Python, although a variable can be bound to None.

The type of a variable is the type of the value it is bound to, which can change.

In [ ]:
x = 3
y = x
y = y + 1
print(x,y)
In [ ]:
x = 'zero'
x
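The is operator checks whether two names refer to the same object (equivalent to comparing their id values). A sketch that retraces the two cells above:

```python
x = 3
y = x
same_before = x is y     # True: both names refer to the same int object

y = y + 1                # y is rebound to a new object (4); x still refers to 3
same_after = x is y      # False

x = 'zero'               # rebinding x also changes the type of the variable
print(same_before, same_after, y, type(x).__name__)
```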

Objects¶

Everything is an object!

An object is a value with a set of attributes

Attributes that are callable are called methods and can operate on the object's data

Attributes are accessed with the . operator

The attributes of an object can be listed with dir
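Accessing a method with the . operator does not run it; the parentheses do. A minimal sketch using a string (the list comprehension that filters dir's output is a construct not yet introduced here):

```python
s = 'Hello World'

method = s.upper           # accessing the attribute does not call it
print(callable(method))    # True: upper is a method
print(method())            # HELLO WORLD: the parentheses call it

# dir returns attribute names as strings; skip the special "_"-prefixed names
public = [name for name in dir(s) if not name.startswith('_')]
print(public[:5])
```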

Strings are objects¶

A string is an object that has several methods for manipulating the string data

In [ ]:
s = 'Hello World'
print(s.upper())
print(s.split())
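A few more string methods, including the strip, count, and split methods hinted at in the activity at the end. The sample string is arbitrary:

```python
s = '  Hello World\n'

clean = s.strip()                 # remove leading/trailing whitespace, including '\n'
print(repr(clean))                # 'Hello World'
print(clean.count('o'))           # 2: number of non-overlapping occurrences of 'o'
print(clean.replace('World', 'Python'))  # Hello Python
print(clean.split('o'))           # ['Hell', ' W', 'rld']: split on a chosen separator
print('-'.join(['a', 'b', 'c']))  # a-b-c: join is the reverse of split
```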

See the "String Methods" section of the Python documentation for the full list.

In [ ]:
print(dir(s))

Numbers are objects¶

Since everything is an object...

In [ ]:
x = 3.0
In [ ]:
x.is_integer()
In [ ]:
print(dir(x))
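Other numeric types have their own methods and attributes. Note that calling a method directly on an integer literal needs parentheses, because 3.is_integer() would be read as the float 3. followed by a name, which is a syntax error. A small sketch:

```python
x = 2.5
print(x.is_integer())        # False: 2.5 has a fractional part
print((255).bit_length())    # 8: number of bits needed to represent 255

z = 3 + 4j
print(z.real, z.imag)        # 3.0 4.0: parts of a complex number
print(abs(z))                # 5.0: magnitude of the complex number
```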

Container objects¶

A container object has items that are accessed with the [] operator

They hold an arbitrary number of item objects

The built-in len function returns the number of items

Strings are an example of a container object

In [ ]:
s = "Hello"
len(s)
In [ ]:
s = "Hello"
s[1]
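Indices start at 0, negative indices count from the end, and a slice [start:stop] selects a range of items. A short sketch on the same string:

```python
s = "Hello"

print(s[0], s[-1])   # H o: negative indices count from the end
print(s[1:4])        # ell: a slice includes the start index but not the stop index
print(s[:2], s[3:])  # He lo: an omitted bound means "from the start" / "to the end"
print(s[::-1])       # olleH: a step of -1 walks backwards
```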

Lists¶

A list is an ordered container of arbitrary objects.

A list is defined by comma separated items in square brackets [].

In [ ]:
mylist = [1, 3.0, "cat", 9+2]
print(mylist)
In [ ]:
mylist[0]

Lists are objects and have a number of built-in methods:

In [ ]:
print(dir(mylist))
In [ ]:
l = []
l.append(5)
l.append(1)
l.append(3)
l.sort()
In [ ]:
l
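The sort method changes the list in place and returns None, which is why the result is inspected in a separate cell. The built-in sorted returns a new list instead. A minimal comparison:

```python
l = [5, 1, 3]
result = l.sort()            # sort rearranges l in place...
print(result)                # ...and returns None
print(l)                     # [1, 3, 5]

values = [5, 1, 3]
ordered = sorted(values)     # sorted builds a new list and leaves values unchanged
print(ordered, values)       # [1, 3, 5] [5, 1, 3]
print(sorted(values, reverse=True))  # [5, 3, 1]
```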

More indexing¶

In [ ]:
x = [1, 2, 3]
In [ ]:
x[0] = 0
x
In [ ]:
x[-4]
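For a list of length n, valid indices run from -n to n - 1; anything outside that range raises IndexError, which is what the last cell triggers. A sketch that catches the error instead:

```python
x = [0, 2, 3]            # the value of x after x[0] = 0 above
n = len(x)
print(x[-n], x[n - 1])   # 0 3: the lowest and highest valid indices

caught = None
try:
    x[-4]                # -4 is below -n, so this raises
except IndexError as err:
    caught = err
    print("IndexError:", err)

print(x[1:10])           # [2, 3]: slices are clipped to the list instead of raising
```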

Math with lists?¶

In [ ]:
x = [1, 2, 3]
x + x
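For lists, + concatenates and * repeats; neither does element-wise arithmetic. A sketch of writing element-wise addition explicitly, using zip and a list comprehension (constructs not otherwise covered in these slides):

```python
x = [1, 2, 3]
print(x + x)   # [1, 2, 3, 1, 2, 3]: + concatenates
print(x * 2)   # [1, 2, 3, 1, 2, 3]: * repeats

# Element-wise addition: pair up items with zip, then add each pair
y = [10, 20, 30]
sums = [a + b for a, b in zip(x, y)]
print(sums)    # [11, 22, 33]
```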

Functions (callables)¶

A function is an encapsulation of code; a set of statements that can be executed on request and returns a value

Functions are objects

A method is a function that is an attribute of an object

A function takes arguments and returns a result (maybe None)

The value of a callable is executable code rather than data; calling it runs that code

In [ ]:
print(len("This is a sentence."))  # this function takes one argument
print(divmod(13, 4))  # this function takes two arguments

Defining a function¶

In [ ]:
def square(x):
    return x * x
  • def starts definition
  • The function name is an identifier like a variable name
  • Good function names are a critical part of good coding style
    • bad: foo, dostuff, process_data
    • also bad: ReadInFromFileComputeCorrelationAndOutput
    • good: ReadExpressionData, ComputeCorrelation, OutputCorrelationMatrix
  • Parameters define what arguments the function takes
    • Parameters are bound to the values passed to the function
  • Statements are the body of the function; must be indented
  • Return statement exits function and returns specified value
    • If omitted, None is returned
  • Function definition ends when no more indentation (whitespace matters!)
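A sketch of two points from the list: a function without a return statement returns None, and a string placed right after the def line is a docstring (the same convention used by the CSS helper at the top of this notebook). The name greet is hypothetical:

```python
def square(x):
    """Return x multiplied by itself."""
    return x * x

def greet(name):
    print("Hello,", name)   # no return statement

result = greet("world")     # prints: Hello, world
print(result)               # None: the implicit return value
print(square(5))            # 25
print(square.__doc__)       # the docstring text
```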
In [ ]:
def twice(x):
    return x*2

dbl = twice  # functions are objects
print(dbl(4))

Function scope¶

A function's parameters are bound to the passed values. That is, it's the same as if each parameter were assigned the passed value (e.g., x = 4)

Parameters and variables bound (assigned to) in the function have local scope

Global variables, defined outside the function, can be read inside it; assigning to one inside the function creates a new local variable instead, unless the name is declared global

In [ ]:
x = 4 
y = 3 
def incr(x):
    x = x + 1 
    return x

print(x, incr(x), x)
In [ ]:
x = 4 
y = 3 
def incr():
    y = y + 1 
    return y

print(y, incr())
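The second cell fails with UnboundLocalError: because y is assigned inside incr, y is local for the whole function, so y + 1 reads it before it has a value. A sketch of the failure and two ways around it (the function names are hypothetical):

```python
y = 3

def incr_broken():
    y = y + 1          # y is local here, so reading it first fails
    return y

caught = False
try:
    incr_broken()
except UnboundLocalError as err:
    caught = True
    print("UnboundLocalError:", err)

def incr_global():
    global y           # rebind the module-level y instead of creating a local
    y = y + 1
    return y

def incr_param(value):
    return value + 1   # usually clearer: pass the value in and return a new one

print(incr_global(), y)   # 4 4
print(incr_param(y), y)   # 5 4
```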

Default parameters¶

Default values for parameters can be given. This makes it easy to have optional arguments that take reasonable defaults if not specified.

In [ ]:
def foo(x, y=0, z=1):
    return (x+y)*z

foo(2)
In [ ]:
foo(2, y=3)
In [ ]:
foo(2, 3, 4)

Calling functions¶

Functions are called using parentheses ()

It is an error to call a function with an incompatible number of arguments

Named arguments allow you to specify arguments in a different order than defined

Unnamed arguments (passed in the order defined) must all be specified before any named arguments

In [ ]:
foo(z=2, y=1, x=3)
In [ ]:
foo(y=1, x=3)
In [ ]:
def foo(x, y=0, z=1):
    return (x+y)*z
In [ ]:
foo(4, z=2)
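A sketch of the calls the rules above forbid. Wrong argument counts raise TypeError when the call runs; a positional argument after a named one is a SyntaxError, so compile() is used here only to catch and display it:

```python
def foo(x, y=0, z=1):
    return (x + y) * z

errors = []

try:
    foo()                  # x has no default value, so it is required
except TypeError as err:
    errors.append(err)

try:
    foo(1, 2, 3, 4)        # foo accepts at most three arguments
except TypeError as err:
    errors.append(err)

try:
    foo(3, x=1)            # x would receive two values
except TypeError as err:
    errors.append(err)

# Rejected before any code runs, so it cannot be caught in a normal call
try:
    compile("foo(x=3, 1)", "<example>", "eval")
except SyntaxError as err:
    errors.append(err)

for err in errors:
    print(type(err).__name__, "-", err)
```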

Built-in Functions¶

There are a huge number of functions built into the python language and even more are included in standard modules. A few examples:

abs     → absolute value  
len     → length of a sequence (string, list, etc)  
min     → return the smallest item in a container  
max     → return the largest item in a container  
type    → return the type of an object
map     → applies a function to every element of a sequence and returns an iterator over the results (wrap it in list() to get a list)
filter  → like map, but returns an iterator over only the elements for which the function returns a true value
In [ ]:
list(map(ord, "hello"))  # ord returns the Unicode code point (the ASCII code, for ASCII characters) of a string of length 1
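A sketch combining several of the built-ins from the table; is_even is a hypothetical helper defined only for this example:

```python
def is_even(n):
    return n % 2 == 0

codes = list(map(ord, "hello"))        # [104, 101, 108, 108, 111]
letters = list(map(chr, codes))        # chr is the inverse of ord
evens = list(filter(is_even, codes))   # keep only the codes where is_even is True
print(codes, letters, evens)
print(min(codes), max(codes), abs(-7)) # 101 111 7

lazy = map(ord, "hi")
print(type(lazy).__name__)             # map: an iterator, not a list
```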

True and false values¶

Every object has a Boolean (true/false) value

In [ ]:
bool(None), bool(False), bool(True)

For numerical types 0 is false

In [ ]:
bool(0), bool(0.0), bool(-100)

Empty collections are false

In [ ]:
bool([]), bool(''), bool([False]), bool([0])
In [ ]:
bool('0')
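An if statement uses an object's Boolean value directly, so a container can be tested for emptiness without calling len. A minimal sketch with a hypothetical describe function:

```python
def describe(container):
    # "if container" is the same test as "if bool(container)"
    if container:
        return "non-empty"
    return "empty"

print(describe([]), describe([0]), describe(''), describe('0'))
# empty non-empty empty non-empty
```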

Comparison operators¶

Return a boolean value

<, >, !=, ==, <=, >=

In [ ]:
1 < 3
In [ ]:
"hello" != "hi"
In [ ]:
[1,2,3] == [1,2,3], [1,2,3] == [1,2,3.14]
In [ ]:
x = 3
y = 4
x >= y
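Comparisons can be chained, and strings compare character by character. A short sketch reusing x and y:

```python
x = 3
y = 4
in_range = 1 < x < 5          # same as (1 < x) and (x < 5)
ordered = 1 < x < y           # same as (1 < x) and (x < y)
alphabetical = "apple" < "banana"  # 'a' sorts before 'b'
print(in_range, ordered, alphabetical)  # True True True
```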

Beware exact numerical comparisons¶

In [ ]:
1 - 1 == 0
In [ ]:
0.1 + 0.1 + 0.1 == 0.3
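Floating point values such as 0.1 cannot be stored exactly, so small rounding errors accumulate. The standard math.isclose compares two floats within a tolerance instead:

```python
import math

total = 0.1 + 0.1 + 0.1
print(total)                      # 0.30000000000000004
print(total == 0.3)               # False
print(math.isclose(total, 0.3))   # True: default relative tolerance is 1e-09
print(abs(total - 0.3) < 1e-9)    # True: a manual absolute-tolerance check
```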

File objects¶

A file object provides an interface for reading and writing files

Files, unlike memory, are accessed sequentially (like reading a book)

To create a file object use the open function:

fileobject = open(filename, mode)

Where filename is a string that is either a relative path from the current working directory (e.g., file.txt if file.txt is in the current directory) or an absolute path (e.g. /home/user/jpb156/tmp/file.txt)

File mode¶

mode is a string that specifies what you are going to do with the file

  • 'r' - file must already exist and will only be read from (default)
  • 'w' - file is created or truncated (delete what's already there) and can only be written to
  • 'a' - file is opened for appending (existing content is kept) and can only be written to

It is also possible to open files for both read/write access ('r+') but this is tricky and generally not necessary
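A sketch of how the three modes behave. It works in a temporary directory from the standard tempfile module so no real files are touched; demo.txt and missing.txt are hypothetical names:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.txt")

    f = open(path, "w")      # 'w' creates the file (or truncates an existing one)
    f.write("first\n")
    f.close()

    f = open(path, "a")      # 'a' keeps the existing content and writes at the end
    f.write("second\n")
    f.close()

    f = open(path)           # 'r' is the default mode
    after_append = f.read()
    f.close()

    f = open(path, "w")      # opening with 'w' again erases the old content
    f.write("third\n")
    f.close()

    f = open(path)
    after_truncate = f.read()
    f.close()

    missing_raises = False
    try:
        open(os.path.join(folder, "missing.txt"))  # 'r' requires an existing file
    except FileNotFoundError:
        missing_raises = True

print(repr(after_append))    # 'first\nsecond\n'
print(repr(after_truncate))  # 'third\n'
```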

Manipulating file objects (Methods)¶

  • close - closes the file when you are done with it
  • read - return the entire file as a string (can also specify optional size argument)
  • readline - return a single line from the file, returned string includes '\n'
  • readlines - return a list of all lines
  • write - writes a passed string to the file
  • seek - set current position of the file; seek(0) starts back at beginning
In [ ]:
f = open('../files/brca1.fasta')
f.readline()

What do you expect the following code to print?¶

In [ ]:
f = open('../files/brca1.fasta')
f.read()
f.readline()
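The methods listed earlier can be tried on a small throwaway file. This sketch also uses a with block, which closes the file automatically when the block ends (the cells above leave their files open); lines.txt is a hypothetical name inside a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "lines.txt")

    with open(path, "w") as f:
        f.write("line one\nline two\nline three\n")

    with open(path) as f:
        first = f.readline()       # 'line one\n': the newline is included
        rest = f.read()            # everything from the current position onward
        at_end = f.readline()      # '': nothing is left to read
        f.seek(0)                  # move back to the beginning
        all_lines = f.readlines()  # a list with one string per line

print(repr(first), repr(rest), repr(at_end))
print(all_lines)
```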

Activity¶

What percent of this string consists of g or c?

atattaggtttttacctacccaggaaaagccaaccaacctcgatctcttgtagatctgttctctaaacgaactttaaaatctgtgtagctgtcgctcggctgcatgcctagtgcacctac

In [ ]:
!wget https://MSCBIO2025-2025.github.io/files/brca1.fasta

How can you extract the gene name (second column) from the first line of brca1.fasta?

How many As, Ts, Cs, and Gs are there on the second line of brca1.fasta?

Write a function that takes a file name as an argument and prints out the gene name and percentage of G's and C's in the first line of the sequence.

Hint: Check out the split, count, and strip methods of str