from IPython.core.display import HTML
def _set_css_style(css_file_path):
    """
    Read the custom CSS file and load it into Jupyter.
    Pass the file path to the CSS file.
    """
    styles = open(css_file_path, "r").read()
    s = '<style>%s</style>' % styles
    return HTML(s)
_set_css_style('rise.css')
Exercises from last class¶
How many data points are in Spellman.csv?
!wc ../files/Spellman.csv
The first three letters of the ORFs are 'Y' for yeast, the chromosome number, then the chromosome arm. How many ORFs from chromosome A are there?
!grep ^YA ../files/Spellman.csv | wc
How many data points start with a positive expression value?
!awk -F, 'NR > 1 && $2 > 0 {print $0}' ../files/Spellman.csv | wc
What are the 10 data points with the highest initial expression values?
!awk -F, 'NR > 1 {print $1,$2}' ../files/Spellman.csv | sort -k2,2 -n | tail
Introducing Python¶
- Variables and their types
- Basic arithmetic operations
- Accessing elements in containers
- Function definitions
- Basic file i/o
Types and variables¶
value → your data
type → what kind of data it is
variable → the "name" of your data, how you access it
Types¶
Built-in:
- Numerical: integers, floating point, complex
- Boolean (True, False)
- None
- Containers: strings, tuples, lists (sequences), plus sets and dictionaries
- Callable (functions)
type¶
type(3)
type(3.0)
type("Hello")
type(min)
Arithmetic operations¶
- +, -, *, / : standard addition, subtraction, multiplication, and division
- % : modulus (remainder after division)
- ** : exponentiation, $x^y$ = x**y
- // : integer (floor) division
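Division is different in Python 2 vs. 3: in Python 3, / always returns a float, while in Python 2, / between two integers performs floor division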
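As a small sketch of how the division operators listed above relate to each other in Python 3 (the variable names are illustrative only):

```python
a, b = 8, 5

true_quotient = a / b    # true division always produces a float
floor_quotient = a // b  # floor division rounds down to a whole number
remainder = a % b        # modulus is what is left over

# The floor quotient and the remainder together reconstruct a.
reconstructed = floor_quotient * b + remainder

# Floor division rounds toward negative infinity, not toward zero,
# and the remainder takes the sign of the divisor.
negative_floor = -8 // 5
negative_remainder = -8 % 5

print(true_quotient, floor_quotient, remainder)
print(negative_floor, negative_remainder)
```

The negative case is the one that tends to surprise people coming from languages where integer division truncates toward zero.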
Some examples¶
2+2
50 - 5 * 2
(50 - 5) * 2
3**2
Types of division¶
8 / 5
8 // 5
8 % 5
x = 10
x += 1
x
Strings¶
Strings are sequences of characters and can be defined with
- "double quotes"
- 'single quotes' (more common python style)
- "the difference is how easy it is to include a ' character" 'or a " character'
- special characters are escaped with a backslash, so the backslash itself must always be escaped
  - \n newline
  - \\ backslash
  - \t tab
  - \' single quote
Multiline string literals¶
"you can end a line with a slash\
and it will continue on the next line"
Adjacent string literals are automatically concatenated
'hi ' 'bye'
Triple quoted strings - for large blocks of text with newlines, commonly used as documentation:
'''There are three
quotes at the
start and
end'''
print('"\\t'"'")
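The print call above combines escaping with adjacent-literal concatenation. A minimal sketch that takes it apart (the variable names are made up for illustration):

```python
# Two adjacent literals: '"\\t' (single-quoted) and "'" (double-quoted).
puzzle = '"\\t'"'"

# The first literal holds a double quote, one backslash (written as \\),
# and the letter t. The second holds a single quote.
characters = list(puzzle)

tab_string = 'a\tb'         # \t is one tab character
backslash_string = 'a\\tb'  # \\ is one backslash, followed by a plain t

print(puzzle)
print(characters)
print(len(tab_string), len(backslash_string))
```

Counting characters with len is a quick way to check whether an escape sequence was interpreted as one character or two.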
Variables¶
Data values are accessed through references to the value.
A reference is a name for the memory location of the value. Every value exists somewhere in memory and has an address.
A variable is created when it is bound to a value. It is impossible to have an uninitialized variable in Python (although a variable can be bound to None).
The type of a variable is the type of the value it is bound to, which can change.
x = 3
y = x
y = y + 1
print(x,y)
x = 'zero'
x
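To make the difference between rebinding a name and changing a shared value concrete, here is a hedged sketch. It uses a list and its append method, both of which are covered later in these notes:

```python
x = 3
y = x
y = y + 1        # y is rebound to a new value; x still refers to 3

a = [1, 2, 3]
b = a            # b is bound to the same list object as a
b.append(4)      # changes the one list that both names refer to

same_object = a is b   # the is operator compares identity, not contents

print(x, y)
print(a, b, same_object)
```

Assignment never copies a value; it only binds another name to it. That only becomes visible with values that can be modified in place, such as lists.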
Objects¶
Everything is an object!
An object is a value with a set of attributes
Attributes that are callable are called methods and can work on the object data
Attributes are accessed with the . operator
The attributes of an object can be listed with dir
Strings are objects¶
A string is an object that has several methods for manipulating the string data
s = 'Hello World'
print(s.upper())
print(s.split())
print(dir(s))
Numbers are objects¶
Since everything is an object...
x = 3.0
x.is_integer()
print(dir(x))
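A short sketch of what "methods are callable attributes" means in practice. The names upper_method and attribute_names are illustrative only:

```python
s = 'Hello World'

# Looking up a method with . gives back a callable object;
# adding () is what actually calls it.
upper_method = s.upper
is_callable = callable(upper_method)
shouted = upper_method()

# dir returns a list of attribute names as strings.
attribute_names = dir(s)
has_split = 'split' in attribute_names

# Floats are objects too, with their own methods.
whole = (3.0).is_integer()
fractional = (3.5).is_integer()

print(is_callable, shouted, has_split)
print(whole, fractional)
```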
Container objects¶
A container object has items that are accessed with the [] operator
They hold an arbitrary number of item objects
The len function returns the number of items
Strings are an example of a container object
s = "Hello"
len(s)
s = "Hello"
s[1]
Lists¶
A list is an ordered container of arbitrary objects.
A list is defined by comma separated items in square brackets [].
mylist = [1, 3.0, "cat", 9+2]
print(mylist)
mylist[0]
Lists are objects and have a number of built-in methods:
print(dir(mylist))
l = []
l.append(5)
l.append(1)
l.append(3)
l.sort()
l
More indexing¶
x = [1, 2, 3]
x[0] = 0
x
x[-4]
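Negative indices count back from the end of the list, and an index outside the list raises an IndexError. A minimal sketch, using try/except (not covered in these notes) only so that the example runs to completion:

```python
x = [0, 2, 3]

last_item = x[-1]    # -1 is the last item
first_item = x[-3]   # for a list of length 3, -3 is the same item as x[0]

try:
    x[-4]            # one step past the start of the list
    out_of_range = False
except IndexError:
    out_of_range = True

print(last_item, first_item, out_of_range)
```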
Math with lists?¶
x = [1, 2, 3]
x + x
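For lists, + means concatenation rather than element-wise addition. The sketch below contrasts the two. The list comprehension syntax is not introduced in these notes and is shown only as one possible way to do element-wise math:

```python
x = [1, 2, 3]

concatenated = x + x                        # joins the two lists end to end
repeated = x * 2                            # repeats the list
doubled = [item * 2 for item in x]          # element-wise multiplication
summed = [a + b for a, b in zip(x, x)]      # element-wise addition

print(concatenated)
print(repeated)
print(doubled, summed)
```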
Functions (callables)¶
A function is an encapsulation of code: a set of statements that can be executed on request and that returns a value
Functions are objects
A method is a function that is an attribute of an object
A function takes arguments and returns a result (maybe None)
The value of a callable type is the address of executable code
print(len("This is a sentence.")) # this function takes one argument
print(divmod(13, 4)) # this function takes two arguments
Defining a function¶
def square(x):
    return x * x
- def starts the definition
- The function name is an identifier like a variable name
  - Good function names are a critical part of good coding style
  - bad: foo, dostuff, process_data
  - also bad: ReadInFromFileComputeCorrelationAndOutput
  - good: ReadExpressionData, ComputeCorrelation, OutputCorrelationMatrix
- Parameters define what arguments the function takes
  - Parameters are bound to the values passed to the function
- Statements are the body of the function; they must be indented
- The return statement exits the function and returns the specified value
  - If omitted, None is returned
- The function definition ends when the indentation ends (whitespace matters!)
def twice(x):
    return x*2
dbl = twice # functions are objects
print(dbl(4))
Function scope¶
A function's parameters are bound to the passed value. That is, it's the same as if the parameter was set equal to the passed value (e.g., x = 4)
Parameters and variables bound (assigned to) in the function have local scope
Global variables defined outside the function can only be read (unless they are declared with the global statement)
x = 4
y = 3
def incr(x):
    x = x + 1
    return x
print(x, incr(x), x)
x = 4
y = 3
def incr():
    y = y + 1
    return y
print(y, incr())
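The second example fails because assigning to y inside the function makes y a local name, so it has no value yet when y + 1 is evaluated. A hedged sketch of the failure and one possible fix (incr_broken is a hypothetical rename so both versions can sit side by side):

```python
y = 3

def incr_broken():
    y = y + 1        # the assignment makes y local, so reading it first fails
    return y

try:
    incr_broken()
    failed = False
except UnboundLocalError:
    failed = True

def incr(value):
    # One possible fix: pass the value in and return the result,
    # rather than modifying a global from inside the function.
    return value + 1

y = incr(y)
print(failed, y)
```

Passing values in and returning results keeps the function independent of whatever globals happen to exist, which is usually easier to reason about than the global statement.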
Default parameters¶
Default values for parameters can be given. This makes it easy to have optional arguments that take reasonable defaults if not specified.
def foo(x, y=0, z=1):
    return (x+y)*z
foo(2)
foo(2, y=3)
foo(2, 3, 4)
Calling functions¶
Functions are called using parentheses ()
It is an error to call a function with an incompatible number of arguments
Named arguments allow you to specify arguments in a different order than defined
Unnamed arguments (passed in the order defined) must all be specified before any named arguments
foo(z=2, y=1, x=3)
foo(y=1, x=3)
def foo(x, y=0, z=1):
    return (x+y)*z
foo(4, z=2)
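Collecting the calls to foo from this section in one place shows how defaults, positional arguments, and named arguments combine. The try/except at the end is only there so the sketch runs to completion:

```python
def foo(x, y=0, z=1):
    return (x + y) * z

results = [
    foo(2),              # y and z take their defaults
    foo(2, y=3),         # z takes its default
    foo(2, 3, 4),        # all three given positionally
    foo(z=2, y=1, x=3),  # named arguments in any order
    foo(4, z=2),         # positional first, then named; y is skipped
]

try:
    foo()                # x has no default, so it is required
    missing_argument_error = False
except TypeError:
    missing_argument_error = True

print(results, missing_argument_error)
```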
Built-in Functions¶
There are a huge number of functions built into the python language and even more are included in standard modules. A few examples:
abs → absolute value
len → length of a sequence (string, list, etc)
min → return the smallest item in a container
max → return the largest item in a container
type → return the type of an object
map → applies a function to every element of a sequence and returns an iterator over the results (wrap it in list to get a list)
filter → like map, but only yields the elements for which the function evaluates to True
list(map(ord, "hello")) # ord returns the Unicode code point (the ASCII code for ASCII characters) of a string of length 1
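A small sketch of map and filter side by side. The helper is_vowel is a made-up function for illustration:

```python
def is_vowel(ch):
    return ch in "aeiou"

codes = list(map(ord, "hello"))            # apply ord to every character
vowels = list(filter(is_vowel, "hello"))   # keep characters where is_vowel is True

# In Python 3 the result of map is not a list until it is converted.
lazy_result = map(ord, "hi")
is_list = isinstance(lazy_result, list)

print(codes)
print(vowels)
print(is_list, list(lazy_result))
```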
True and false values¶
Every object has a Boolean (true/false) value
bool(None), bool(False), bool(True)
For numerical types 0 is false
bool(0), bool(0.0), bool(-100)
Empty collections are false
bool([]), bool(''), bool([False]), bool([0])
bool('0')
1 < 3
"hello" != "hi"
[1,2,3] == [1,2,3], [1,2,3] == [1,2,3.14]
x = 3; y = 4;
x >= y
Beware exact numerical comparisons¶
1 - 1 == 0
0.1 + 0.1 + 0.1 == 0.3
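The second comparison is False because 0.1 cannot be represented exactly in binary floating point. One common alternative, sketched here, is to compare within a tolerance, either with math.isclose from the standard library or with an explicit threshold (the 1e-9 below is an arbitrary choice for illustration):

```python
import math

total = 0.1 + 0.1 + 0.1

exactly_equal = total == 0.3               # exact comparison
close_enough = math.isclose(total, 0.3)    # comparison with a relative tolerance
within_threshold = abs(total - 0.3) < 1e-9 # comparison with an explicit threshold

print(total)
print(exactly_equal, close_enough, within_threshold)
```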
File objects¶
A file object provides an interface for reading and writing files
Files, unlike memory, are accessed sequentially (like reading a book)
To create a file object use the open function:
fileobject = open(filename, mode)
Where filename is a string that is either a relative path from the current working directory (e.g., file.txt if file.txt is in the current directory) or an absolute path (e.g. /home/user/jpb156/tmp/file.txt)
File mode¶
mode is a string that specifies what you are going to do with the file
- 'r' - file must already exist and will only be read from (default)
- 'w' - file is created or truncated (delete what's already there) and can only be written to
- 'a' - file is opened for appending (existing contents are kept) and can only be written to
It is also possible to open files for both read/write access ('r+') but this is tricky and generally not necessary
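The difference between the 'w' and 'a' modes is easiest to see by trying them. This sketch writes to a scratch file in a temporary directory; the file name example.txt and its contents are made up for illustration:

```python
import os
import tempfile

# Hypothetical scratch file, so no real data is overwritten.
path = os.path.join(tempfile.mkdtemp(), 'example.txt')

f = open(path, 'w')      # 'w' creates the file
f.write('first\n')
f.close()

f = open(path, 'a')      # 'a' keeps the existing line and adds to the end
f.write('second\n')
f.close()

f = open(path, 'r')
after_append = f.read()
f.close()

f = open(path, 'w')      # 'w' again truncates: both earlier lines are gone
f.write('third\n')
f.close()

f = open(path)           # mode defaults to 'r'
after_truncate = f.read()
f.close()

print(repr(after_append))
print(repr(after_truncate))
```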
Manipulating file objects (Methods)¶
- close - closes the file when you are done with it
- read - returns the entire file as a string (can also specify optional size argument)
- readline - returns a single line from the file; the returned string includes '\n'
- readlines - returns a list of all lines
- write - writes a passed string to the file
- seek - sets the current position in the file; seek(0) starts back at the beginning
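A minimal sketch of these methods in action. It creates its own small scratch file (the name example.fasta and its contents are invented for illustration) so that it does not depend on any course data:

```python
import os
import tempfile

# Hypothetical three-line file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'example.fasta')
f = open(path, 'w')
f.write('>header line\nacgt\nttga\n')
f.close()

f = open(path)
first_line = f.readline()        # one line, including the trailing '\n'
remaining_lines = f.readlines()  # everything after the current position
f.seek(0)                        # move back to the beginning
whole_file = f.read()            # the entire file as a single string
f.close()

print(repr(first_line))
print(remaining_lines)
print(repr(whole_file))
```

Each read moves the current position forward, which is why readlines returns only the lines that readline has not already consumed.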
f = open('../files/brca1.fasta')
f.readline()
What do you expect the following code to print?¶
f = open('../files/brca1.fasta')
f.read()
f.readline()
Activity¶
What percent of this string consists of g or c?
atattaggtttttacctacccaggaaaagccaaccaacctcgatctcttgtagatctgttctctaaacgaactttaaaatctgtgtagctgtcgctcggctgcatgcctagtgcacctac
!wget https://MSCBIO2025-2025.github.io/files/brca1.fasta
How can you extract the gene name (second column) from the first line of brca1.fasta?
How many As, Ts, Cs, and Gs are there on the second line of brca1.fasta?
Write a function that takes a file name as an argument and prints out the gene name and percentage of G's and C's in the first line of the sequence.
Hint: Check out the split, count, and strip methods of str