from IPython.display import HTML
def _set_css_style(css_file_path):
    """
    Read the custom CSS file and load it into Jupyter.
    Pass the file path to the CSS file.
    """
    # Use a with block so the file is closed after reading
    with open(css_file_path, "r") as css_file:
        styles = css_file.read()
    s = '<style>%s</style>' % styles
    return HTML(s)
_set_css_style('rise.css')
Exercises from last class¶
How many data points are in Spellman.csv?
!wc ../files/Spellman.csv
The first three letters of each ORF name are 'Y' for yeast, a letter identifying the chromosome, then the chromosome arm. How many ORFs from chromosome A are there?
!grep ^YA ../files/Spellman.csv | wc
How many data points start with a positive expression value?
!awk -F, 'NR > 1 && $2 > 0 {print $0}' ../files/Spellman.csv | wc
What are the 10 data points with the highest initial expression values?
!awk -F, 'NR > 1 {print $1,$2}' ../files/Spellman.csv | sort -k2,2 -n | tail
Introducing Python¶
- Variables and their types
- Basic arithmetic operations
- Accessing elements in containers
- Function definitions
- Basic file i/o
Types and variables¶
value → your data
type → what kind of data it is
variable → the "name" of your data, how you access it
Types¶
Built-in:
- Numerical: integers, floating point, complex
- Boolean (True, False)
- None
- Containers: sequences (strings, tuples, lists), sets, dictionaries
- Callable (functions)
type¶
type(3)
type(3.0)
type("Hello")
type(min)
Arithmetic operations¶
+, -, *, / standard addition, subtraction, multiplication, and division
% modulus (remainder after division)
** exponentiation $x^y$ = x**y
// integer (floor) division
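Division differs between Python 2 and 3: in Python 3, / always performs true division (8 / 5 is 1.6), while in Python 2, / between two integers performs floor division (8 / 5 is 1)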
Some examples¶
2+2
50 - 5 * 2
(50 - 5) * 2
3**2
Types of division¶
8 / 5
8 // 5
8 % 5
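A small sketch of how // and % fit together, using a negative operand (the values here are chosen only for illustration). Floor division rounds toward negative infinity, and the remainder takes the sign of the divisor.

```python
a, b = -8, 5
quotient = a // b        # -2, because floor(-1.6) is -2, not -1
remainder = a % b        # 2, same sign as the divisor b
# For integers, these two results always rebuild the original value.
identity_holds = (quotient * b + remainder == a)
print(quotient, remainder, identity_holds)
print(divmod(a, b))      # divmod returns (a // b, a % b) in one call
```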
x = 10
x += 1
x
Strings¶
A string is a sequence of characters and can be defined with
- "double quotes"
- 'single quotes' (more common python style)
- "the difference is how easy it is to include a ' character" 'or a " character'
- special characters are escaped with a backslash, so a literal backslash must itself always be escaped
  - \n newline
  - \\ backslash
  - \t tab
  - \' single quote
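One way to check what an escape sequence produces is to count characters. This is a minimal sketch; the raw string prefix r is not covered above and is included only for contrast.

```python
newline = 'a\nb'       # three characters: a, newline, b
backslash = 'a\\b'     # three characters: a, backslash, b
raw = r'a\nb'          # raw string: the backslash and the n stay as two characters
print(len(newline), len(backslash), len(raw))
```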
Multiline string literals¶
"you can end a line with a backslash\
and it will continue on the next line"
Adjacent string literals are automatically concatenated
'hi ' 'bye'
Triple quoted strings - for large blocks of text with newlines, commonly used as documentation:
'''There are three
quotes at the
start and
end'''
print('"\\t'"'")
Variables¶
Data values are accessed through references to the value.
A reference is a name for the memory location of the value. Every value exists somewhere in memory and has an address.
A variable is created when it is bound to a value. It is impossible to have an uninitialized variable in Python (but a variable can be bound to None).
The type of a variable is the type of the value it is bound to, which can change.
x = 3
y = x
y = y + 1
print(x,y)
x = 'zero'
x
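The binding behavior above can be checked with the is operator, which tests whether two names refer to the same object. A minimal sketch reusing x and y:

```python
x = 3
y = x
same_before = x is y        # True: both names are bound to the same object
y = y + 1                   # rebinds y to a new value; x is untouched
same_after = x is y         # False: x is still 3, y is now 4
type_before = type(x)       # int
x = 'zero'                  # the same name can be rebound to a different type
type_after = type(x)        # str
print(same_before, same_after, type_before, type_after)
```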
Objects¶
Everything is an object!
An object is a value with a set of attributes
Attributes that are callable are called methods and can work on the object data
Attributes are accessed with the . operator
The attributes of an object can be listed with dir
Strings are objects¶
A string is an object that has several methods for manipulating the string data
s = 'Hello World'
print(s.upper())
print(s.split())
print(dir(s))
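The full dir listing is long. As a sketch, it can be narrowed to the public methods using callable and getattr, two built-ins not introduced above:

```python
s = 'Hello World'
# Names that do not start with an underscore are the public attributes.
public = [name for name in dir(s) if not name.startswith('_')]
# callable() reports whether an attribute can be called, i.e. is a method.
methods = [name for name in public if callable(getattr(s, name))]
print(methods[:5])
upper = getattr(s, 'upper')    # same attribute as s.upper
print(upper())
```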
Numbers are objects¶
Since everything is an object...
x = 3.0
x.is_integer()
print(dir(x))
Container objects¶
A container object has items that are accessed with the [] operator
They hold an arbitrary number of item objects
The built-in len function returns the number of items
Strings are an example of a container object
s = "Hello"
len(s)
s = "Hello"
s[1]
Lists¶
A list is an ordered container of arbitrary objects.
A list is defined by comma separated items in square brackets [].
mylist = [1, 3.0, "cat", 9+2]
print(mylist)
mylist[0]
Lists are objects and have a number of built-in methods:
print(dir(mylist))
l = []
l.append(5)
l.append(1)
l.append(3)
l.sort()
l
More indexing¶
x = [1, 2, 3]
x[0] = 0
x
x[-4]
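Negative indices count back from the end of the list, and an index outside the list raises an IndexError. A sketch that catches the error so the cell still runs (try/except is used here only to capture the message):

```python
x = [1, 2, 3]
last = x[-1]             # -1 is the last item
first = x[-3]            # -len(x) is the first item
try:
    x[-4]                # one step past the start of the list
except IndexError as err:
    message = str(err)
print(last, first, message)
```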
Math with lists?¶
x = [1, 2, 3]
x + x
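For lists, + concatenates rather than adding element by element. A sketch of the difference; the list comprehension with zip is one possible way to get elementwise sums and is not introduced above:

```python
x = [1, 2, 3]
concatenated = x + x                            # [1, 2, 3, 1, 2, 3]
repeated = x * 2                                # same result as x + x
elementwise = [a + b for a, b in zip(x, x)]     # [2, 4, 6]
print(concatenated, repeated, elementwise)
```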
Functions (callables)¶
A function is an encapsulation of code: a set of statements that can be executed on request and that returns a value
Functions are objects
A method is a function that is an attribute of an object
A function takes arguments and returns a result (maybe None)
The value of a callable type is the address of executable code
print(len("This is a sentence.")) # this function takes one argument
print(divmod(13, 4)) # this function takes two arguments
Defining a function¶
def square(x):
    return x * x
- def starts the definition
- The function name is an identifier like a variable name
  - Good function names are a critical part of good coding style
  - bad: foo, dostuff, process_data
  - also bad: ReadInFromFileComputeCorrelationAndOutput
  - good: ReadExpressionData, ComputeCorrelation, OutputCorrelationMatrix
- Parameters define what arguments the function takes
  - Parameters are bound to the values passed to the function
- Statements are the body of the function; must be indented
- Return statement exits function and returns specified value
  - If omitted, None is returned
- Function definition ends when no more indentation (whitespace matters!)
def twice(x):
    return x*2
dbl = twice # functions are objects
print(dbl(4))
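A function with no return statement still returns a value, namely None. A minimal sketch, where shout is a hypothetical helper made up for this example:

```python
def shout(text):
    # Prints the text but has no return statement.
    print(text.upper())

result = shout('hello')     # prints HELLO
print(result is None)       # the call itself evaluated to None
```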
Function scope¶
A function's parameters are bound to the passed value. That is, it's the same as if the parameter was set equal to the passed value (e.g., x = 4)
Parameters and variables bound (assigned to) in the function have local scope
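Global variables defined outside the function can be read inside it, but assigning to the same name inside the function creates a new local variable instead of changing the global one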
x = 4
y = 3
def incr(x):
    x = x + 1
    return x
print(x, incr(x), x)
x = 4
y = 3
def incr():
    # y is assigned here, so it is local; reading it first raises UnboundLocalError
    y = y + 1
    return y
print(y, incr())
Default parameters¶
Default values for parameters can be given. This makes it easy to have optional arguments that take reasonable defaults if not specified.
def foo(x, y=0, z=1):
    return (x+y)*z
foo(2)
foo(2, y=3)
foo(2, 3, 4)
Calling functions¶
Functions are called using parentheses ()
It is an error to call a function with an incompatible number of arguments
Named arguments allow you to specify arguments in a different order than defined
Unnamed arguments (passed in the order defined) must all be specified before any named arguments
foo(z=2, y=1, x=3)
foo(y=1, x=3)
def foo(x, y=0, z=1):
    return (x+y)*z
foo(4, z=2)
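Calling a function with too few or too many arguments raises a TypeError. A sketch using the same foo; try/except is used only to capture the messages so the cell keeps running:

```python
def foo(x, y=0, z=1):
    return (x + y) * z

try:
    foo()                   # x has no default, so it is required
except TypeError as err:
    missing = str(err)

try:
    foo(1, 2, 3, 4)         # more positional arguments than parameters
except TypeError as err:
    too_many = str(err)

print(missing)
print(too_many)
```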
Built-in Functions¶
There are a huge number of functions built into the python language and even more are included in standard modules. A few examples:
abs → absolute value
len → length of a sequence (string, list, etc)
min → return the smallest item in a container
max → return the largest item in a container
type → return the type of an object
map → applies a function to every element of a sequence and returns an iterator over the results (wrap it in list to get a list)
filter → like map, but yields only the elements for which the function evaluates to True
list(map(ord, "hello")) # ord returns the Unicode code point (the ASCII code for ASCII characters) of a string of length 1
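A sketch of map and filter side by side. In Python 3 both return iterators, so list is used to see the results; is_positive and the values are made up for this example.

```python
def is_positive(v):
    return v > 0

values = [-2.5, 0.0, 1.5, 3.0]
lazy = map(abs, values)                          # an iterator; nothing is computed yet
magnitudes = list(lazy)                          # [2.5, 0.0, 1.5, 3.0]
positives = list(filter(is_positive, values))    # [1.5, 3.0]
print(magnitudes, positives)
```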
True and false values¶
Every object has a Boolean (true/false) value
bool(None), bool(False), bool(True)
For numerical types 0 is false
bool(0), bool(0.0), bool(-100)
Empty collections are false
bool([]), bool(''), bool([False]), bool([0])
bool('0')
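The rules above can be seen together by keeping only the values that count as true. A minimal sketch; the list comprehension with an if clause is one possible way to write it and is not introduced above.

```python
values = [None, 0, 0.0, '', [], '0', [0], -100]
# Only non-empty containers and non-zero numbers survive the test.
truthy = [v for v in values if v]
print(truthy)    # ['0', [0], -100]
```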
1 < 3
"hello" != "hi"
[1,2,3] == [1,2,3], [1,2,3] == [1,2,3.14]
x = 3; y = 4
x >= y
Beware exact numerical comparisons¶
1 - 1 == 0
0.1 + 0.1 + 0.1 == 0.3
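Floating point values are stored approximately, so the usual alternative to == is a tolerance check. A sketch using math.isclose from the standard library; the 1e-9 tolerance in the manual version is an arbitrary choice for illustration.

```python
import math

total = 0.1 + 0.1 + 0.1
exact = (total == 0.3)                  # False: total is slightly above 0.3
close = math.isclose(total, 0.3)        # True: equal within a relative tolerance
manual = abs(total - 0.3) < 1e-9        # True: the same idea written by hand
print(total, exact, close, manual)
```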
File objects¶
A file object provides an interface for reading and writing files
Files, unlike memory, are accessed sequentially (like reading a book)
To create a file object use the open function:
fileobject = open(filename, mode)
Where filename is a string that is either a relative path from the current working directory (e.g., file.txt if file.txt is in the current directory) or an absolute path (e.g. /home/user/jpb156/tmp/file.txt)
File mode¶
mode is a string that specifies what you are going to do with the file
- 'r' - file must already exist and will only be read from (default)
- 'w' - file is created or truncated (delete what's already there) and can only be written to
- 'a' - file is opened for appending (does not delete existing contents) and can only be written to
It is also possible to open files for both read/write access ('r+') but this is tricky and generally not necessary
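A sketch of how the three modes differ, using a scratch file in a temporary directory. The file name and contents are made up for this example.

```python
import os
import tempfile

# Hypothetical scratch file so no real data is touched.
path = os.path.join(tempfile.mkdtemp(), 'modes.txt')

f = open(path, 'w')         # creates the file
f.write('first\n')
f.close()

f = open(path, 'a')         # keeps existing contents; writes go at the end
f.write('second\n')
f.close()

f = open(path, 'r')
after_append = f.read()     # 'first\nsecond\n'
f.close()

f = open(path, 'w')         # truncates: the earlier contents are gone
f.write('third\n')
f.close()

f = open(path)              # 'r' is the default mode
after_truncate = f.read()   # 'third\n'
f.close()
print(repr(after_append), repr(after_truncate))
```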
Manipulating file objects (Methods)¶
- close - closes the file when you are done with it
- read - return the entire file as a string (can also specify optional size argument)
- readline - return a single line from the file, returned string includes '\n'
- readlines - return a list of all lines
- write - writes a passed string to the file
- seek - set current position of the file; seek(0) starts back at beginning
f = open('../files/brca1.fasta')
f.readline()
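A sketch of several of these methods on a small scratch file. The file name and its FASTA-like contents are invented for illustration and are not the real brca1.fasta.

```python
import os
import tempfile

# Hypothetical two-line sequence file written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'example.fasta')
out = open(path, 'w')
out.write('>example header\nACGT\nTTGA\n')
out.close()

f = open(path)
header = f.readline()       # first line, trailing newline included
rest = f.readlines()        # the remaining lines as a list of strings
f.seek(0)                   # move the position back to the start
again = f.readline()        # the first line once more
f.close()
print(repr(header), rest, again == header)
```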
What do you expect the following code to print?¶
f = open('../files/brca1.fasta')
f.read()
f.readline()
Activity¶
What percent of this string consists of g or c?
atattaggtttttacctacccaggaaaagccaaccaacctcgatctcttgtagatctgttctctaaacgaactttaaaatctgtgtagctgtcgctcggctgcatgcctagtgcacctac
!wget https://MSCBIO2025-2025.github.io/files/brca1.fasta
How can you extract the gene name (second column) from the first line of brca1.fasta?
How many As, Ts, Cs, and Gs are there on the second line of brca1.fasta?
Write a function that takes a file name as an argument and prints out the gene name and percentage of G's and C's in the first line of the sequence.
Hint: Check out the split, count, and strip methods of str