In [ ]:
from IPython.display import HTML

def _set_css_style(css_file_path):
   """
   Read the custom CSS file and load it into Jupyter.
   Pass the file path to the CSS file.
   """

   # Use a context manager so the file is closed after reading
   with open(css_file_path, "r") as css_file:
      styles = css_file.read()
   return HTML('<style>%s</style>' % styles)

_set_css_style('rise.css')

Scripts and text processing with the Linux command line¶

  • Environment variables
  • Command line control structures
  • Input/output control
  • Simple text processing

Reviewing shell commands¶

Path commands

ls    ← list files
cd    ← change directory
pwd   ← print working (current) directory
..    ← referral to parent directory
.     ← referral to current directory

File manipulation

cp   ← copy
mv   ← move
rm   ← remove (delete)

Environment variables¶

Variables can also be stored in a terminal session
NAME=value sets NAME equal to value, with no spaces around =

export NAME=value sets NAME and also passes it on to programs started from this shell (it does not persist to future sessions unless it is set in a startup file such as ~/.bashrc)

$NAME dereferences (gets the value of) the variable

In [ ]:
%%bash
X=3
echo $X
In [ ]:
%%bash
X=hello
echo $X
In [ ]:
%%bash
echo X
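
A minimal sketch of what export changes, assuming a POSIX-style shell. The names LOCAL_ONLY and SHARED are hypothetical and exist only for this illustration; sh -c stands in for any program started from the shell.

```shell
# LOCAL_ONLY and SHARED are hypothetical names used only for this illustration
unset LOCAL_ONLY SHARED
LOCAL_ONLY=1          # plain shell variable: visible in this shell only
export SHARED=2       # exported variable: inherited by child processes

# sh -c starts a child process; single quotes stop this shell from expanding $...
child_view=$(sh -c 'echo "LOCAL_ONLY=[$LOCAL_ONLY] SHARED=[$SHARED]"')
echo "$child_view"
```

The child prints an empty value for LOCAL_ONLY and 2 for SHARED.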

More complex variables¶

The output of a command can also be stored in a variable with backticks `cmd` (or the equivalent $(cmd), which is easier to nest)

In [ ]:
%%bash
X=`ls *.css`
echo $X
In [ ]:
%%bash
X=3
echo $X
echo ${X}
echo '$X'
echo \"$X\"
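
The quoting rules from the cell above can be captured in variables to make the differences easier to see. This is a sketch only; /tmp/notes.txt is a made-up path that does not need to exist, since basename just manipulates the string.

```shell
X=3
single='$X'                            # single quotes: no expansion, literal $X
double="$X"                            # double quotes: $X is replaced by 3
braces="${X}rd"                        # braces mark where the variable name ends
stem=$(basename /tmp/notes.txt .txt)   # $(cmd) captures output, like `cmd`
echo "$single $double $braces $stem"
```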

Some common special characters¶

$   ← dereference variable  
*   ← wildcard (see also ? and [...] for more restrictive matches)  
\   ← escape character  

... and more examples soon
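
A short sketch of how the wildcard characters behave, run in a scratch directory so that it does not depend on the files in the current folder. The file names are made up for the example.

```shell
tmp=$(mktemp -d)      # scratch directory; the file names below are made up
touch "$tmp/a1.txt" "$tmp/a2.txt" "$tmp/b1.txt" "$tmp/notes.md"

star=$(cd "$tmp" && echo *.txt)          # * matches any run of characters
question=$(cd "$tmp" && echo a?.txt)     # ? matches exactly one character
bracket=$(cd "$tmp" && echo [ab]1.txt)   # [ab] matches one character from the set
escaped=$(cd "$tmp" && echo \*.txt)      # \ turns off the special meaning of *

printf '%s\n' "$star" "$question" "$bracket" "$escaped"
rm -r "$tmp"
```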

Command line control structures¶

Bash can run simple loops, if/then statements, etc.

In [ ]:
%%bash
for i in x y z
do
  echo $i
done
In [ ]:
%%bash
for file in *.css
do
  echo $file
done
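
One practical detail when looping over files: file names may contain spaces, so it is usually safer to quote the loop variable. A sketch with made-up file names in a scratch directory:

```shell
tmp=$(mktemp -d)                             # scratch directory for the demo
touch "$tmp/my notes.css" "$tmp/rise.css"    # one name contains a space

found=0
for file in "$tmp"/*.css
do
  # Quoted "$file" stays one word; unquoted $file would split at the space
  if [ -f "$file" ]; then
    found=$((found + 1))
  fi
done
echo "$found"
rm -r "$tmp"
```

Both files are found; with an unquoted $file the test would look for "my" and "notes.css" separately and fail.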

Nested control structures¶

In [ ]:
%%bash
for i in {1..10}
do
  if [ $i -gt 5 ]; then
    echo $i
  fi
done

Note: inside [ ], an unescaped > or < is a redirection, and inside [[ ]] they compare strings -- use -gt, -lt, etc. for numeric comparisons
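
A small sketch of why this matters, comparing 10 and 9 both ways. It assumes a shell whose [ ] supports the escaped \> string operator, which bash and dash do.

```shell
a=10
b=9
# -gt compares the values as integers
if [ "$a" -gt "$b" ]; then numeric="10 is greater"; else numeric="10 is smaller"; fi
# Escaped \> compares strings character by character: "1" sorts before "9"
if [ "$a" \> "$b" ]; then text="10 sorts after 9"; else text="10 sorts before 9"; fi
echo "$numeric / $text"
```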

Input/output redirection¶

>    ← send standard output to file  
>>   ← append standard output to file  
<    ← send file to standard input of command  
2>   ← send standard error to file  
&>   ← send output and error to file
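
A sketch that exercises each operator in a scratch directory. Two things here are additions not listed above: >&2, which sends a command's output to standard error, and 2>&1, the portable spelling of &> for shells other than bash.

```shell
tmp=$(mktemp -d)      # scratch directory for the demo files

# The braces group two commands: one writes to stdout, one to stderr
{ echo "to stdout"; echo "to stderr" >&2; } > "$tmp/out.txt" 2> "$tmp/err.txt"

echo "first"  >  "$tmp/log.txt"    # > creates or overwrites
echo "second" >> "$tmp/log.txt"    # >> appends

# 2>&1 sends stderr wherever stdout is going
{ echo "one"; echo "two" >&2; } > "$tmp/both.txt" 2>&1

out=$(cat "$tmp/out.txt")
err=$(cat "$tmp/err.txt")
log_lines=$(wc -l < "$tmp/log.txt" | tr -d ' ')    # < feeds the file to wc's stdin
both_lines=$(wc -l < "$tmp/both.txt" | tr -d ' ')
echo "$out | $err | $log_lines | $both_lines"
rm -r "$tmp"
```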

Example -- what prints out?¶

cat reads the contents of a file

In [ ]:
%%bash
echo Hello > h.txt  
echo World >> h.txt  
cat h.txt

Pipes to chain commands¶

A pipe (|) redirects the standard output of one program to the standard input of another. It's like you typed the output of the first program into the second. This allows us to chain simple programs together to do something more complicated.

WC(1)                       General Commands Manual                      WC(1)

NAME  
     wc – word, line, character, and byte count

SYNOPSIS
     wc [--libxo] [-Lclmw] [file ...]

DESCRIPTION  
     The wc utility displays the number of lines, words, and bytes contained
     in each input file, or standard input (if no file is specified) to the
     standard output.  A line is defined as a string of characters delimited
     by a ⟨newline⟩ character.  Characters beyond the final ⟨newline⟩
     character will not be included in the line count.

     A word is defined as a string of characters delimited by white space
     characters.  White space characters are the set of characters for which
     the iswspace(3) function returns true.  If more than one input file is
     specified, a line of cumulative counts for all the files is displayed on
     a separate line after the output for the last file. ...
In [ ]:
%%bash
echo Hello World | wc  
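
wc prints three numbers by default (lines, words, bytes); the -l, -w and -c flags select one. A sketch chaining three programs, where tr -d ' ' is only there to strip the padding some versions of wc add:

```shell
words=$(echo Hello World | wc -w | tr -d ' ')   # word count
bytes=$(echo Hello World | wc -c | tr -d ' ')   # 11 characters plus the newline
echo "$words $bytes"
```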

Simple text manipulation¶

cat    ← print file to stdout  
less   ← view file contents one screen at a time  
head   ← show first 10 lines  
tail   ← show last 10 lines  
wc     ← count lines/words/characters
sort   ← sort file by line and print out (`-n` for numerical sort)
uniq   ← remove adjacent duplicates (`-c` to count occurrences)
cut    ← extract columns from file (fixed width with `-c`/`-b`, delimited with `-d` and `-f`)
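
A sketch combining several of these tools on made-up comma-separated data. It also shows why `-n` matters: without it, sort compares text, so 9 sorts after 100.

```shell
# Made-up comma-separated data, for illustration only
data=$(printf 'gene,score\ng1,10\ng2,9\ng3,100')

# tail -n +2 skips the header; cut -d, -f2 keeps the second comma-separated column
scores=$(printf '%s\n' "$data" | tail -n +2 | cut -d, -f2)

text_max=$(printf '%s\n' "$scores" | sort | tail -n 1)     # text order
num_max=$(printf '%s\n' "$scores" | sort -n | tail -n 1)   # numeric order
header=$(printf '%s\n' "$data" | head -n 1)
echo "$text_max $num_max $header"
```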

A simple text demonstration¶

In [ ]:
!printf "a\nb\na\nb\nb\n" > test.txt
!cat test.txt
In [ ]:
!cat test.txt | sort
In [ ]:
!cat test.txt | sort | uniq
In [ ]:
! cat test.txt | sort | uniq | wc  
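
The same chain extends naturally to a frequency table. A sketch using the same five letters: uniq -c counts each value, sort -rn puts the most frequent first, and the awk step is only there to normalize the padding that uniq -c adds.

```shell
counts=$(printf 'a\nb\na\nb\nb\n' | sort | uniq -c | sort -rn | awk '{print $2 "=" $1}')
echo "$counts"
```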

Advanced text manipulation¶

grep   ← search contents of file for expression
sed    ← stream editor - perform substitutions
awk    ← pattern scanning and processing, great for dealing with data in columns

grep¶

Search file(s) contents for a pattern

grep pattern file(s)

  • -r recursive search
  • -I skip over binary files
  • -s suppress error messages
  • -n show line numbers
  • -A N show N lines after match
  • -B N show N lines before match
In [ ]:
!grep a test.txt
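
A sketch of a few of these options on a scratch file with made-up contents. The -c option (count matching lines) is not in the list above but is a standard grep flag.

```shell
tmp=$(mktemp)         # scratch file with made-up contents
printf 'alpha\nbeta\ngamma\nalphabet\n' > "$tmp"

numbered=$(grep -n alpha "$tmp")    # -n prefixes each match with its line number
count=$(grep -c alpha "$tmp")       # -c prints the number of matching lines
context=$(grep -A 1 beta "$tmp")    # -A 1 adds one line after each match
printf '%s\n' "$numbered" "$count" "$context"
rm "$tmp"
```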

grep patterns¶

Patterns are defined using regular expressions. Some useful special characters:

  • ^pattern pattern must be at start of line
  • pattern$ pattern must be at end of line
  • . matches any single character, not just a period
  • .* matches any character repeated any number of times (including zero)
  • \. escape a special character to treat it literally (i.e., this matches a period)
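
A sketch of these patterns on three made-up file names, counting matching lines with grep -c. Quoting the pattern keeps the shell from interpreting the special characters before grep sees them.

```shell
tmp=$(mktemp)         # scratch file listing made-up file names
printf 'a.txt\nabtxt\nb.txt.bak\n' > "$tmp"

any=$(grep -c 'a.txt' "$tmp")        # . matches any character: a.txt and abtxt
literal=$(grep -c 'a\.txt' "$tmp")   # \. matches only a real period: a.txt
ends=$(grep -c '\.txt$' "$tmp")      # $ anchors to end of line: a.txt
starts=$(grep -c '^b' "$tmp")        # ^ anchors to start of line: b.txt.bak
echo "$any $literal $ends $starts"
rm "$tmp"
```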

awk¶

Pattern scanning and processing language. We'll use it to extract columns/fields. It processes a file line by line and, if a condition holds, runs a simple program on that line.

awk 'optional condition {awk program}' file

  • -Fx make x the field delimiter (default whitespace)
  • NF number of fields on current line
  • NR current record number
  • $0 full line
  • $N Nth field
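
A sketch of these built-in variables on a scratch file with made-up records. $NF, the value of the last field, follows from combining $N with NF.

```shell
tmp=$(mktemp)         # scratch file with made-up records
printf 'id last,first\n1 Smith,Alice\n2 Jones,Bob\n' > "$tmp"

nfields=$(awk 'NR == 1 {print NF}' "$tmp")             # whitespace-separated fields on line 1
last_field=$(awk 'NR == 2 {print $NF}' "$tmp")         # $NF is the last field on the line
firsts=$(awk -F, 'NR > 1 {print NR ": " $2}' "$tmp")   # comma-delimited, header skipped
printf '%s\n' "$nfields" "$last_field" "$firsts"
rm "$tmp"
```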

awk examples¶

In [ ]:
!printf 'id last,first\n1 Smith,Alice\n2 Jones,Bob\n3 Smith,Charlie\n' > names
!cat names
In [ ]:
!awk '{print $1}' names
In [ ]:
!awk -F, '{print $2}' names
In [ ]:
!awk 'NR > 1 {print $2}' names
In [ ]:
!awk '$1 > 1 {print $0}' names
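
One subtlety in the last example: the header's first field, id, is not a number, so awk compares it with 1 as a string and the header line passes the test. A sketch of one common workaround, adding 0 to force a numeric comparison (this assumes non-numeric text should count as 0):

```shell
tmp=$(mktemp)         # scratch copy of the names data
printf 'id last,first\n1 Smith,Alice\n2 Jones,Bob\n3 Smith,Charlie\n' > "$tmp"

# "id" is not numeric, so $1 > 1 falls back to a string comparison
loose=$(awk '$1 > 1 {print $1}' "$tmp")
# $1+0 is always a number; "id" becomes 0 and is filtered out
strict=$(awk '$1+0 > 1 {print $1}' "$tmp")
printf '%s\n' "$loose" "--" "$strict"
rm "$tmp"
```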

Activity¶

Download the Spellman.csv file from http://mscbio2025-2025.github.io/files/Spellman.csv, which contains gene expression levels over time

Use command line tools to answer these questions:

  • How many data points are in Spellman.csv?
  • The first three letters of the systematic open reading frames are: 'Y' for yeast, the chromosome number, then the chromosome arm. In the dataset, how many ORFs from chromosome A are there?
  • How many are there from each chromosome?
    • And from each chromosome arm?
  • How many data points start with a positive expression value?
  • What are the 10 data points with the highest initial expression values?
    • What about the lowest initial expression values?
  • How many lines are there where expression values are continuously increasing for the first 3 time steps?
  • Sorted by biggest increase?
wc -l Spellman.csv   # number of lines; off by one because of the header
grep YA Spellman.csv | wc -l   # matches YA anywhere on the line
grep '^YA' Spellman.csv | wc -l   # a bit better: ^ matches the beginning of the line
grep -c '^YA' Spellman.csv   # grep can provide the count itself
awk -F, 'NR > 1 {print $1}' Spellman.csv | cut -b 1-2 | sort | uniq -c
awk -F, 'NR > 1 {print $1}' Spellman.csv | cut -b 1-3 | sort | uniq -c
awk -F, 'NR > 1 && $2 > 0 {print $0}' Spellman.csv | wc
awk -F, 'NR > 1  {print $1,$2}' Spellman.csv  | sort -k2,2 -n | tail
awk -F, 'NR > 1  {print $1,$2}' Spellman.csv  | sort -k2,2 -n -r | tail
awk -F, 'NR > 1 && $3 > $2 && $4 > $3 {print $0}' Spellman.csv  |wc
awk -F, 'NR > 1 && $3 > $2 && $4 > $3  {print $4-$2,$0}' Spellman.csv   | sort -n -k1,1

Next time¶

Getting started with Python

Running Python¶

$ cat hi.py 
print("hi")
$ python3 hi.py
hi
$ cat hi.py 
#!/usr/bin/python3
print("hi")
$ chmod +x hi.py  # make the file executable
$ ls -l hi.py 
-rwxr-xr-x  1 jpb156  staff  29 Sep  3 16:05 hi.py
$ ./hi.py 
hi

Python versions¶

python2 Legacy Python.

python3 Released in 2008. Mostly the same as python2 but "cleaned up". Breaks backwards compatibility. May need to specify explicitly (python3). We will be using python3.

https://wiki.python.org/moin/Python2orPython3

$ python
Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:51:49) [Clang 16.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 

IPython¶

A powerful interactive shell¶

  • Tab complete commands, file names
  • Support for a number of "shell" commands (ls, cd, pwd, etc)
  • Supports up arrow, Ctrl-R
  • Persistent command history across sessions
  • Backbone of notebooks...
$ ipython
Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:51:49) [Clang 16.0.6 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.26.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: 

ipython notebook¶

$ ipython notebook   # legacy command
$ jupyter notebook   # current command

Now called Jupyter (not just for Python): jupyter.org

IPython in your browser. Save your code and your output.

Colab is basically a Google-hosted Jupyter notebook.

Demo: running code (shift-enter), cell types, saving and exporting, kernel state

Why Jupyter notebook?¶

  • A "lab notebook" for data science
  • See output as you run commands
  • Embedded figures/output
  • Easy to modify and rerun steps
  • Can embed formatted text - share code along with the reasoning behind it
  • Can convert to multiple formats (html, pdf, raw python, even slides)

A different perspective