from IPython.display import HTML


def _set_css_style(css_file_path):
    """
    Read the custom CSS file and load it into Jupyter.
    Pass the file path to the CSS file.
    """
    # Use a context manager so the file handle is closed after reading
    with open(css_file_path, "r") as css_file:
        styles = css_file.read()
    return HTML('<style>%s</style>' % styles)


_set_css_style('rise.css')
Scripts and text processing with the Linux command line¶
- Environment variables
- Command line control structures
- Input/output control
- Simple text processing
Reviewing shell commands¶
Path commands
ls ← list files
cd ← change directory
pwd ← print working (current) directory
.. ← referral to parent directory
. ← referral to current directory
File manipulation
cp ← copy
mv ← move
rm ← remove (delete)
Environment variables¶
Variables are also stored in terminal sessions
NAME=value ← sets NAME equal to value, with no spaces around =
export NAME=value ← sets NAME equal to value and makes it visible to child processes (put the line in a startup file such as ~/.bashrc to have it set in future sessions)
$ ← dereference (get the value of) the variable
%%bash
X=3
echo $X
%%bash
X=hello
echo $X
%%bash
echo X
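A minimal sketch of the difference between a plain assignment and `export`, assuming a POSIX-style shell. The names `LOCAL_ONLY` and `SHARED` are made up for this demo. The child shell started by `sh -c` only sees the exported variable, so the last line prints `local=[] shared=[2]`.

```shell
# LOCAL_ONLY and SHARED are hypothetical names used only for this demo.
unset LOCAL_ONLY SHARED
LOCAL_ONLY=1          # plain assignment: visible in this shell only
export SHARED=2       # exported: copied into the environment of child processes

# Single quotes keep this shell from expanding the variables,
# so the child shell (sh -c) does the lookup itself.
child_output=$(sh -c 'echo "local=[$LOCAL_ONLY] shared=[$SHARED]"')
echo "$child_output"
```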
More complex variables¶
The output of a command can also be stored in a variable with backticks `cmd` (or the equivalent $(cmd))
%%bash
X=`ls *.css`
echo $X
%%bash
X=3
echo $X
echo ${X}
echo '$X'
echo \"$X\"
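The quoting forms above can be compared side by side. This is a small sketch rather than a full account of shell quoting: braces mark where the variable name ends (without them the shell would look for a variable called `Xrd`), single quotes suppress expansion, and double quotes allow it.

```shell
X=3
braced="${X}rd"       # braces mark where the name ends -> 3rd
single='$X'           # single quotes: no expansion -> literal $X
double="value: $X"    # double quotes: expansion still happens -> value: 3

# $(cmd) captures a command's output, like the backtick form, and is easier to nest.
# tr -d ' ' strips the padding that some versions of wc add.
count=$(printf 'a\nb\n' | wc -l | tr -d ' ')

echo "$braced | $single | $double | $count"
```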
Some common special characters¶
$ ← dereference variable
* ← wildcard (see also ? and [...] for more restrictions)
\ ← escape character
... and more examples soon
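To make the wildcard characters concrete, here is a sketch that runs in a throwaway directory created with `mktemp -d`. The file names are invented for the demo and are not part of the course material.

```shell
# Work in a throwaway directory; the file names are made up for the demo.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a1.txt a2.txt b1.txt ab.css

star=$(echo *.txt)          # * matches any run of characters
question=$(echo a?.txt)     # ? matches exactly one character
bracket=$(echo [ab]1.txt)   # [ab] matches one character from the set
escaped=$(echo \*.txt)      # \ turns off the wildcard: literal *.txt

printf '%s\n' "$star" "$question" "$bracket" "$escaped"

# Return to the previous directory and clean up.
cd - > /dev/null
rm -rf "$workdir"
```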
Command line control structures¶
Bash can run simple loops, if/then statements, etc.
%%bash
for i in x y z
do
echo $i
done
%%bash
for file in *.css
do
echo $file
done
Nested control structures¶
%%bash
for i in {1..10}
do
if [ $i -gt 5 ]; then
echo $i
fi
done
Note: in bash tests, > and < compare strings, not numbers -- use -gt, -lt, etc. for numeric comparisons
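The difference between numeric and string comparison shows up as soon as the numbers have different lengths. In this sketch, which assumes `bash` is installed, `10 -gt 9` is true, while the string comparison `10 > 9` inside `[[ ]]` is false because the character `1` sorts before `9`.

```shell
# Each test runs through `bash -c` so the [[ ]] syntax is handled by bash
# even if this snippet is pasted into a different shell.
numeric=$(bash -c 'if [ 10 -gt 9 ]; then echo true; else echo false; fi')
string=$(bash -c 'if [[ 10 > 9 ]]; then echo true; else echo false; fi')

echo "numeric (10 -gt 9): $numeric"
echo "string  (10 > 9):   $string"
```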
Input/output redirection¶
> ← send standard output to file
>> ← append standard output to file
< ← send file to standard input of command
2> ← send standard error to file
&> ← send output and error to file
Example -- what prints out?¶
cat ← prints the contents of a file
%%bash
echo Hello > h.txt
echo World >> h.txt
cat h.txt
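The remaining redirection operators can be tried the same way. The sketch below uses a temporary directory and a deliberately missing file to produce an error message. It writes `> file 2>&1` instead of `&> file` because that spelling also works in shells other than bash.

```shell
workdir=$(mktemp -d)

# ls on a missing file writes only an error message; || true ignores its failure status.
ls "$workdir/missing" > "$workdir/out.txt" 2> "$workdir/err.txt" || true
out_bytes=$(wc -c < "$workdir/out.txt" | tr -d ' ')   # 0: nothing went to stdout
err_text=$(cat "$workdir/err.txt")                    # the error message

# > file 2>&1 sends both streams to one file (portable spelling of bash's &> file).
ls "$workdir/missing" > "$workdir/both.txt" 2>&1 || true
both_text=$(cat "$workdir/both.txt")

# < feeds a file to standard input, so wc never sees the file name.
printf 'one\ntwo\n' > "$workdir/in.txt"
line_count=$(wc -l < "$workdir/in.txt" | tr -d ' ')

echo "stdout bytes: $out_bytes, lines read via <: $line_count"
rm -rf "$workdir"
```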
Pipes to chain commands¶
A pipe (|) redirects the standard output of one program to the standard input of another. It's like you typed the output of the first program into the second. This allows us to chain simple programs together to do something more complicated.
WC(1) General Commands Manual WC(1)
NAME
wc – word, line, character, and byte count
SYNOPSIS
wc [--libxo] [-Lclmw] [file ...]
DESCRIPTION
The wc utility displays the number of lines, words, and bytes contained
in each input file, or standard input (if no file is specified) to the
standard output. A line is defined as a string of characters delimited
by a ⟨newline⟩ character. Characters beyond the final ⟨newline⟩
character will not be included in the line count.
A word is defined as a string of characters delimited by white space
characters. White space characters are the set of characters for which
the iswspace(3) function returns true. If more than one input file is
specified, a line of cumulative counts for all the files is displayed on
a separate line after the output for the last file. ...
%%bash
echo Hello World | wc
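Pipes can be chained more than once. As a small illustration with made-up input words, the second pipeline below splits a line into one word per line, sorts the words, and keeps the first one.

```shell
# wc -w counts only words; tr -d ' ' strips the padding some wc versions add.
words=$(echo Hello World | wc -w | tr -d ' ')

# Three programs chained: one word per line, then sort, then keep the first line.
first_word=$(echo pear apple fig | tr ' ' '\n' | sort | head -n 1)

echo "$words $first_word"
```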
Simple text manipulation¶
cat ← print file to stdout
less ← view file contents one screen at a time
head ← show first 10 lines
tail ← show last 10 lines
wc ← count lines/words/characters
sort ← sort file by line and print out (`-n` for numerical sort)
uniq ← remove adjacent duplicates (`-c` to count occurrences)
cut ← extract fixed width columns from file
A simple text demonstration¶
!echo "a\nb\na\nb\nb" > test.txt
!cat test.txt
!cat test.txt | sort
!cat test.txt | sort | uniq
!cat test.txt | sort | uniq | wc
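A common extension of this pipeline is a frequency table: `uniq -c` counts each distinct line and a second `sort -rn` puts the most frequent first. The sketch below generates the same five lines with `printf` so it does not depend on test.txt. It also shows `cut` with a delimiter (`-d`) and field number (`-f`) as an alternative to fixed-width columns.

```shell
# Same five lines as test.txt, generated with printf so no file is needed.
# awk (covered in the next section) only tidies the padding that uniq -c adds.
counts=$(printf 'a\nb\na\nb\nb\n' | sort | uniq -c | sort -rn | awk '{print $2, $1}')
echo "$counts"

# cut -d sets the delimiter and -f picks the field (here the second of three).
second=$(printf 'x,y,z\n' | cut -d, -f2)
echo "$second"
```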
Advanced text manipulation¶
grep ← search contents of file for expression
sed ← stream editor - perform substitutions
awk ← pattern scanning and processing, great for dealing with data in columns
grep¶
Search file(s) contents for a pattern
grep pattern file(s)
-r ← recursive search
-I ← skip over binary files
-s ← suppress error messages
-n ← show line numbers
-A N ← show N lines after match
-B N ← show N lines before match
!grep a test.txt
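The line-number and context options are easiest to understand on a file where each line is different. This sketch writes a four-line file with invented contents to a temporary directory and runs `-n`, `-A`, and `-B` against it.

```shell
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\ndelta\n' > "$workdir/words.txt"

numbered=$(grep -n 'gamma' "$workdir/words.txt")   # line number, colon, match
after=$(grep -A 1 'beta' "$workdir/words.txt")     # match plus 1 line after
before=$(grep -B 1 'beta' "$workdir/words.txt")    # 1 line before plus match

printf '%s\n' "$numbered" "$after" "$before"
rm -rf "$workdir"
```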
grep patterns¶
Patterns are defined using regular expressions. Some useful special characters:
^pattern ← pattern must be at start of line
pattern$ ← pattern must be at end of line
. ← match any character, not just a period
.* ← match any character repeated any number of times
\. ← escape a special character to treat it literally (i.e., this matches a period)
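Each of these special characters can be checked against a short word list. The five words below are invented for the demo, and the comment on each line lists the words that pattern selects.

```shell
words=$(printf 'cat\nconcat\ncatalog\nc.t\ncut\n')

starts=$(printf '%s\n' "$words" | grep '^cat')      # cat, catalog
ends=$(printf '%s\n' "$words" | grep 'cat$')        # cat, concat
any_char=$(printf '%s\n' "$words" | grep '^c.t$')   # cat, c.t, cut
any_run=$(printf '%s\n' "$words" | grep '^c.*g$')   # catalog
literal=$(printf '%s\n' "$words" | grep 'c\.t')     # c.t only

printf '%s\n' "$literal"
```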
awk¶
Pattern scanning and processing language. We'll use it to extract columns/fields. It processes a file line-by-line and if a condition holds runs a simple program on the line.
awk 'optional condition {awk program}' file
-Fx ← make x the field delimiter (default whitespace)
NF ← number of fields on current line
NR ← current record number
$0 ← full line
$N ← Nth field
awk examples¶
!echo 'id last,first\n1 Smith,Alice\n2 Jones,Bob\n3 Smith,Charlie' > names
!cat names
!awk '{print $1}' names
!awk -F, '{print $2}' names
!awk 'NR > 1 {print $2}' names
!awk '$1 > 1 {print $0}' names
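A few more variations on the same data, as a sketch. It uses awk features beyond those listed above: the `~` regular-expression match, the `split` function, and an `END` block that runs once after the last line. The rows are the same as in the names file, held in a shell variable so the example does not depend on the file existing.

```shell
# Same rows as the names file, held in a variable instead of a file.
rows=$(printf 'id last,first\n1 Smith,Alice\n2 Jones,Bob\n3 Smith,Charlie\n')

# NF on the header line: two whitespace-separated fields.
fields=$(printf '%s\n' "$rows" | awk 'NR == 1 {print NF}')

# Skip the header, split field 2 on the comma, print the first name.
firsts=$(printf '%s\n' "$rows" | awk 'NR > 1 {split($2, parts, ","); print parts[2]}')

# Count records whose second field starts with "Smith," and report once at the end.
smiths=$(printf '%s\n' "$rows" | awk '$2 ~ /^Smith,/ {n++} END {print n}')

echo "$fields"; echo "$firsts"; echo "$smiths"
```

One thing to watch for in the `$1 > 1` example: the header's first field, `id`, is not a number, so awk typically falls back to a string comparison and the header line is printed as well. Adding `NR > 1 &&` to the condition avoids that.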
Activity¶
Download the Spellman.csv file from http://mscbio2025-2025.github.io/files/Spellman.csv, which contains gene expression levels over time
Use command line tools to answer these questions:
- How many data points are in Spellman.csv?
- The first three letters of the systematic open reading frames are: 'Y' for yeast, the chromosome number, then the chromosome arm. In the dataset, how many ORFs from chromosome A are there?
- How many are there from each chromosome?
- And from each chromosome arm?
- How many data points start with a positive expression value?
- What are the 10 data points with the highest initial expression values?
- What about the lowest initial expression values?
- How many lines are there where expression values are continuously increasing for the first 3 time steps?
- Sorted by biggest increase?
Next time¶
Getting started with Python
Running Python¶
$ cat hi.py
print("hi")
$ python3 hi.py
hi
$ cat hi.py
#!/usr/bin/python3
print("hi")
$ chmod +x hi.py # make the file executable
$ ls -l hi.py
-rwxr-xr-x 1 jpb156 staff 29 Sep 3 16:05 hi.py
$ ./hi.py
hi
Python versions¶
python2 Legacy python.
python3 Released in 2008. Mostly the same as python2 but "cleaned up". Breaks backwards compatibility. May need to be specified explicitly (python3). We will be using python3.
https://wiki.python.org/moin/Python2orPython3
$ python
Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:51:49) [Clang 16.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
IPython¶
A powerful interactive shell¶
- Tab complete commands, file names
- Support for a number of "shell" commands (ls, cd, pwd, etc)
- Supports history navigation with up arrow and Ctrl-R (reverse search)
- Persistent command history across sessions
- Backbone of notebooks...
$ ipython
Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:51:49) [Clang 16.0.6 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.26.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]:
Now called Jupyter (not just for python) jupyter.org
IPython in your browser. Save your code and your output.
Colab is basically a Google hosted Jupyter notebook.
Demo: running code (shift-enter), cell types, saving and exporting, kernel state
Why Jupyter notebook?¶
- A "lab notebook" for data science
- See output as you run commands
- Embedded figures/output
- Easy to modify and rerun steps
- Can embed formatted text - share code and reason for code
- Can convert to multiple formats (html, pdf, raw python, even slides)