# Python
*Notes by Mary Richardson (2021) with modifications by Jakob Heinz (2024)*

If you're new to Python or coding as a whole, have no fear! Soon you'll be a pro. If you can get comfortable with everything in this notebook, you'll have all the tools you need to tackle the first pset. Let's start with the essentials. 

* [Variables and Output](#Variables-and-Output)
    * [Comments](#Comments)
    * [Typecasting](#Typecasting)
    * [Print Statements](#Print-Statements)
* [Types](#Types)
    * [Numbers](#Numbers)
    * [Strings](#Strings)
    * [Booleans](#Booleans)
    * [Lists](#Lists)
    * [Dictionaries](#Dictionaries)
* [Control Flow](#Control-Flow)
    * [Conditional Statements](#Conditional-Statements)
    * [While Loops](#While-Loops)
    * [For Loops](#For-Loops)
* [File Handling](#File-Handling)
    * [Writing](#Writing)
    * [Reading](#Reading)
* [Functions](#Functions)
* [Get Coding!](#Get-Coding)
    * [Troubleshooting](#Troubleshooting)

## Variables and Output

We can easily store a value as a variable in Python. To do this we use a single `=`, with the variable name always to the left, and the assigned value always to the right.

There are several important **data types and structures** for storing values in Python you should be familiar with:
- **int**: integer number
- **float**: decimal number
- **str**: string/text
- **bool**: boolean (true or false)
- **list**: list of values
- **dict**: dictionary of keys and associated values
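
As a minimal sketch of the types listed above (the variable names and values are made up for illustration, not taken from the pset):

```python
# Illustrative values only; the names are arbitrary
n_apples = 5                                  # int
price = 0.75                                  # float
name = "apple"                                # str
is_ripe = True                                # bool
fruits = ["apple", "banana", "orange"]        # list
fruit_counts = {"apple": 5, "banana": 3}      # dict

# type() reports the type Python assigned to each variable
print(type(n_apples), type(price), type(name))
print(type(is_ripe), type(fruits), type(fruit_counts))
```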

In [1]:
# Define an integer


# Define a float


# Define a string


# Define a boolean


# Define a list


# Define a dictionary


### Comments

It's good practice to write comments to explain your code and make it more readable. Even if something seems incredibly obvious to you, comments will help make it clear to anyone else who tries to run your code... and to your future self!

Everything to the right of a `#` symbol is ignored by the Python interpreter (unless the `#` is inside a string).



### Typecasting

Python uses something called *soft typing* (more commonly known as *dynamic typing*), which means it infers the type of each variable from the value assigned to it. We don't explicitly set the type of variables. But sometimes the inferred type isn't the one we need, so we have to switch between types.

We can check the type of a variable with the `type()` function. We can change the type by using the name of the new type as a function (e.g. `int()`, `float()`, `str()`).
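
Here is a small sketch of checking and changing types. The values are arbitrary, chosen only to show what each cast does:

```python
x = 7
print(type(x))          # <class 'int'>

x_str = str(x)          # int -> str
print(type(x_str))      # <class 'str'>

y = float("2.5")        # str -> float, so it can now be used in math
print(y + 1)            # 3.5

z = int(9.99)           # float -> int drops the decimal part (no rounding)
print(z)                # 9

# For numbers, zero is False and anything else is True.
# For strings, only the empty string is False.
print(bool(0), bool(3), bool(""), bool("0"))   # False True False True
```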

In [2]:
#cast int to string


In [3]:
# Other data types can also be converted, but be careful when you cast numbers!


In [4]:
# When casting to a boolean, any nonzero value will be True


In [5]:
# A str value can't be used to do any math until it is converted into a number


### Print Statements

Print statements are a useful way to display your results and also to check the values in your code when debugging. There are several ways to print values.

In [6]:
# To print strings, we have several equivalent options (pick your favorite!)
x = "hello"
y = "world"

print(x + " " + y)
print(' '.join([x,y]))

print("%s %s" % (x, y))        # Specify %s as a placeholder for a string value

print("{} {}".format(x, y))    # Alternatively use {} as a placeholder for a value

print(f"{x} {y}")

hello world
hello world
hello world
hello world
hello world


In [7]:
# To print numbers, we first have to convert them to strings or use one of the other print methods




In [8]:
# new lines and tabs
# \n or \t


## Types

Now let's get into some more details about the data types we'll be using and how to manipulate them.

### Numbers

Throughout your journey with Python, you'll make use of at least two types of numbers: floating points and integers. At this point, all you need to know is that a floating point number (or `float`) is a number with a decimal point (e.g. 12.34546789) and an integer (or `int`) is a number without a decimal point (i.e. zero and the positive and negative whole numbers).

Python supports standard arithmetic operations, follows the usual order of operations, and supports parentheses.

Operator | Operation
--- | ---
`+` |	add
`-` |	subtract
`*` | multiply
`/` | divide
`%` | modulus
`**` | exponentiate
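
To make the table concrete, here is a short sketch with two arbitrary integers:

```python
a = 7
b = 2

print(a + b)     # 9
print(a - b)     # 5
print(a * b)     # 14
print(a / b)     # 3.5  (division with / always returns a float)
print(a % b)     # 1    (remainder of 7 divided by 2)
print(a ** b)    # 49

# Parentheses change the order of operations
print(a + b * 3)      # 13
print((a + b) * 3)    # 27

# Shorthand increment, equivalent to a = a + 1
a += 1
print(a)         # 8
```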

In [9]:
# int
# float

In [10]:
# Specify order of operations with parentheses


In [11]:
# Modulus (%) calculates the remainder


In [12]:
# To increment an int, you can use this shorthand
# Equivalent to x = x + 1


### Strings

A string (or `str`) is the name for a variable that holds text, not numbers. In Python, strings are created by surrounding text with double or single quotes. Importantly, if you see a number in a string, that number cannot be used as a number until it is **typecast** (or converted) into a number.

Python has many useful string operations.

Operation | Result
--- | :---: 
`+` | concatenate strings
`*` | repeat the same string
`x in s`| True if substring x is in s, else False
`x not in s` | False if substring x is in s, else True
`s[i:j:k]`| slice of s from i to j with step k
`len(s)`| length of s
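
A quick sketch of the string operations in the table, using a made-up example string:

```python
s = "this" + " is a string"     # concatenate
print(s)                        # this is a string
print("ab" * 3)                 # ababab
print("is" in s)                # True
print("cat" not in s)           # True
print(len(s))                   # 16
print(s[0])                     # t     (positions start at 0)
print(s[0:4])                   # this  (start included, end excluded)

words = s.split()               # split on whitespace by default
print(words)                    # ['this', 'is', 'a', 'string']
print("a,b,c".split(","))       # ['a', 'b', 'c']
```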

In [13]:
x = 'this'
y = '9.2'
z = " is a string"

In [14]:
# Append two strings


In [15]:
# Count the number of characters in a string


In [16]:
# To access an individual character, use its position in the string (starting from 0)


In [17]:
# To access a range of characters, specify a slice range (starting from 0)

# This range always includes the start index but excludes end index, i.e. [0,3)

In [18]:
# To separate a string into a list of strings, use split
# and specify the delimiter (separator)
    # Default is to split on whitespace
    # We can specify alternate delimiters

### Booleans

A boolean (or `bool`), named after [George Boole](https://en.wikipedia.org/wiki/George_Boole), is a data type that can be either **True** or **False**. Booleans will become important for making comparisons and controlling the flow of your code.

Python allows basic logic comparisons through a set of relational operators.

Operator | Operation
--- | ---
`==` | equal
`!=` | not equal
`<` | less than
`<=` | less than or equal to
`>` | greater than
`>=` | greater than or equal to

**Note that `=` is used for assignment, while `==` is used to test for equality.**
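
For example, with two arbitrary numbers, each comparison evaluates to a `bool`:

```python
x = 5
y = 10

print(x == y)    # False
print(x != y)    # True
print(x < y)     # True
print(x >= 5)    # True

# The result of a comparison is itself a bool and can be stored in a variable
is_smaller = x < y
print(type(is_smaller))   # <class 'bool'>
```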

### Lists

A `list` is a convenient data structure for organizing variables. Lists group data of any type into a structure that can be sampled via **indexing**, where an item or items in a list are referred to by their position in the list.


Python has a ton of useful built-in functions and methods for operating on lists.

Operation | Result
--- | :---: 
`x in s` | True if an item of s is equal to x, else False
`x not in s` | False if an item of s is equal to x, else True
`s[i:j:k]`| slice of s from i to j with step k
`len(s)`| length of s
`min(s)`| smallest item of s
`max(s)`| largest item of s
`sum(s)`| sum of the items in s
`statistics.mean(s)`| mean value of the items in s (requires `import statistics`)
`s.index(x)`| index of the first occurrence of x in s
`s.count(x)`| total number of occurrences of x in s
`sorted(s)`| for a list of numbers, returns a sorted list in ascending order
`s.append(x)` | adds item x to list s

Note that Python uses a 0-based counting system. This means that in a list, the initial index is set to 0, not 1. For example, let's say I have this list \[5, 3, 2, 8, 10\]. You might say that the first item in the list is five, so the value at index 1 is five. In Python though, you would say that the value at index 0 is five and the value at index 1 is three.
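
Here is a sketch of these operations on that same list. One assumption to flag: plain Python has no built-in `mean()`, so this example uses `statistics.mean` from the standard library.

```python
import statistics

s = [5, 3, 2, 8, 10]

print(s[0])                  # 5   (index 0 is the first item)
print(s[1])                  # 3
print(s[-1])                 # 10  (negative indices count from the end)
print(s[1:4])                # [3, 2, 8]
print(len(s), min(s), max(s), sum(s))   # 5 2 10 28
print(statistics.mean(s))    # 5.6
print(s.index(8))            # 3
print(sorted(s))             # [2, 3, 5, 8, 10]  (s itself is unchanged)

s.append(3)                  # modifies the list in place
print(s.count(3))            # 2
```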


In [19]:
# We can mix data types in lists, and even have lists within lists!


### Dictionaries

A dictionary (or `dict`) is a special data structure that allows you to associate **values** with specific **keys**. The data structure takes its name from a dictionary, because a word dictionary is organized similarly: take a word, look it up in the dictionary, see some associated description. In a Python dict, you take a word (the key), look it up in the dict, and see some list of numbers or strings associated with that word (the values).


Python has several additional useful functions for dicts.

Operation | Result
--- | :---:
`s.clear()` | removes every item from the dict
`s.values()` | returns a view of all the assigned values in the dictionary (wrap it in `list()` to get a list)
`s.keys()` | returns a view of all the keys in the dictionary
`s.items()` | returns a view of (key, value) pairs, one for each key in the dictionary
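
A minimal sketch of building and querying a dict (the fruit data is just an illustration):

```python
fruit_colors = {"apple": "red", "banana": "yellow"}

fruit_colors["orange"] = "orange"      # add a new entry with a[key] = value
print(fruit_colors["apple"])           # red

keys = list(fruit_colors.keys())
values = list(fruit_colors.values())
print(keys)                            # ['apple', 'banana', 'orange']
print(values)                          # ['red', 'yellow', 'orange']

for fruit, color in fruit_colors.items():
    print(fruit, color)

print("plum" in fruit_colors)          # False (in tests the keys)

fruit_colors.clear()
print(fruit_colors)                    # {}
```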

In [20]:
{'apple': 'red','banana': 'yellow','orange': 'orange'}

{'apple': 'red', 'banana': 'yellow', 'orange': 'orange'}

In [21]:
{'apple': 5,'banana': 3,'orange': 7}

{'apple': 5, 'banana': 3, 'orange': 7}

In [22]:
#Filling an empty dictionary

a = {}

#Values can be added to the dictionary in the form a[Key] = Value


In [23]:
# Get just the keys of the dict


In [24]:
# Get just the values of the dict


In [25]:
# Get the key, value pairs of the dict


## Control Flow
Now that you've got a handle on variables, we're ready to learn how to connect individual lines of code. So far we've just run Python commands sequentially – every line runs, one after another, from top to bottom. But sometimes we need to skip chunks of code under certain conditions, or run different chunks of code depending on some condition. And sometimes we want to run the same chunk of code multiple times. There are several important ways to accomplish this.

### Conditional Statements

Conditional statements allow you to write code that only runs when certain conditions are met. The most common conditional structures are `if`, `elif` (else if), and `else` statements. 

An `if` statement can have as many `elif` statements as you want. Each `if`/`elif` statement is evaluated in sequence, and only the code for the first condition that evaluates to True is run. Then Python jumps to the end of the conditional statement, and continues through your script line by line as usual.

We can use `else` to specify what to do if *none* of the `if` or `elif` statements evaluate to True. An `if` statement can only have one `else` statement, and it must come last.

Moving forward, note that Python is sensitive to indentation. The indentation level of your code tells Python how to group and run it. I find the easiest way to keep indentation consistent is to use <kbd>tab</kbd> (rather than typing spaces) to indent your code at each level, and <kbd>shift</kbd> + <kbd>tab</kbd> to unindent it.

Other formatting conventions, such as adding ':' at the end of a conditional statement, are also necessary.
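
As a small sketch of the `if`/`elif`/`else` structure (the variable and thresholds are hypothetical):

```python
temperature = 15   # hypothetical value; try changing it

if temperature > 25:
    label = "hot"
elif temperature > 10:      # only checked if the first condition was False
    label = "mild"
else:                       # runs only if nothing above was True
    label = "cold"

print(label)       # mild
```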

In [27]:
if True:
    print('correct')
  print('incorrect')

IndentationError: unindent does not match any outer indentation level (<tokenize>, line 3)

In [None]:
#Calculate x/y below. If y is zero, print an error message: 
# Define x and y
x = 4
y = 2 # Try changing the value of y to test your method

# FIXME!
    

### While Loops

Often in this course you will want to run code over and over until some condition is met. Loops allow you to perform a task repeatedly with **iteration**, meaning after each task repetition, something changes. In a `while` loop, the loop keeps running as long as the specified condition is true and stops once it becomes false.

It's easy to make while loops that run forever (an infinite loop). For example, if a loop counts with a variable `i` and we forget to increment it inside the loop, the condition never changes and the loop never stops. This is something to watch out for.

We can also use `break` statements to end while loops when a condition is met.
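
Both ideas can be sketched as follows, with a simple counter chosen for illustration:

```python
i = 0
while i < 5:       # keep looping as long as i < 5 is True
    print(i)
    i += 1         # without this line the loop would never end

# A loop that would run forever, ended with break
n = 10
while True:
    n -= 3
    if n < 0:
        break      # leave the loop as soon as n drops below zero
print(n)           # -2
```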

### For Loops

Instead of iterating until a condition is met, you might want to iterate through a set list of values. In a `for` loop, the loop addresses each item in a group, in sequence, until the last item in the group. We can iterate through an existing list or we can use the `range` function to specify numeric values.
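
A minimal sketch of both styles, using a made-up list:

```python
colors = ["red", "yellow", "orange"]

# Iterate directly over the items
for color in colors:
    print(color)

# Iterate over indices with range(); range(3) yields 0, 1, 2
for i in range(len(colors)):
    print(i, colors[i])

# range(start, stop, step) excludes the stop value
evens = []
for n in range(0, 10, 2):
    evens.append(n)
print(evens)       # [0, 2, 4, 6, 8]
```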

In [None]:
fruit_list = ['apple', 'banana', 'orange', 'plum']




In [None]:
#iterate through a list with index calls


## File Handling

Frequently, we will need to read in data from a file or write results to a file. To access a file, we first need to open it. It's good practice when you open a file to specify what type of access you will need: read (`r`), write (`w`), or append (`a`) are common modes. Use read if you only need to read from the file and don't want to edit it, write if you want to write (or overwrite) to the file, or append if you want to add to an existing file.

### Writing

In [None]:
# Write to a file
data = ['header',
        'line 1',
        'line 2',
        'line 3']

with open('test.txt', 'w') as outfile:     # Open a file named 'test.txt' for writing ('w')
    for line in data:                      # For each item in the list data
        outfile.write(line + '\n')         # Write the item to the file with a newline character in between (\n)

### Reading

In [None]:
# Read from a file
with open('test.txt', 'r') as infile:      # Open a file named 'test.txt' for reading ('r')
    next(infile)                           # Skip the first line (header line, for example)
    for line in infile:                    # Iterate through each remaining line in the file
        print(line.strip())                # Print the line stripped of the trailing newline character (.strip())

In [None]:
#or 


## Functions

Functions allow us to define routines that we will need to run repeatedly. Functions take some inputs (**arguments**) and **return** some outputs. In programming lingo, functions are **called** when you use them, and functions are **passed** arguments:

`return1, return2 = function(argument1, argument2)`


Note that in the example below where we call the function, our variables (`fruit` and `fruit_counts`) that we're going to input into the function and the function arguments have the same names. This does not have to be the case! 

If we say `my_function(any_variable_name, any_variable_name_2)`, the function will still interpret these inputs as `fruit` and `fruit_counts`, based on the order they're inputted. This is what allows functions to be generalizable to any inputs. 

Similarly, we can name our output anything when we call the function (not just `counts`), and Python will assign it the same value as the count output generated by the function.
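
The function definition itself isn't shown in these notes, so here is one possible sketch. The name `get_fruit_count` and its behavior (returning 0 for a missing fruit) are assumptions made for illustration:

```python
def get_fruit_count(fruit, fruit_counts):
    '''
    Given: fruit (str), fruit_counts (dict mapping fruit name to int)
    Return: count (int), the number of that fruit, or 0 if it is missing
    '''
    count = fruit_counts.get(fruit, 0)
    return count

# The variable names at the call site do not have to match the argument names
inventory = {'apple': 5, 'banana': 3, 'orange': 7}
n_bananas = get_fruit_count('banana', inventory)
print(n_bananas)    # 3
```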

In [None]:
# Call the function
fruit_counts = {'apple': 5,'banana': 3,'orange': 7}
fruit = 'apple'



#### Exercise 20.

Write a function to calculate the quadratic formula given a, b, and c. (Hint: Use math.sqrt() for the square root function.)

In [None]:
import math # Import the math package to calculate the square root
def quadratic_formula(a, b, c):
    '''
    Given: a (int), b (int), c (int)
    Return: x as calculated by the quadratic formula
    '''
    #FIXME!
    return x

# NumPy 
**NumPy** (Numerical Python) is an incredibly powerful package for computation. It's the standard for working with numerical data in Python, and makes mathematical operations on arrays and matrices much more efficient.
### Import Statements
Packages such as NumPy contain specific functions that are useful to us. The `import` command allows you to access installed Python packages. `import` only needs to be used once per package, at the beginning of a Python session or Jupyter notebook. There are a few ways to use import:

In [None]:
# Import the entire package
import numpy

# Import the entire package but give it a shorter, easy-to-type name (alias)
import numpy as np

# Import only a specific part of the package
from numpy import random

# Import only a specific part of the package and give it a shorter, easy-to-type name (alias)
from numpy import random as rd

### Arrays

NumPy comes with an important central data type called an **array**. It's easy to get confused between Python lists and NumPy arrays, but they are distinct data types. You can convert a list of a single data type (a list of floats or a list of ints) to an array, and you can convert an array to a list. But many operations and methods that work on arrays won't work on lists, and vice versa. So it's important to keep track of what type you are using (remember the `type()` function!).

There are several ways to create an array, all of which can be useful in different contexts:
- `np.array()`: create an array from a list of values
- `np.zeros()`: create an array of zeros with the given dimensions
- `np.ones()`: create an array of ones with the given dimensions
- `np.arange()`: create an array with a range of evenly spaced values (provide the first number, the stop value, and the step size; the stop value itself is excluded)
- `np.linspace()`: create an array with a range of linearly spaced values (provide the first number, last number, number of values; the last number is included by default)
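
Here is a short sketch of each constructor. The shapes and values are arbitrary:

```python
import numpy as np

a = np.array([1, 2, 3])            # 1D array from a list
b = np.array([[1, 2, 3],
              [4, 5, 6]])          # 2D array from a list of lists
print(a.shape, b.shape)            # (3,) (2, 3)

print(np.zeros((2, 3)))            # 2 rows, 3 columns of zeros
print(np.ones(4))                  # four ones
print(np.arange(0, 10, 2))         # [0 2 4 6 8]  (stop value excluded)
print(np.linspace(0, 1, 5))        # 0, 0.25, 0.5, 0.75, 1 (stop value included)

print(type(a), type(a.tolist()))   # convert an array back to a list with .tolist()
```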

In [None]:
# Create a 1D numpy array


In [None]:
# Create a 3D numpy array


In [None]:
#array of zeros

In [None]:
#array of ones


In [None]:
#ranges


In [None]:
#linspace

### Axis

Since NumPy arrays can be 2D (or 3D or even higher dimensional), they have multiple **axes**. I *still* have to look up which axis is which when I'm trying to remember. (There's no shame in googling here!)
- `axis=0`: down a column
- `axis=1`: across a row
- We won't need higher dimensional arrays in MCB112, but you can continue incrementing the axis for higher dimensions.

Many functions in NumPy accept `axis` as an optional argument. For example:
- `np.sum(a)` sums over the whole array `a`
- `np.sum(a, axis=0)` sums down columns
- `np.sum(a, axis=1)` sums across rows
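
To see the difference, here is a sketch on a small 2x3 array (the values are arbitrary):

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(np.sum(m))            # 21        (whole array)
print(np.sum(m, axis=0))    # [5 7 9]   (down each column)
print(np.sum(m, axis=1))    # [ 6 15]   (across each row)
```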

### Basic Array Operations

NumPy supports standard elementwise operations:

Operator | Operation
--- | ---
`+` |	add
`-` |	subtract
`*` | multiply
`/` | divide

Note that `*` performs elementwise multiplication, not matrix multiplication. For matrix multiplication (the dot product of two matrices), use the `@` operator instead!

We can also use the equivalent functions for each operation:
- `np.add(a,b)`: a+b
- `np.subtract(a,b)`: a-b
- `np.multiply(a,b)`: a*b
- `np.divide(a,b)`: a/b
- `np.matmul(a,b)` or `np.dot(a,b)`: a@b

Some operations are performed universally on each element of the array:
- `np.sqrt(x)`: square root
- `np.log(x)`: natural log
- `np.log10(x)`: log base 10
- `np.exp(x)`: exponential (e raised to the power x)
- `np.power(x,p)`: raise to the power p

Finally, we can perform operations as if the array were a list of numbers (or compute row-wise or column-wise values by specifying `axis`):
- `np.sum()`: sum of the whole array
- `np.mean()`: mean of the whole array
- `np.max()`: max value over the whole array
- `np.min()`: min value over the whole array
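
A minimal sketch contrasting elementwise and matrix operations, using two small made-up matrices:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[10, 20],
              [30, 40]])

print(a + b)               # elementwise sum
print(a * b)               # elementwise product: [[10, 40], [90, 160]]
print(a @ b)               # matrix product:      [[70, 100], [150, 220]]

print(np.sqrt(a))          # square root of every element
print(np.mean(a))          # 2.5 (mean of the whole array)
print(np.max(a, axis=0))   # [3 4] (max of each column)
```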

### Indexing, Slicing, and Filtering

You can index and slice NumPy arrays just like you can slice Python lists. But you can also filter the values, which is incredibly powerful! The function `np.nonzero()` returns the indices of the nonzero (or True) elements, so combined with a comparison it is a handy way to find the values that satisfy a certain condition in an array.

Some other ways you might want to filter data include:
- `np.isnan()`: returns whether each element in an array is NaN
- `np.isinf()`: returns whether each element in an array is positive or negative infinity
- `np.isfinite()`: returns whether each element in an array is finite
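
Here is a sketch of filtering with a boolean mask and with `np.nonzero()`. The array is made up, and the threshold of 2 is arbitrary:

```python
import numpy as np

x = np.array([3.0, -1.0, np.nan, 7.0, 0.0, np.inf])

print(x[0], x[1:3])            # indexing and slicing work like lists

mask = x > 2                   # boolean array, one True/False per element
print(x[mask])                 # keeps 3, 7 and inf

idx = np.nonzero(x > 2)        # tuple of index arrays where the condition holds
print(idx[0])                  # [0 3 5]

finite = x[np.isfinite(x)]     # drops nan and inf
print(finite)
print(np.isnan(x).sum())       # 1 (number of NaN entries)
```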

### Generating random values
Check out the documentation for [np.random](https://numpy.org/doc/stable/reference/random/generated/numpy.random.randint.html).
Here are some useful examples, but every case is unique, so think carefully about how you want to sample randomly.
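
As one possible sketch of common sampling calls on a NumPy `Generator` (the sizes, probabilities, and the DNA alphabet are illustrative choices, not requirements):

```python
import numpy as np

rng = np.random.default_rng(42)      # seeded generator, so results are reproducible

one_value = rng.random()             # one float in [0, 1)
ten_values = rng.random(10)          # 10 floats in [0, 1)
die_rolls = rng.integers(1, 7, size=5)             # integers 1..6 (upper bound excluded)
coin_flips = rng.binomial(n=1, p=0.5, size=8)      # a binomial with n=1 is a Bernoulli trial
heights = rng.normal(loc=170, scale=10, size=3)    # samples from a normal distribution

# A random sequence of length 20 drawn uniformly from a 4-letter alphabet
sequence = ''.join(rng.choice(list('ACGT'), size=20))
print(sequence)
```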

In [None]:
rng = np.random.default_rng(42) 

#one random number 


#10 random values between 0 and 1 


#a binomial of n =1 is a bernoulli trial


#you can sample from nearly any distribution, check the documentation to see which are available 


#generate a random sequence: 



## Matplotlib

Now, to the fun part! Let's plot things. Most of these examples are straight out of the fabulous matplotlib tutorial.

Source: [Matplotlib Pyplot Tutorial](https://matplotlib.org/stable/tutorials/introductory/pyplot.html#sphx-glr-tutorials-introductory-pyplot-py)

In [None]:
import matplotlib.pyplot as plt

### Figures and Axes

To start, we need to create a figure. We can do this in several ways:
- `fig = plt.figure()`: an empty figure with no Axes
- `fig, ax = plt.subplots()`: a figure with a single Axes
- `fig, axs = plt.subplots(2, 2)`: a figure with a 2x2 grid of Axes

I'd recommend sticking with options 2 and 3 because defining new axes for each figure avoids conflicts between multiple figures in your notebook. Otherwise you could end up plotting things on the wrong figure!

### Labels, Titles, and Legends

A plot with no axis labels, no title, and no legend is hard to interpret, and leaving them off is *horrible* practice. So let's learn how to add those.

If you take one thing away from this section, let it be this: **ALWAYS label your axes!**
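
A minimal sketch of a labeled plot, assuming two made-up curves (the data, label text, and title are placeholders):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2, 50)

fig, ax = plt.subplots()                  # one figure with a single Axes
ax.plot(x, x, label='linear')
ax.plot(x, x**2, label='quadratic', linestyle='--')

ax.set_xlabel('x')                        # always label your axes!
ax.set_ylabel('f(x)')
ax.set_title('Simple plot')
ax.legend()                               # uses the label= given to each line
plt.show()
```

Calling the methods on `ax` (rather than on `plt`) makes it explicit which Axes each label belongs to.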

### Plot Types

In general here are a few types of plots you might want to generate in matplotlib:
- `plot()` generates a [line plot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html)
- `scatter()` generates a [scatter plot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html)
- `bar()` generates a [bar plot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.bar.html?highlight=bar%20plot#matplotlib.pyplot.bar)
- `hist()` generates a [histogram](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html#matplotlib.pyplot.hist)

In [None]:
names = ['group_a', 'group_b', 'group_c']
values = [1, 10, 100]

plt.figure(figsize=(9, 3))

plt.subplot(131)
plt.bar(names, values)
plt.subplot(132)
plt.scatter(names, values)
plt.subplot(133)
plt.plot(names, values)
plt.suptitle('Categorical Plotting')
plt.show()

### Formatting

This is a whole can of worms. You can get lost for hours on Stack Overflow pages troubleshooting how to customize your plot formatting in matplotlib.

For now, here are some basic options for line plots you may want to play with:
- `linewidth` sets the line thickness
- `linestyle` sets the style (`-` solid, `--` dashed, `:` dotted)
- `color` sets the color
- `alpha` sets the transparency

## Get Coding!

### Troubleshooting

What's the best way to troubleshoot when you're stuck?

- **Python Documentation**: Python has terrific online documentation for all of its functions. Python packages, which we'll talk about next week, also (usually) have their own documentation pages. These pages include descriptions and examples.
    - [Python Docs](https://docs.python.org/3.9/)


- **Google**: Many coding questions have already been answered online and can be found with a quick search. A popular site to troubleshoot coding questions is Stack Overflow. You should not just copy/paste code (this usually will not work for your specific problem anyway). But sites like Stack Overflow are helpful for figuring out little quirks in Python functions and also for seeing ways that other people have solved different problems.
    - [Stack Overflow](https://stackoverflow.com/)


- **Class Resources**: These include the Ed discussion board for directly asking questions, as well as lecture, section, and office hours. Working together with classmates can also be super helpful. You are expected to submit your own work for each pset (your code and text should not look like your friend's submission), but we encourage you to talk to each other about ideas and help each other figure out problems along the way. That's part of science!
    - [Ed](https://edstem.org/us/courses/57540/discussion/5217284)
    
- **LLMs**: Large language models can be a useful tool for debugging, but there is no guarantee that their answers will be correct! They can also explain code to you and be used as a learning tool. In the end, you must figure out how much to rely on them, and for this class you should understand every line of code you turn in!
    - [ChatGPT](https://chatgpt.com/)