Tutorial 0: Python warm-up

Author: Alejandro Monroy

In this tutorial we will review some of the Python programming skills that are necessary for the course. We will cover basic Python functionality as well as NumPy, Pandas and Matplotlib.

❗️ Note: This tutorial is not meant to be a self-contained guide to learn the concepts from scratch, but a brief recap of the tools that you need to know to be able to understand the rest of the tutorials. If you feel like you need a deeper level of detail, we invite you to check out the additional resources at the end of each section.

1. Python basics

1.1. Primitive Data Types

Python supports the usual primitive data types:

Integers (int): Whole numbers, e.g., 1, 42, -7.
Floating-point numbers (float): Numbers with decimal points, e.g., 3.14, -0.001.
Strings (str): Sequences of characters, e.g., "hello", "Python".
Booleans (bool): Logical values representing True or False.

These primitive data types are essential for performing basic operations and storing simple values in Python.

[1]:

# Integer
x = 10
print("x = ", x)
print("Type of x: ", type(x))

# Float
y = 3.14
print("\ny = ", y)
print("Type of y: ", type(y))

# String
name = "Alice"
print("\nname = ", name)
print("Type of name: ", type(name))

# Boolean
condition_1 = x == 11
print("\ncondition_1 = ", condition_1)
print("Type of condition_1: ", type(condition_1))
condition_2 = x <= 11
print("condition_2 = ", condition_2)
print("Type of condition_2: ", type(condition_2))

x =  10
Type of x:  <class 'int'>

y =  3.14
Type of y:  <class 'float'>

name =  Alice
Type of name:  <class 'str'>

condition_1 =  False
Type of condition_1:  <class 'bool'>
condition_2 =  True
Type of condition_2:  <class 'bool'>

1.2. Mathematical Operations

Python supports various basic mathematical operations:

Basic Arithmetic: + (addition), - (subtraction), * (multiplication), / (division).
Exponentiation: ** (power), e.g., 2 ** 3 results in 8.
Modulus: % (remainder), e.g., 10 % 3 results in 1.
Floor Division: // (integer division), e.g., 10 // 3 results in 3.
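
The operators listed above can be tried out with a short sketch like the following. The values of a and b are chosen here purely for illustration:

```python
# Illustrative values (not part of the original tutorial)
a = 10
b = 3

print("a + b  =", a + b)    # addition
print("a - b  =", a - b)    # subtraction
print("a * b  =", a * b)    # multiplication
print("a / b  =", a / b)    # true division, always returns a float
print("a ** b =", a ** b)   # exponentiation
print("a % b  =", a % b)    # remainder of the division
print("a // b =", a // b)   # floor division
```

Note that // rounds toward negative infinity, so -7 // 2 evaluates to -4 rather than -3.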

1.3. Control Structures

Control structures dictate the flow of execution in a program:

Conditional Statements: if, elif, else for decision making.
Loops: for to iterate over sequences, while to repeat as long as a condition is true.

[2]:

# Example code to check if a number is divisible by 3, 5, or neither
number = 15

if number % 3 == 0 and number % 5 == 0:
    print("The number is divisible by both 3 and 5")
elif number % 3 == 0:
    print("The number is divisible by 3")
elif number % 5 == 0:
    print("The number is divisible by 5")
else:
    print("The number is not divisible by 3 or 5")

The number is divisible by both 3 and 5

[3]:

# Example code that prints all integers between 1 and 30 (inclusive) that are divisible by both 2 and 3
for num in range(1, 31):
    if num % 2 == 0 and num % 3 == 0:
        print(num)
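
The list of control structures also mentions while loops, which the cells above do not demonstrate. A minimal sketch, with a starting value picked only for illustration, could look like this:

```python
# Illustrative example: count how many times a number can be halved
# (using floor division) before it reaches 1
number = 40
steps = 0

while number > 1:
    number = number // 2
    steps += 1

print("Number of halvings:", steps)
```

Unlike a for loop, a while loop does not know in advance how many iterations it will run, so make sure the condition eventually becomes False.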

1.4. Collections

Python provides several built-in collection types to store and manage groups of related data. Each collection type has unique characteristics and use cases. The basic collection types are lists, tuples, sets, and dictionaries. They differ in the following properties:

Mutable: Indicates whether the collection can be modified after creation. Mutable collections (like lists, sets and dictionaries) allow adding, removing, or changing elements. Immutable collections (like tuples) do not allow any modifications after creation.
Ordered: Indicates whether the elements in the collection maintain a specific order. Ordered collections (like lists and tuples) preserve the order in which elements were added, and so do dictionaries since Python 3.7 (insertion order of the keys). Unordered collections (like sets) do not guarantee any specific order.
Duplicate Elements: Indicates whether the collection allows duplicate elements. Collections that allow duplicates (like lists and tuples) can contain multiple instances of the same value. Collections that do not allow duplicates (like sets, and the keys of a dictionary) ensure all elements are unique.
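
To make these properties concrete, here is a small sketch. The variable names and values are illustrative assumptions, not part of the original tutorial:

```python
# Illustrative values (not part of the original tutorial)
fruits_list = ["apple", "banana", "apple"]   # list: mutable, ordered, allows duplicates
fruits_tuple = ("apple", "banana", "apple")  # tuple: immutable, ordered, allows duplicates
fruits_set = {"apple", "banana", "apple"}    # set: mutable, unordered, no duplicates
prices = {"apple": 0.5, "banana": 0.25}      # dictionary: mutable, unique keys

# Lists can be modified in place
fruits_list.append("cherry")
print(fruits_list)

# Tuples cannot: item assignment raises a TypeError
try:
    fruits_tuple[0] = "cherry"
except TypeError as error:
    print("TypeError:", error)

# Sets silently drop duplicates
print(len(fruits_set))

# Dictionaries map keys to values and can be updated
prices["cherry"] = 2.0
print(prices)
```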

There are different ways to create collections. We will compare some of these ways in the following examples:

[4]:

# Example code to define a list with the first 10 powers of 2 in different ways

# By extension
l1 = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]

# Using a for loop
l2 = []
for i in range(10):
    l2.append(2**i)

# Using list comprehension
l3 = [2**i for i in range(10)]

print(l1)
print(l2)
print(l3)

# Check if all lists are equal
assert l1 == l2 == l3

[1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
[1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
[1, 2, 4, 8, 16, 32, 64, 128, 256, 512]

[5]:

# Example code to define a dictionary where the keys are the integers from 0 to 9 and the values the corresponding power of 2

# By extension
d1 = {0: 1, 1: 2, 2: 4, 3: 8, 4: 16, 5: 32, 6: 64, 7: 128, 8: 256, 9: 512}

# Using a for loop (start from an empty dictionary and fill it key by key)
d2 = {}
for i in range(10):
    d2[i] = 2**i

# Using dictionary comprehension
d3 = {i: 2**i for i in range(10)}

print(d1)
print(d2)
print(d3)

# Check if all dictionaries are equal
assert d1 == d2 == d3

{0: 1, 1: 2, 2: 4, 3: 8, 4: 16, 5: 32, 6: 64, 7: 128, 8: 256, 9: 512}
{0: 1, 1: 2, 2: 4, 3: 8, 4: 16, 5: 32, 6: 64, 7: 128, 8: 256, 9: 512}
{0: 1, 1: 2, 2: 4, 3: 8, 4: 16, 5: 32, 6: 64, 7: 128, 8: 256, 9: 512}

💡 Note: In the previous cells we used an assert statement. The assert statement is used for debugging purposes. It tests a condition, and if the condition is False, it raises an AssertionError with an optional error message. This helps in identifying and fixing bugs by ensuring that certain conditions hold true during the execution of the program. In this case, we include it to show that the three lists (or dictionaries) are indeed equal!
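
As a rough illustration of the optional error message, the sketch below uses made-up values and catches the AssertionError only so that the message can be printed:

```python
# Illustrative values (not part of the original tutorial)
values = [1, 2, 3]

# This condition holds, so nothing happens
assert len(values) == 3, "values should contain three elements"

# This condition is False, so an AssertionError with the given message is raised
try:
    assert sum(values) == 7, "the sum of values should be 7"
except AssertionError as error:
    message = str(error)
    print("AssertionError:", message)
```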

1.5. Functions

A Python function is a block of reusable code that performs a specific task. It can take inputs (called parameters), execute a series of statements, and return an output. Functions help in organizing code, making it more readable and reusable.

[6]:

# Example of Python function
def greet(name):
    """
    Returns a personalized greeting message that includes the person's name.

    Args:
        name (str): The name of the person to greet.

    Returns:
        str: Personalized greeting message.
    """
    return f"Hello, {name}! How are you today?"

# Example usage
greet("Alex")

[6]:

'Hello, Alex! How are you today?'

💡 Tip: If you are going to use some piece of code more than once, it is usually a good practice to encapsulate it into a function. This promotes code reusability, reduces redundancy, improves readability, and makes maintenance easier.

1.6. Classes

A Python class is a blueprint for creating objects. It defines a set of attributes and methods that the created objects will have. Classes allow for object-oriented programming, which helps in organizing code into reusable and related components.

[7]:

# Example of Python class
class Person:
    """
    A class used to represent a person with a name and age, and to greet them with a personalized message.
    """

    def __init__(self, name, age):
        """
        Initializes a new instance of the Person class.

        Args:
            name (str): The name of the person.
            age (int): The age of the person.
        """
        self.name = name
        self.age = age

    def birthday(self):
        """
        Increments the age of the person by 1.
        """
        self.age += 1

    def greet(self):
        """
        Greets the person by returning a personalized message including their name and age.

        Returns:
            str: Personalized greeting message.
        """
        return f"Hello, {self.name}! You are {self.age} years old. How are you today?"

# Example usage
alex = Person("Alex", 25)
print(alex.greet())
alex.birthday()
print(alex.greet())

Hello, Alex! You are 25 years old. How are you today?
Hello, Alex! You are 26 years old. How are you today?

💡 Tip: It is a good practice to add docstrings after the definition of a class, function or method. This will help other people (and yourself) understand what the code does and how to use it. There are several popular docstring styles in Python, here we are using the Google format.

2. Handling multi-dimensional data with NumPy

NumPy is a powerful library for numerical computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with mathematical functions to operate on these arrays efficiently. NumPy is widely used in data science, machine learning, and scientific computing due to its ease of use and performance benefits.

2.1. NumPy arrays

The main building block of NumPy is the ndarray class, which stands for N-dimensional array:

[8]:

import numpy as np

# Example of numpy arrays
# 1D array
a = np.array([1, 2, 3, 4, 5, 6])
print(a)
# 2D array
b = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
print(b)

[1 2 3 4 5 6]
[[1. 2. 3.]
 [4. 5. 6.]]

These are some of the most important attributes of numpy arrays:

Attribute	Description
`ndarray.shape`	Shape (dimensions) of the array.
`ndarray.dtype`	Data type of the array elements.
`ndarray.ndim`	Number of dimensions of the array.
`ndarray.size`	Number of elements in the array.
`ndarray.T`	Transpose of the array.

[9]:

print("Array:\n", a)
print("Shape:", a.shape)
print("Number of dimensions: ", a.ndim)
print("Number of elements: ", a.size)
print("Data type: ", a.dtype)
print("Transpose:", a.T)

print("\nArray:\n", b)
print("Shape:", b.shape)
print("Number of dimensions: ", b.ndim)
print("Number of elements: ", b.size)
print("Data type:", b.dtype)
print("Transpose:", b.T)

Array:
 [1 2 3 4 5 6]
Shape: (6,)
Number of dimensions:  1
Number of elements:  6
Data type:  int64
Transpose: [1 2 3 4 5 6]

Array:
 [[1. 2. 3.]
 [4. 5. 6.]]
Shape: (2, 3)
Number of dimensions:  2
Number of elements:  6
Data type: float32
Transpose: [[1. 4.]
 [2. 5.]
 [3. 6.]]

💡 Tip: We are using a class (ndarray) that we imported from a library (numpy). To know how to use imported classes/functions/methods, we can check their documentation online (by just googling their names) or offline (for example, in a Jupyter notebook, try to run a code cell with the line np.ndarray?, and you will see that the output contains information about the attributes that we just explained). Being able to access and understand documentation from libraries is crucial to become a good programmer!

2.2. Element-wise operations between NumPy arrays

NumPy arrays support element-wise operations, which apply an operation to each pair of elements at the same position. These are some examples:

[10]:

### Operations between numpy arrays
u1 = np.array([1, 2, 3])
u2 = np.array([2, 4, 6])

print("Addition of u1 and u2:", u1 + u2)
print("Subtraction of u1 and u2:", u1 - u2)
print("Multiplication of u1 by a constant:", 5 * u1)
print("Element-wise multiplication of u1 and u2:", u1 * u2)
print("Element-wise division of u1 and u2:", u1 / u2)

Addition of u1 and u2: [3 6 9]
Subtraction of u1 and u2: [-1 -2 -3]
Multiplication of u1 by a constant: [ 5 10 15]
Element-wise multiplication of u1 and u2: [ 2  8 18]
Element-wise division of u1 and u2: [0.5 0.5 0.5]

Another commonly used vector operation is the dot product. Mathematically, the dot product (denoted with \(\cdot\)) of two vectors \(a = [a_1, a_2, ..., a_N]\), \(b = [b_1, b_2, ..., b_N]\) is

\[a \cdot b = a_1b_1 + a_2b_2 + ... + a_Nb_N.\]

Notice that:

- The input is two vectors, but the output is a single scalar!
- Both input vectors need to have the same length.

[11]:

print("Dot product of u1 and u2:", np.dot(u1, u2))

Dot product of u1 and u2: 28

📝 Task for you: Apply the formula above by hand and check that the result we got is indeed correct!

We can also have higher-dimensional arrays, which can be used to represent matrices. The element-wise operations that we saw before also apply in this case. In addition, we can perform matrix multiplication using the @ operator or np.matmul:

[12]:

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[2, 4], [6, 8]])
print("Addition of m1 and m2:\n", m1 + m2)
print("Multiplication of m1 and m2:\n", np.matmul(m1, m2))
print("Multiplication of m1 and m2:\n", m1 @ m2)

Addition of m1 and m2:
 [[ 3  6]
 [ 9 12]]
Multiplication of m1 and m2:
 [[14 20]
 [30 44]]
Multiplication of m1 and m2:
 [[14 20]
 [30 44]]
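
Since * and @ are easy to confuse, the following sketch contrasts them on the same two matrices. The names elementwise and matrix_product are introduced here only for illustration:

```python
import numpy as np

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[2, 4], [6, 8]])

# * multiplies the entries that sit at the same position
elementwise = m1 * m2

# @ combines the rows of m1 with the columns of m2
matrix_product = m1 @ m2

print("Element-wise product:\n", elementwise)
print("Matrix product:\n", matrix_product)
```

The two results differ in general, so it is worth double-checking which operator a formula actually calls for.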

2.3. Important methods of numpy arrays

Array Manipulation

Method	Description
`ndarray.reshape(newshape)`	Returns a new array with the same data but a new shape.
`ndarray.flatten()`	Returns a copy of the array collapsed into one dimension.
`ndarray.sort(axis=-1)`	Sorts the array in place along the specified axis.
`np.concatenate((a1, a2, ...), axis=0)`	Joins a sequence of arrays along an existing axis (a NumPy function, not an array method).

[13]:

a = np.array([[1, 2, 3], [4, 5, 6]])
print("Array:\n ", a)
print("Shape: ", a.shape)

# The following method turns the array into a 1D array with the same elements
a_flattened = a.flatten()
print("\nFlattened array:\n", a_flattened)
print("Shape of flattened array: ", a_flattened.shape)

Array:
  [[1 2 3]
 [4 5 6]]
Shape:  (2, 3)

Flattened array:
 [1 2 3 4 5 6]
Shape of flattened array:  (6,)

💡 Tip: Again, check out the documentation for the NumPy methods that we use! For example, try running np.ndarray.flatten? to see its docstring and some usage examples.
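
The remaining array manipulation tools can be sketched in the same way. The array v and the other names below are illustrative assumptions. Note that concatenation is called as the function np.concatenate:

```python
import numpy as np

# Illustrative array (not part of the original tutorial)
v = np.array([3, 1, 2, 6, 5, 4])

# reshape returns an array with the same elements arranged in 2 rows and 3 columns
v_2d = v.reshape((2, 3))
print("Reshaped:\n", v_2d)

# sort works in place: it modifies the array and returns None,
# so we sort a copy to keep v unchanged
v_sorted = v.copy()
v_sorted.sort()
print("Sorted:", v_sorted)

# np.concatenate joins a sequence of arrays along an existing axis
joined = np.concatenate((v_sorted, np.array([7, 8])))
print("Concatenated:", joined)
```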

Mathematical Operations

Method	Description
`ndarray.dot(b)`	Returns the dot product of the array and the one in the argument
`ndarray.sum(axis=None)`	Returns the sum of the array elements over the specified axis.
`ndarray.prod(axis=None)`	Returns the product of the array elements over the specified axis.
`ndarray.cumsum(axis=None)`	Returns the cumulative sum of the array elements over the specified axis.
`ndarray.cumprod(axis=None)`	Returns the cumulative product of the array elements over the specified axis.

[14]:

a = np.array([[2, 2, 2, 2], [3, 3, 3, 3]])
print("Array:\n ", a)

print("Sum of each column: ", a.sum(axis=0)) # axis=0 means aggregate along the first dimension
print("Sum of the array across all dimensions: ", a.sum())
print("Product of each row: ", a.prod(axis=-1)) # axis=-1 means aggregate along the last dimension

Array:
  [[2 2 2 2]
 [3 3 3 3]]
Sum of each column:  [5 5 5 5]
Sum of the array across all dimensions:  20
Product of each row:  [16 81]

🤔 Food for thought: Why did we use axis=-1 in the last example? Which other value of axis would give the same answer in this case?
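
The cumulative methods and the dot method from the table are not used in the cell above. A minimal sketch on a one-dimensional array, with values chosen only for illustration, might be:

```python
import numpy as np

# Illustrative array (not part of the original tutorial)
v = np.array([1, 2, 3, 4])

# Each entry is the sum (or product) of all elements up to that position
running_sum = v.cumsum()
running_product = v.cumprod()

print("Cumulative sum:", running_sum)
print("Cumulative product:", running_product)
print("Dot product of v with itself:", v.dot(v))
```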

Statistical Methods

Method	Description
`ndarray.mean(axis=None)`	Returns the mean of the array elements over the specified axis.
`ndarray.std(axis=None)`	Returns the standard deviation of the array elements over the specified axis.
`ndarray.var(axis=None)`	Returns the variance of the array elements over the specified axis.
`ndarray.min(axis=None)`	Returns the minimum value of the array elements over the specified axis.
`ndarray.max(axis=None)`	Returns the maximum value of the array elements over the specified axis.
`ndarray.argmin(axis=None)`	Returns the indices of the minimum values along an axis.
`ndarray.argmax(axis=None)`	Returns the indices of the maximum values along an axis.

[15]:

a = np.array([[1, 3, 2, 0, 2], [10, 5, 5, 2, 6]])
print("Array:\n ", a)
print("Minimum value of each row: ", a.min(axis=-1))
print("Mean across all dimensions: ", a.mean())

Array:
  [[ 1  3  2  0  2]
 [10  5  5  2  6]]
Minimum value of each row:  [0 2]
Mean across all dimensions:  3.6
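
The argmin and argmax methods return positions rather than values, which is easy to overlook. The sketch below reuses the same array; the variable names for the results are introduced here for illustration:

```python
import numpy as np

a = np.array([[1, 3, 2, 0, 2], [10, 5, 5, 2, 6]])

# Position of the largest and smallest element within each row
row_argmax = a.argmax(axis=-1)
row_argmin = a.argmin(axis=-1)

# Without an axis, the index refers to the flattened array
flat_argmax = a.argmax()

print("Index of the maximum in each row:", row_argmax)
print("Index of the minimum in each row:", row_argmin)
print("Index of the maximum in the flattened array:", flat_argmax)
```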

🚀 Further reading: We invite you to check out this basics guide and quick start guide from the official NumPy documentation for a more extensive guide with more examples.

3. Handling multi-dimensional data (with more options) with Pandas

Pandas is a powerful and flexible open-source data analysis and manipulation library for Python.

The main building block of the library is the pandas DataFrame. A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. It is similar to a table in a database or an Excel spreadsheet. You can create a DataFrame from various data sources, including lists, dictionaries, and NumPy arrays, or import it from a file.

[16]:

import pandas as pd

# Create a dictionary with the sample data
data = {
    "Name": ["Arnoud", "Alex", "Charlie", "Tycho", "Ana", "Frank", "Paco", "Sara"],
    "Gender": ["F", "M", "M", "M", "F", "M", "F", "M"],
    "Age": [45, 25, 35, 22, 32, 40, 27, 33],
    "Nationality": ["Netherlands", "Spain", "UK", "Netherlands", "Spain", "UK", "Spain", "Spain"]
}

# Create the dataframe
df = pd.DataFrame(data)

# Display the dataframe
df

[16]:

	Name	Gender	Age	Nationality
0	Arnoud	F	45	Netherlands
1	Alex	M	25	Spain
2	Charlie	M	35	UK
3	Tycho	M	22	Netherlands
4	Ana	F	32	Spain
5	Frank	M	40	UK
6	Paco	F	27	Spain
7	Sara	M	33	Spain

❗️ Note: Usually, we will not create a dataframe from a hardcoded collection, but we will import it from a library or from a file using functions such as pd.read_csv() or pd.read_excel().
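
As a rough sketch of what loading from a file looks like, the example below passes CSV text to pd.read_csv through an in-memory buffer. The CSV content and the name df_from_csv are hypothetical stand-ins for a real file on disk:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """Name,Age,Nationality
Alex,25,Spain
Charlie,35,UK
Tycho,22,Netherlands
"""

# pd.read_csv accepts a file path or a file-like object
df_from_csv = pd.read_csv(io.StringIO(csv_text))
print(df_from_csv)
```

With a real file, the call would take the path instead, for example pd.read_csv("people.csv"), where the file name is again only an assumption.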

These are some of the most important attributes of a DataFrame:

[17]:

print("Shape of the DataFrame:", df.shape)
print("Number of rows:", len(df))
print("Number of columns:", len(df.columns))
print("Column names:", df.columns.tolist())
print("Data types of columns:\n", df.dtypes)
print("Summary statistics:\n", df.describe())

Shape of the DataFrame: (8, 4)
Number of rows: 8
Number of columns: 4
Column names: ['Name', 'Gender', 'Age', 'Nationality']
Data types of columns:
 Name           object
Gender         object
Age             int64
Nationality    object
dtype: object
Summary statistics:
              Age
count   8.000000
mean   32.375000
std     7.707835
min    22.000000
25%    26.500000
50%    32.500000
75%    36.250000
max    45.000000

We can access the columns of the dataframe with df[<column_name>] or df.<column_name>. This will return a pandas Series:

[18]:

print(df["Age"])
print(df.Age)
print(type(df["Age"]))

0    45
1    25
2    35
3    22
4    32
5    40
6    27
7    33
Name: Age, dtype: int64
0    45
1    25
2    35
3    22
4    32
5    40
6    27
7    33
Name: Age, dtype: int64
<class 'pandas.core.series.Series'>

Remember that Pandas is built on top of NumPy! We can convert a Series object to a np.ndarray by just accessing the Series.values attribute:

[19]:

print(df.Age.values)
print(type(df.Age.values))

[45 25 35 22 32 40 27 33]
<class 'numpy.ndarray'>

Series objects support many of the NumPy methods we saw before, such as .mean(), as well as some others that are only implemented in Pandas, such as .mode():

[20]:

print("Mean of Age column: ", df.Age.mean())
print("Mode of Nationality column: ", df.Nationality.mode().values[0])

Mean of Age column:  32.375
Mode of Nationality column:  Spain

🚀 Further reading: We invite you to check out this quick start guide from the official Pandas documentation for a more extensive tutorial on Pandas.

4. Making awesome plots with Matplotlib

Matplotlib is a popular Python library for creating plots. It's commonly used for generating visualizations like line plots, bar charts, scatter plots, and histograms. These are the most common types of plots:

Plot Type	Description	Function
Line Plot	Displays data points connected by straight lines.	`plt.plot()`
Bar Chart	Represents categorical data with rectangular bars.	`plt.bar()`
Histogram	Shows the distribution of a dataset.	`plt.hist()`
Scatter Plot	Displays the relationship between two variables.	`plt.scatter()`
Pie Chart	Represents data as slices of a pie.	`plt.pie()`
Box Plot	Summarizes data using quartiles.	`plt.boxplot()`

Let's display a line plot with two functions as an example. We will use other types of plots in the following tutorials.

[21]:

import matplotlib.pyplot as plt

# Generate 100 evenly spaced x values from -2 to 2
x = np.linspace(-2, 2, 100)

# Calculate y values for f(x) = x and f(x) = x^2
y1 = x
y2 = x**2

# Plot the functions
plt.plot(x, y1, label="$f(x) = x$")
plt.plot(x, y2, label="$f(x) = x^2$")

# Add labels and legend
plt.xlabel("x")
plt.ylabel("f(x)")
plt.legend()

# Show the plot
plt.show()

❗️ Note: Make sure your plots show the scale and a label on each axis! If you are plotting multiple datasets in the same plot, make sure you include a legend too.

🚀 Further reading: We invite you to check out this quick start guide from the official Matplotlib documentation for more information about figure structure, layout and presentation. Eventually we might also want to use some functions from the seaborn library for fancier plots. You can check out this introductory guide.