We use comments to leave notes about the code for the reader. Python does not run comments; they are there only to help us read the code.
We can make multiline comments with """ and single line comments with #.
"""
A multi-line comment describes your code
to someone who is reading it.
"""
Example:
"""
This program will ask the user for two numbers.
Then it will add the numbers and print the final value.
"""
number_one = int(input("Enter a number: "))
number_two = int(input("Enter a second number: "))
print("Sum: " + str(number_one + number_two))
# Use single line comments to clarify parts of code.
Example:
# This program adds 1 and 2
added = 1 + 2
print(added)
Variables
We use variables to store values that can be used to control commands in our code. We can also alter these values throughout the code.
# Make a variable to store text
name = "Zach"
# Create variables that are numbers
num_one = 3
num_two = 4
sum = num_one + num_two
# We can also assign multiple variables at once
num_one, num_two = 3, 4
# The value of a variable can be changed after it has been
# created
num_one = num_one + 1
Printing
We can print elements to the screen by using the print command. If we want to print text, we need to surround the text with quotation marks " ".
print("Hello world")
print(2 + 2)
print(10)
Casting as a String
To print integers or floats together with strings, the integer or float must be cast as a string using the str() function. The strings are concatenated with a plus symbol.
# my_series is a pandas Series; a plain Python list has no mean() method
print("The mean is " + str(my_series.mean()) + ".")
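As a small self-contained illustration of casting, the sketch below joins an integer and a float with text. The variable names and values are made up for the example.

```python
# Hypothetical values, chosen only for illustration.
score = 92
average = 88.5

# str() converts each number to text so it can be joined with +.
message = "Score: " + str(score) + ", average: " + str(average)
print(message)  # Score: 92, average: 88.5
```

Leaving out str() here would raise a TypeError, because Python will not join a string and a number with +.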
Mathematical Operators
Use mathematical operators to alter values.
+ Addition
- Subtraction
* Multiplication
/ Division
% Modulus (Remainder)
() Parentheses (For order of operations)
# Examples
z = x + y
w = x * y
# Division
a = 5.0 / 2 # Returns 2.5
b = 5.0 // 2 # Returns 2.0
c = 5/2 # Returns 2.5
d = 5 // 2 # Returns 2
# Increment (add one)
x += 1
# Decrement (subtract one)
x -= 1
# Absolute value
absolute_value = abs(x)
abs_val = abs(-5) # Returns 5
# Square root
import math
square_root = math.sqrt(x)
# Raising to a power
power = math.pow(x, y) # Calculates x^y
# Rounding
rounded_num = round(2.675, 2) # Returns 2.67, not 2.68: 2.675 is stored as a float slightly below 2.675
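The operator list above includes modulus and parentheses, which the examples do not show. A minimal sketch with made-up numbers:

```python
# Modulus gives the remainder after division.
remainder = 17 % 5        # 2, because 17 = 3 * 5 + 2

# A remainder of 0 when dividing by 2 means the number is even.
is_even = 10 % 2 == 0     # True

# Parentheses change the order of operations.
without_parentheses = 2 + 3 * 4    # 14: the multiplication happens first
with_parentheses = (2 + 3) * 4     # 20: the addition happens first

print(remainder, is_even, without_parentheses, with_parentheses)
```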
Random Numbers
To be able to use the randint or choice functions, you must use import random at the beginning of your code.
# Random integer between (and including) low and high
import random
random_num = random.randint(low, high)
random_element = random.choice(string)
# Example:
# Returns random number within and including 0 and 10.
random_num = random.randint(0,10)
# Random element in a string
random_element = random.choice('abcdefghij')
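A runnable version of the two calls above might look like this. The seed line is optional and is included here only so that repeated runs give the same results, which can help when testing.

```python
import random

# Optional: seeding makes the "random" results repeatable between runs.
random.seed(1)

dice_roll = random.randint(1, 6)    # 1 and 6 are both possible results
vowel = random.choice("aeiou")      # one character picked from the string

print(dice_roll, vowel)
```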
Comparison Operators
Use comparison operators to compare elements in order to make decisions in your code. Comparison operators return booleans (True/False).
x == y # is x equal to y
x != y # is x not equal to y
x > y # is x greater than y
x >= y # is x greater than or equal to y
x < y # is x less than y
x <= y # is x less than or equal to y
# Comparison operators in if statements
if x == y:
    print("x and y are equal")
if x > 5:
    print("x is greater than 5.")
Logical Operators
Use logical operators to check multiple conditions at once or one condition out of multiple.
# And Operator
and_expression = x and y
# Or Operator
or_expression = x or y
# You can combine many booleans!
boolean_expression = x and (y or z)
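Here is a short sketch of the same operators with concrete values. The variable names describe a made-up ticket check and are not part of the original examples.

```python
age = 15
has_ticket = True
is_member = False

# and: True only when both sides are True
can_enter = age >= 13 and has_ticket                        # True

# or: True when at least one side is True
gets_discount = age < 12 or is_member                       # False

# Parentheses group the "or" so it is evaluated before the "and"
can_enter_early = has_ticket and (is_member or age >= 18)   # False

print(can_enter, gets_discount, can_enter_early)
```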
Functions
Writing a function is like teaching the computer a new word.
Naming Functions: You can name your functions whatever you want, but you can't have spaces in the function name. Instead of spaces, use underscores ( _ ) like_this_for_example
Make sure that all the code inside your function is indented one level!
Defining a Function
We define a function to teach the computer the instructions for a new word. We need to use the term def to tell the computer we’re creating a function.
def name_of_your_function():
    # Code that will run when you make a call to
    # this function.
# Example:
# Teach the computer to add two numbers
num_one = 1
num_two = 2
def add_numbers():
    sum = num_one + num_two
Returning Values in Functions
We can use the command return to have a function give a value back to the code that called it. Without the return command, we could not use any altered values that were determined by the function.
# We add a return statement in order to use the value of the
# sum variable
num_one = 1
num_two = 2
def add_numbers():
    sum = num_one + num_two
    return sum
Calling a Function
We call a function to tell the computer to actually carry out the new command.
# Call the add_numbers() function once
# The computer will return a value of 3
add_numbers()
# Call the add_numbers() function 3 times and print the output
# The output will be the number 3 printed on 3 separate lines
print(add_numbers())
print(add_numbers())
print(add_numbers())
Using Parameters in Functions
We can use parameters to alter certain commands in our function. We have to include arguments for the parameters in our function call.
# In this program, parameters are used to give two numbers
def add_numbers(num_one, num_two):
    sum = num_one + num_two
    return sum
# We call the function with values inside the parentheses
# This program will print 7
print(add_numbers(3, 4))
# If we have a list with as many items as the function has
# parameters, an asterisk unpacks the items as the arguments
my_list = [3, 4]
print(add_numbers(*my_list))
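To show parameters and arguments in one more setting, here is a sketch using a hypothetical rectangle_area function (the name and values are invented for this example):

```python
def rectangle_area(width, height):
    # width and height are parameters: placeholders for the values
    # passed in when the function is called.
    return width * height

small = rectangle_area(2, 3)           # arguments 2 and 3, returns 6

dimensions = [4, 5]
large = rectangle_area(*dimensions)    # same as rectangle_area(4, 5), returns 20

print(small, large)
```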
Creating a List
We create a list by listing items inside square brackets. We can include elements of any type.
# Create an empty list
my_list = []
# Create a list with any number of items
my_list = [item1, item2, item3]
# Example:
number_list = [1, 2, 4]
# A list can have any type
my_list = [integer, string, boolean]
# Example:
a_list = ["hello", 4, True]
Altering a List
Due to the mutable nature of lists, we can alter individual elements in the list.
# Access an element in a list
a_list = ["hello", 4, True]
first_element = a_list[0] # Returns "hello"
# Set an element in a list
a_list = ["hello", 4, True]
a_list[0] = 9 # Changes a_list to be [9, 4, True]
# Looping over a list
# Prints each item on a separate line (9, then 4, then True)
a_list = [9, 4, True]
for item in a_list:
    print(item)
# Length of a list
a_list = [9, 4, True]
a_list_length = len(a_list) # Returns 3
# List comprehension: builds a list by evaluating an expression
# once for each item in a sequence
# This will create a list with the numbers 0 to 4
a_list = [x for x in range(5)]
# This will create a list with multiples of 2 from 0 to 8
list_of_multiples = [2*x for x in range(5)]
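The list operations above can be combined in one short run. The scores below are hypothetical.

```python
scores = [70, 85, 90]

scores[0] = 75                          # lists are mutable: replace the first element
count = len(scores)                     # 3
doubled = [2 * s for s in scores]       # [150, 170, 180]

total = 0
for s in scores:                        # visit each item in order
    total = total + s

print(scores, count, doubled, total)
```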
Series
A Series is a one-dimensional array. It is formatted similarly to one column in a table. By default, a Series includes indices that start at 0 and number the rows.
# Import pandas before creating a Series
import pandas as pd
# Creates a Series using a list
scores = pd.Series([96, 88, 89, 90])
# Creates a Series using a list AND specifying the indices
ingredients = pd.Series(["6 ounces", "1 cup", "2 large", "1 cup"],
                        index=["Coffee", "Milk", "Eggs", "Sugar"])
# Creates a Series using a Python dictionary.
# The key becomes the index.
s = {"Los Angeles Dodgers": 2020, "New York Yankees": 2009,
     "Boston Red Sox": 2018, "Chicago Cubs": 2016,
     "San Francisco Giants": 2014, "Colorado Rockies": None}
world_series = pd.Series(s)
Searches
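Searches for an item in the Series. The in keyword checks the index labels; add .values to search the stored values instead.
2002 in name_of_series.values # Returns True or False (searches the values)
"mouse" in name_of_series # Returns True or False (searches the index labels)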
Statistics
The following functions return summary statistics using data in a Series or DataFrame.
# Returns all statistics at one time
df.describe()
# Or return each measure separately
df.mean()
df.median()
df.mode()
df.min()
df.max()
df.count()
The following functions return measures of spread for the dataset.
# Returns the variance and the standard deviation
df.var()
df.std()
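# Find the range using the max and min values
# (names like max_value avoid overwriting Python's built-in max, min and range)
max_value = people_named_anna.max()
min_value = people_named_anna.min()
value_range = max_value - min_value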
# Find the interquartile range using the first and third
# quartile values
Q1 = people_named_anna.quantile(0.25)
Q3 = people_named_anna.quantile(0.75)
IQR = Q3 - Q1
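As a worked example of the range and interquartile range, the sketch below uses a small hypothetical Series of quiz scores. It assumes pandas is installed.

```python
import pandas as pd

# Hypothetical data: eight quiz scores.
quiz_scores = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])

# Range: largest value minus smallest value
score_range = quiz_scores.max() - quiz_scores.min()    # 7

# Interquartile range: third quartile minus first quartile
q1 = quiz_scores.quantile(0.25)    # 2.75
q3 = quiz_scores.quantile(0.75)    # 6.25
iqr = q3 - q1                      # 3.5

print(score_range, q1, q3, iqr)
```

By default quantile interpolates between neighboring data points, so the quartiles do not have to be values that appear in the data.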
Dictionaries
Dictionaries hold a collection of key-value pairs.
a_dictionary = {key1:value1, key2:value2}
# Example:
# This dictionary keeps a farm's animal count
my_farm = {"pigs": 2, "cows": 4}
# Creates an empty dictionary
a_dictionary = {}
# Inserts a key-value pair
a_dictionary[key] = value
my_farm["horses"] = 1 # The farm now has one horse
# Gets a value for a key
my_dict[key] # Will return the value stored for that key
my_farm["pigs"] # Will return 2, the value of "pigs"
# Using the 'in' keyword
my_dict = {"a": 1, "b": 2}
print("a" in my_dict) # Returns True
print("z" in my_dict) # Returns False
print(2 in my_dict) # Returns False, 2 is not a key
# Iterating through a dictionary
for key in my_dict:
    print("key: " + str(key))
    print("value: " + str(my_dict[key]))
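Inserting, looking up, and iterating are often used together. One common pattern, sketched here with made-up lunch votes, is counting how many times each item appears:

```python
votes = ["pizza", "tacos", "pizza", "salad", "pizza"]

tally = {}
for choice in votes:
    if choice in tally:
        # The key already exists, so add one to its value
        tally[choice] = tally[choice] + 1
    else:
        # First time we see this key, so insert it with a count of 1
        tally[choice] = 1

for key in tally:
    print(key + ": " + str(tally[key]))
```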
DataFrames
A data frame is a two-dimensional data structure. The data is aligned in a tabular fashion in rows and columns. DataFrames include indices that start at 0 and number the rows.
# Creates a DataFrame using a Python dictionary.
data = {"mammal": ["African Elephant", "Bottlenose Dolphin",
                   "Cheetah", "Domestic Cat"],
        "life_span": [70, 25, 14, 16]
        }
mammals = pd.DataFrame(data)
DataFrame Functions
# Returns the data type of each column
df.dtypes
# Returns the number of rows and columns as (rows, columns)
df.shape
# Returns summary statistics about each column
df.describe()
# Returns summary statistics, rounding to one decimal
round(df.describe(), 1)
Filtering
iloc
Index-based selection (iloc) selects rows and columns by their index location or address in the table.
# Returns rows from index location 0 to 1
# and columns from index location 3 to 6
df.iloc[0:2, 3:7]
loc
Label-based selection (loc) selects rows and columns by their label or name in the table.
# Returns rows with the index 8 through 12
# and columns named "country" and "score"
df.loc[8:12, ["country","score"]]
Conditional Filtering
Conditions can be used together with loc to keep only the rows that meet the condition.
# Returns only rows with a score higher than 7
# and only the score column
df.loc[df.score > 7, ["score"]]
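The three kinds of selection can be compared side by side on a tiny DataFrame. The data below is hypothetical, and the sketch assumes pandas is installed.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({
    "country": ["Chile", "Ghana", "Japan", "Norway"],
    "score": [6.5, 7.2, 8.1, 5.9],
})

# iloc uses positions: rows 0 and 1 (end position 2 is excluded), first column only
first_two = df.iloc[0:2, 0:1]

# loc uses labels: rows labeled 1 through 3 (the end label is included)
labeled = df.loc[1:3, ["country", "score"]]

# A condition inside loc keeps only the rows where it is True
high_scores = df.loc[df.score > 7, ["score"]]

print(first_two)
print(labeled)
print(high_scores)
```

Note the difference in the end points: iloc slices stop before the end position, while loc slices include the end label.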
Boxplots
Be sure to import the Matplotlib library for visualizations
import matplotlib.pyplot as plt
Simple Pie Chart
# Groups by a specific column and sums up the total
df1 = df.groupby("column1").sum()
# Plots using the sums and another column
df1.plot.pie(y="column2", labels=df1.index)
plt.show()
Advanced Pie Chart
# Specify the colors used
colors = ["lightcoral", "lightskyblue", "gold"]
# Set the middle section to "explode"
explode = [0, 0.1, 0]
# Plot the pie chart using the data frame
# Organize it by a specific column
# Set a start angle for the text
# Display percentages
df.plot.pie(y="column", colors=colors, explode=explode,
            startangle=45, autopct="%1.1f%%")
# Move the legend to the upper right
plt.legend(loc="upper right")
plt.show()
Scatterplots
Be sure to import the Matplotlib library for visualizations
import matplotlib.pyplot as plt
# Set x1 and y1
x1 = df.age.loc[df.sex == "f"]
y1 = df.height.loc[df.sex == "f"]
# Set x2 and y2
x2 = df.age.loc[df.sex == "m"]
y2 = df.height.loc[df.sex == "m"]
# Plot each group as its own set of points
plt.scatter(x1, y1)
plt.scatter(x2, y2)
plt.show()
More Options
# Add labels
plt.xlabel("Age")
plt.ylabel("Height")
plt.title("Height of School Children")
# Add a legend
plt.legend(["Females", "Males"])
Bar Charts
Be sure to import the Matplotlib library for visualizations
import matplotlib.pyplot as plt
Plot bar chart using two columns of data (x and y)
# Set color, width, and edgecolor of bars
plt.bar(x=df.column1, height=df.column2, width=1,
        edgecolor="black", color="#EA638C")
plt.show()
More Options
# Add labels and a title
plt.xlabel("Month")
plt.ylabel("Temperature (°F)")
plt.title("Average GA Temps", fontsize=22)
# Adjust grid and rotation of x ticks
plt.grid(False)
plt.xticks(rotation=45)
Plot bar chart using three columns of data (x1, x2 and y)
# Set the width of the bar
bar_width = 0.4
# Plot first dataset
plt.bar(x=df.column1, height=df.column2,
        width=bar_width, color="#EA638C")
# Plot second data set.
# Add the bar width to the x value so that the bars
# do not overlap
plt.bar(x=df.column3 + bar_width, height=df.column2,
        width=bar_width, color="#190E4F")
# Add a legend
plt.legend(["First Column", "Second Column"])
plt.show()
Normal Distribution
Be sure to import the Matplotlib library and the SciPy library
import matplotlib.pyplot as plt
from scipy.stats import norm
You will also need to include scipy in the requirements.txt file
Plot the Data
# Set data to be the values in a specific column
data = df.column
# Plot the histogram (w/density)
plt.hist(data, bins=10, density=True)
plt.show()
Plot the Normal Distribution Curve
# Determine the mean, median and std
mean = data.mean()
median = data.median()
std = data.std()
# Set up min and max of the x-axis using the mean and standard deviation
xmin = mean - 3 * std
xmax = mean + 3 * std
# Define the x-axis values
x = range(int(xmin), int(xmax))
# "Norm" the y-axis values based on the x-axis values, the mean and the std
y = norm.pdf(x, mean, std)
# Plot the graph using the x and the y values
plt.plot(x, y, color="orange", linewidth=2)
plt.show()
Determine the Likelihood
The pdf (probability density function) gives the height of the normal curve at an exact value. The value is used to graph the normal distribution, but is not typically used to determine likelihood since it is usually a very low number.
pdf = norm.pdf(x_value, mean, std)
print(pdf)
The cdf finds the cumulative likelihood.
What is the likelihood that a value is less than the x_value?
cdf = norm.cdf(x_value, mean, std)
print(cdf)
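What is the likelihood that a value is more than the x_value?
# The total likelihood is 1, so subtract the "less than" likelihood from 1
likelihood_more = 1 - norm.cdf(x_value, mean, std)
print(likelihood_more)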
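The two cumulative questions can be checked with concrete numbers. The sketch below swaps in Python's built-in statistics.NormalDist in place of SciPy's norm so that it runs without extra installs; the mean of 70 and standard deviation of 10 are hypothetical.

```python
from statistics import NormalDist

# Hypothetical distribution: test scores with mean 70 and standard deviation 10.
scores = NormalDist(mu=70, sigma=10)
x_value = 80

less_than = scores.cdf(x_value)    # likelihood that a value is below 80
more_than = 1 - less_than          # the two likelihoods add up to 1

print(round(less_than, 4))   # 0.8413
print(round(more_than, 4))   # 0.1587
```

With SciPy, the same "less than" number would come from norm.cdf(80, 70, 10).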
Linear Regression
Be sure to import the Matplotlib library and the NumPy library
import numpy as np
import matplotlib.pyplot as plt
Determining Correlation
Linear regression is only meaningful for values that are linearly correlated.
# Set the x and y values
x = df.column1
y = df.column2
# Determine and display the correlation
correlation = y.corr(x)
print(correlation)
Determining the Line of Best Fit
The model will print as an array that includes the slope and the y-intercept: [ m, b ]
# Determine the model equation
model = np.polyfit(x, y, 1)
print(model)
Predicting Using a Model
# Create the predict function
predict = np.poly1d(model)
# Use the predict function
value = 60
prediction = predict(value)
print(prediction)
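With a degree of 1, np.polyfit fits a least-squares line. To show where the slope m and intercept b come from, the sketch below computes them by hand in plain Python using the standard least-squares formulas. The data points are hypothetical, and the predict function here is a hand-written stand-in for the one built with np.poly1d.

```python
# Hypothetical data that lies close to the line y = 2x + 1.
x_values = [1, 2, 3, 4, 5]
y_values = [3.1, 4.9, 7.2, 8.8, 11.0]

n = len(x_values)
mean_x = sum(x_values) / n
mean_y = sum(y_values) / n

# Slope: m = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
denominator = sum((x - mean_x) ** 2 for x in x_values)
m = numerator / denominator

# Intercept: the line must pass through the point (mean_x, mean_y)
b = mean_y - m * mean_x

def predict(x):
    # Plays the same role as the predict function built with np.poly1d(model)
    return m * x + b

print(round(m, 2), round(b, 2))   # 1.97 1.09
print(round(predict(6), 2))       # 12.91
```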
Plotting the Line of Best Fit
# Determine the min and max values of the x-axis
x_min = int(df.wait_time.min())
x_max = int(df.wait_time.max())
print(x_min)
print(x_max)
# Create the line of best fit
# range is based on the min and max values determined above
x_lin_reg = range(x_min, x_max + 1)
y_lin_reg = predict(x_lin_reg)
plt.plot(x_lin_reg, y_lin_reg)
plt.show()
User Input
We can use input from the user to control our code. The input is saved as a string by default.
# If the input is a string.
name = input("What is your name? ")
# If the input needs to be used as a number include
# the term 'int' or 'float'
num_one = int(input("Enter a number: "))
num_two = int(input("Enter a second number: "))
num_three = float(input("Enter a third number: "))
If/Else Statements
We can tell the computer how to make decisions using if/else statements. Make sure that all the code inside your if/else statement is indented one level!
If Statements
Use an if statement to instruct the computer to do something only when a condition is true. If the condition is false, the command indented underneath will be skipped.
if BOOLEAN_EXPRESSION:
    print("This executes if BOOLEAN_EXPRESSION is True")
# Example:
# This will only print if the user enters a negative number
number = int(input("Enter a number: "))
if number < 0:
    print(str(number) + " is negative!")
If/Else Statements
Use an if/else statement to force the computer to make a decision between multiple conditions. If the first condition is false, the computer will skip to the next condition until it finds one that is true. If no conditions are true, the commands inside the else block will be performed.
if condition_1:
    print("This executes if condition_1 evaluates to True")
elif condition_2:
    print("This executes if condition_2 evaluates to True")
else:
    print("This executes if no prior conditions are True")
# Example:
# This program will print that the color is secondary
color = "purple"
if color == "red" or color == "blue" or color == "yellow":
    print("Primary color.")
elif color == "green" or color == "orange" or color == "purple":
    print("Secondary color.")
else:
    print("Not a primary or secondary color.")
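The order of the conditions matters, because only the first true condition runs. The sketch below wraps an if/elif/else chain in a hypothetical describe_temperature function (the name and thresholds are invented) so the result can be checked for several inputs:

```python
def describe_temperature(degrees_f):
    # Conditions are checked from top to bottom. Only the first one
    # that is True runs; the rest are skipped.
    if degrees_f >= 85:
        return "hot"
    elif degrees_f >= 60:
        return "mild"
    else:
        return "cold"

print(describe_temperature(90))   # hot (90 >= 60 is also True, but it is never checked)
print(describe_temperature(72))   # mild
print(describe_temperature(40))   # cold
```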
Loops
Loops help us repeat commands which makes our code much shorter. Make sure everything inside the loop is indented one level!
For Loops
Use for loops when you want to repeat something a fixed number of times.
# This for loop will print "hello" 5 times
for i in range(5):
    print("hello")
# This for loop will print out the even numbers 2 through 10
for number in range(2, 11, 2):
    print(number)
# This code executes on each item in my_list
# This loop will print 1, then 5, then 10, then 15
my_list = [1, 5, 10, 15]
for item in my_list:
    print(item)
While Loops
Use while loops when you want to repeat something an unknown number of times or until a condition becomes false. If there is no point where the condition becomes false, you will create an infinite loop which should always be avoided!
# This program will run as long as the variable 'number' is greater than or equal to 0
# Countdown from 10 to 0
number = 10
while number >= 0:
    print(number)
    number -= 1
# You can also use user input to control a while loop
# This code will continue running while the user answers 'Yes'
# (continue is a reserved word in Python, so the variable needs a different name)
keep_going = input("Continue code?: ")
while keep_going == "Yes":
    keep_going = input("Continue code?: ")
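A while loop fits best when the number of repetitions is not known ahead of time. In this made-up example, the loop counts how many times 1 must be doubled before it passes 1000:

```python
value = 1
doublings = 0

while value <= 1000:
    value = value * 2      # this change is what eventually makes the condition False
    doublings += 1

print(doublings)   # 10
print(value)       # 1024
```

If the line that changes value were removed, the condition would stay true forever and the loop would never end.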
Strings
Strings are pieces of text. We can gain much information about strings and alter them in many ways using various methods.
Indexing a String
We use indexing to find or take certain portions of a string. Index values always start at 0 for the first character and increase by 1 as we move to the right. From the end of the string, the final value also has an index of -1 with the values decreasing by 1 as we move to the left.
# Prints a character at a specific index
my_string = "hello!"
print(my_string[0]) # prints "h"
print(my_string[5]) # prints "!"
# Prints all the characters from the specific index to the end
my_string = "hello world!"
print(my_string[1:]) # prints "ello world!"
print(my_string[6:]) # prints "world!"
# Prints all the characters before the specific index
my_string = "hello world!"
print(my_string[:6]) # prints "hello " (including the space)
print(my_string[:1]) # prints "h"
# Prints the characters from the first index up to, but not including, the second
my_string = "hello world!"
print(my_string[1:6]) # prints "ello " (including the space)
print(my_string[4:7]) # prints "o w"
# Iterates through every character in the string
# Will print one letter of the string on each line in order
my_string = "Turtle"
for c in my_string:
    print(c)
# Completes commands if the string is found inside the given string
my_string = "hello world!"
if "world" in my_string:
    print("world")
# Concatenation
my_string = "Tracy the"
print(my_string + " turtle") # prints "Tracy the turtle"
# Splits the string into a list of letters
my_string = "Tracy"
my_list = list(my_string) # my_list = ['T', 'r', 'a', 'c', 'y']
# Using enumerate will print the index number followed by a colon and the
# word at that index for each word in the list
my_string = "Tracy is a turtle"
for index, word in enumerate(my_string.split()):
    print(str(index) + ": " + word)
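The indexing description mentions negative index values, which count from the end of the string. A brief sketch:

```python
my_string = "hello!"

last_character = my_string[-1]     # "!"
second_to_last = my_string[-2]     # "o"
last_three = my_string[-3:]        # "lo!"
all_but_last = my_string[:-1]      # "hello"

print(last_character, second_to_last, last_three, all_but_last)
```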
String Methods
There are many methods that can be used to alter strings.
# upper: To make a string all uppercase
my_string = "Hello"
my_string = my_string.upper() # returns "HELLO"
# lower: To make a string all lowercase
my_string = "Hello"
my_string = my_string.lower() # returns "hello"
# isupper: Returns True if a string is all uppercase letters and False otherwise
my_string = "HELLO"
print(my_string.isupper()) # returns True
# islower: Returns True if a string is all lowercase letters and False otherwise
my_string = "Hello"
print(my_string.islower()) # returns False
# swapcase: Returns a string where each letter is the opposite case from original
my_string = "PyThOn"
my_string = my_string.swapcase() # returns "pYtHoN"
# strip: Returns a copy of the string without any whitespace at beginning or end
my_string = " hi there "
my_string = my_string.strip() # returns "hi there"
# find: Returns the lowest index in the string where substring is found
# Returns -1 if substring is not found
my_string = "eggplant"
index = my_string.find("plant") # returns 3
index = my_string.find("Tracy") # returns -1
# split: Splits the string into a list of words at whitespace
my_string = "Tracy is a turtle"
my_list = my_string.split() # Returns ['Tracy', 'is', 'a', 'turtle']
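String methods return new strings, so they can be chained one after another. The sketch below cleans up a made-up user answer before checking it:

```python
raw_answer = "  YES please  "

# strip() runs first, then lower() runs on its result
cleaned = raw_answer.strip().lower()          # "yes please"
words = cleaned.split()                       # ['yes', 'please']
starts_with_yes = cleaned.find("yes") == 0    # True: "yes" begins at index 0

print(cleaned)
print(words)
print(starts_with_yes)
```

The original string is not changed by any of these calls; raw_answer still holds the extra spaces and capital letters.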
Set Index
View a dataframe using a different index. This will not change the data frame.
df.set_index("column")
Modify and change the data frame to use a new column as the index.
# Assign the result back to df to keep the new index
df = df.set_index("column")
# Import the data
df = pd.read_csv(r"data.csv")
# Remove max columns limitation and show all columns.
pd.set_option("display.max_columns", None)
Data Cleaning
Dropping Data
# Drop unnecessary columns
df = df.drop(["column1", "column2"], axis=1)
Determine missing values in each column.
df.isnull().sum()
Drop missing values.
# Drop rows that contain missing values
df.dropna()
# Drop columns that contain missing values
df.dropna(axis=1)
Fill in missing values.
# Fill in with a specific value
df.fillna(0, inplace=True)
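# Fill in with the next valid value below it (backward fill).
# Older code may use df.fillna(method='bfill'); recent pandas
# versions deprecate the method argument.
df.bfill()
# Fill in with the value in the column before it (forward fill across columns).
df.ffill(axis=1)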
Determine the number of duplicate rows.
df.duplicated().sum()
Find the duplicated row(s).
df.loc[df.duplicated()]
Drop duplicate rows.
df.drop_duplicates(inplace=True)
Change the data type of a column.
# Change the data type to a specific data type
df.column.astype(data_type)
# Change the data type to a numeric type (integer or float)
pd.to_numeric(df.column)
Grouping/Sorting
Grouping
# Groups and returns the count
df.groupby("value_to_group_by").column.count()
# Groups and returns the maximum value in two columns
df.groupby("value_to_group_by")[["column1", "column2"]].max()
# Groups and returns the min, max and sum of the
# values in a column
df.groupby("value_to_group_by").column.agg(["min", "max", "sum"])
# Groups and returns the sorted list of values in a column
df.groupby("value_to_group_by").column.agg([sorted])
Sorting
# Sort values (increasing/ascending)
df.sort_values(by="sorting_value")
# Sorts one column of values (decreasing/descending) and
# then by another (increasing/ascending)
df.sort_values(by=["sort1", "sort2"], ascending=[False, True])
Combining Datasets
To concatenate or merge datasets, make sure that column names match between the different datasets.
# Concatenating two datasets:
# add second data set on as new rows
# use the reset_index function to renumber the rows
# (drop=True discards the old row numbers instead of keeping them as a column)
combined_df = pd.concat([df1, df2]).reset_index(drop=True)
# Merging/Joining two datasets:
# Merge everything from both data sets
pd.merge(df1, df2, on="name", how="outer")
# Merge only values that exist in BOTH data sets
pd.merge(df1, df2, on="name", how="inner")
# Keep everything in the first data set and
# merge in matching values from the second
pd.merge(df1, df2, on="name", how="left")
# Keep everything in the second data set and
# merge in matching values from the first
pd.merge(df1, df2, on="name", how="right")
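The four merge types are easiest to compare on a pair of small tables. The two DataFrames below are hypothetical, and the sketch assumes pandas is installed.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df1 = pd.DataFrame({"name": ["Ana", "Ben", "Cy"], "age": [14, 15, 16]})
df2 = pd.DataFrame({"name": ["Ben", "Cy", "Dee"], "grade": [9, 10, 11]})

inner = pd.merge(df1, df2, on="name", how="inner")   # Ben, Cy
outer = pd.merge(df1, df2, on="name", how="outer")   # Ana, Ben, Cy, Dee
left = pd.merge(df1, df2, on="name", how="left")     # Ana, Ben, Cy
right = pd.merge(df1, df2, on="name", how="right")   # Ben, Cy, Dee

print(outer)
```

Where a name exists in only one table, the columns from the other table are filled with missing values (NaN). For example, Ana has no grade in the outer and left results.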