Created by Nathan Kelber for JSTOR Labs under Creative Commons CC BY License
For questions/comments/improvements, email nathan.kelber@ithaka.org.
Python Basics 4#
Description: This lesson describes the basics of writing your own functions including:
def statements
This lesson concludes with a description of popular Python packages and directions for installing them in Constellate.
This is part 4 of 5 in the series Python Basics that will prepare you to do text analysis using the Python programming language.
Use Case: For Learners (Detailed explanation, not ideal for researchers)
Difficulty: Beginner
Completion Time: 90 minutes
Knowledge Required:
Knowledge Recommended: None
Data Format: None
Libraries Used:
time to make the computer wait a few seconds
random to generate random numbers
Research Pipeline: None
Functions#
We have used several Python functions already, including print(), input(), and range(). You can identify a function by the fact that it ends with a set of parentheses () where arguments can be passed into the function. Depending on the function (and your goals for using it), a function may accept no arguments, a single argument, or many arguments. For example, when we use the print() function, a string (or a variable containing a string) is passed as an argument.
Functions are a convenient shorthand, like a mini-program, that makes our code more modular. We don’t need to know all the details of how the print() function works in order to use it. Functions are sometimes called “black boxes”, in that we can put an argument into the box and a return value comes out. We don’t need to know the inner details of the “black box” to use it. (Of course, as you advance your programming skills, you may become curious about how certain functions work. And if you work with sensitive data, you may need to peer in the black box to ensure the security and accuracy of the output.)
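As a small illustration of the paragraph above, the sketch below calls built-in functions with no arguments, one argument, and several arguments, then stores a return value in a variable. The variable name name_length is made up for this example, and it assumes you have already met the built-in len() function.

```python
# Calling built-in functions with different numbers of arguments
print()                          # no arguments: prints an empty line
print('Hello')                   # one argument
print('Hello', 'world', 2024)    # several arguments, separated by commas

# A function hands back a return value that we can store in a variable
name_length = len('Constellate') # len() returns the number of characters
print(name_length)               # 11
```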
Libraries and Modules#
While Python comes with many functions, there are thousands more that others have written. Adding them all to Python would create mass confusion, since many people could use the same name for functions that do different things. The solution then is that functions are stored in modules that can be imported for use. A module is a Python file (extension “.py”) that contains the definitions for the functions written in Python. These modules (individual Python files) can then be collected into even larger groups called packages and libraries. Depending on how many functions you need for the program you are writing, you may import a single module, a package of modules, or a whole library.
The general form of importing a module is:
import module_name
You may recall from the “Getting Started with Jupyter Notebooks” lesson, we imported the time module and used the sleep() function to wait 5 seconds.
# A program that waits five seconds then prints "Done"
import time # We import all the functions in the `time` module
print('Waiting 5 seconds...')
time.sleep(5) # We run the sleep() function from the time module using `time.sleep()`
print('Done')
Waiting 5 seconds...
Done
We can also just import the sleep() function without importing the whole time module. The syntax is:
from module import function
# A program that waits five seconds then prints "Done"
from time import sleep # We import just the sleep() function from the time module
print('Waiting 5 seconds...')
sleep(5) # Notice that we just call the sleep() function, not time.sleep()
print('Done')
Waiting 5 seconds...
Done
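To compare the two import styles side by side without waiting, here is a minimal sketch that uses the math module from Python's standard library instead of time. The choice of math and its sqrt() function is only for illustration.

```python
# Style 1: import the whole module, then use the module name as a prefix
import math
print(math.sqrt(16))   # 4.0

# Style 2: import a single function and call it without the prefix
from math import sqrt
print(sqrt(16))        # 4.0

# Both styles run exactly the same function
both_styles_match = math.sqrt(16) == sqrt(16)
print(both_styles_match)
```

With the first style it is always clear which module a function came from. The second style is shorter to type, but a function imported this way can be confused with a function of the same name that you write yourself.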
Writing a Function#
In the above examples, we called a function that was already written. However, we can also create our own functions!
The first step is to define the function before we call it. We use a function definition statement followed by a function description and a code block containing the function’s actions:
def my_function():
    """Description of what the function does"""
    python code to be executed
After the function is defined, we can call on it to do us a favor whenever we need by simply executing the function like so:
my_function()
After the function is defined, we can call it as many times as we want without having to rewrite its code. In the example below, we create a function called complimenter_function then call it twice.
# Create a complimenter function
def complimenter_function(): # Function definition statement
    """prints a compliment""" # Function description (docstring)
    print('You are looking great today!')
After you define a function, don’t forget to call it to make it do the work!
# Give a compliment by calling the function
complimenter_function()
You are looking great today!
Ideally, a function’s description (often called a docstring) should specify the data that the function takes and whether it returns any data. The triple quote notation can use single or double quotes, and it allows the description to span multiple lines in Python. If you would like to see a function’s description, you can use the help() function to check it out.
# Examining the function description for our function
# Note that the parentheses are not included with complimenter_function
help(complimenter_function)
Help on function complimenter_function in module __main__:
complimenter_function()
prints a compliment
# Try using help() to read the definition for the sleep function
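Since a triple-quoted description can span several lines, it can list what goes in and what comes out. The sketch below shows one possible layout. The function name documented_complimenter and the "Takes in / Returns" layout are assumptions made for this example, not a required format.

```python
def documented_complimenter():
    """Prints a compliment.

    Takes in: nothing
    Returns: nothing (the compliment is printed on the screen)
    """
    print('You are looking great today!')

# The description is stored with the function as its __doc__ attribute,
# which is the text that help() displays
print(documented_complimenter.__doc__)
```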
Parameters vs. Arguments#
When we write a function definition, we can define a parameter to work with the function. We use the word parameter to describe the variable in parentheses within a function definition:
def my_function(input_variable):
    """Takes in X and returns Y"""
    do this task
In the pseudo-code above, input_variable is a parameter because it is being used within the context of a function definition. When we actually call and run our function, the actual variable or value we pass to the function is called an argument.
# Change the complimenter function to give user-dependent compliment
def complimenter_function(user_name):
    """Takes in a name string, prints a compliment with the name"""
    print(f'You are looking great today, {user_name}!')
# Pass an argument to a function
complimenter_function('Sam')
You are looking great today, Sam!
Arguments can be passed in based on parameter order (positional arguments), or they can be passed explicitly by parameter name using an = (keyword arguments). (This could be useful if we wanted to pass an argument for the 10th parameter, but we did not want to pass arguments for the nine other parameters defined before it, which is possible when those parameters have default values.)
# Pass an argument with =
# user_name is the parameter, 'Sam' is the argument
complimenter_function(user_name='Sam')
You are looking great today, Sam!
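To see why passing by name can help when a function has several parameters, here is a minimal sketch. The function describe_pet and its parameters are hypothetical, and it assumes default values (written with = in the function definition), which this lesson has not covered.

```python
def describe_pet(name, animal='dog', color='brown', age=1):
    """Takes in details about a pet, prints a one-sentence description"""
    print(f'{name} is a {color} {animal}, age {age}.')

# Positional: arguments are matched to parameters by their order
describe_pet('Rex', 'cat', 'gray', 3)

# Keyword: we name the one parameter we care about and skip the rest,
# so `animal` and `color` keep their default values
describe_pet('Rex', age=7)
```

The second call prints "Rex is a brown dog, age 7." because only name and age were supplied.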
In the above example, we passed a string into our function, but we could also pass a variable. Try this next. Since the complimenter_function has already been defined, you can call it in the next cell without defining it again.
# Ask the user for their name and store it in a variable called name
# Then call the complimenter_function and pass in the name variable
A variable passed into a function could contain a list or dictionary.
# A list of names
list_of_names = ['Jenny', 'Pierre', 'Hamed']
def greet_a_list(names):
    """takes a list of names and prints out a greeting for each name"""
    for name in names:
        print(f'Hi {name}!')
greet_a_list(list_of_names)
Hi Jenny!
Hi Pierre!
Hi Hamed!
The Importance of Avoiding Duplication#
Using functions makes it easier for us to update our code. Let’s say we wanted to change our compliment. We can simply change the function definition one time to make the change everywhere. See if you can change the compliment given by our complimenter function.
# Create a complimenter function that gives compliment
def complimenter_function(user_name):
    """Takes in a name string, prints a compliment with the name"""
    print(f'You are looking great today, {user_name}!')
# Give a new compliment by calling the function
name = input('What is your name? ')
complimenter_function(name)
friend = input('Who is your friend? ')
complimenter_function(friend)
You are looking great today, John!
You are looking great today, Jane!
By changing our function definition just one time, we were able to change how our program behaves every time the function is called. If our program was large, it might call our custom function hundreds of times. If we had instead copied the same code into each of those places, we would need to change it in every place!
Generally, it is good practice to avoid duplicating program code to avoid having to change it in multiple places. When programmers edit their code, they may spend time deduplicating (getting rid of code that repeats). This makes the code easier to read and maintain.
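Here is a small before-and-after sketch of deduplication. The welcome message and the function name welcome are invented for this example.

```python
# Before: the same message is written out three times.
# Changing the wording means editing three lines.
print('Welcome, Jenny! Your account is ready.')
print('Welcome, Pierre! Your account is ready.')
print('Welcome, Hamed! Your account is ready.')

# After: the message lives in one function, so a change is made once
def welcome(user_name):
    """Takes in a name string, prints a welcome message with the name"""
    print(f'Welcome, {user_name}! Your account is ready.')

for user_name in ['Jenny', 'Pierre', 'Hamed']:
    welcome(user_name)
```

Both versions print the same three lines, but only the second can be updated by editing a single line.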
Coding Challenge! < / >
In the next cell, try writing a function that accepts a dictionary as an argument. Use a flow control statement to print out all the names and occupations for the contacts.
# A dictionary of names and occupations
contacts = {
    'Amanda Bennett': 'Engineer, electrical',
    'Bryan Miller': 'Radiation protection practitioner',
    'Chris Garrison': 'Planning and development surveyor',
    'Debra Allen': 'Intelligence analyst'}
# Define and then call your function here
Function Return Values#
Whether or not a function takes an argument, it will always return a value. If we do not specify that return value in our function definition, it is automatically set to None, a special value (like the Booleans True and False) that simply means null or nothing. (None is not the same thing as, say, the integer 0.) We can also specify return values for our function using a return statement in the function’s code block, optionally inside a flow control statement.
If you don’t write a return statement in your function, a None value will be returned.
# Find out the returned value for the following function
def complimenter_function(user_name):
    """Takes in a name string, prints a compliment with the name"""
    print(f'You are looking great today, {user_name}!')
print(complimenter_function('Sam'))
You are looking great today, Sam!
None
Instead of automatically printing inside the function, the better approach is to return a string value and let the user decide whether to print it or do something else with it. Ideally, our function definition statement should indicate what goes into the function and what is returned by the function.
# Adding a return statement
def complimenter_function(user_name):
    """Takes in a name string, returns a compliment with the name""" # We are returning now
    return f'You are looking great today, {user_name}!'
compliment = complimenter_function('Sam')
print(compliment)
You are looking great today, Sam!
Returning the string allows the programmer to use the output instead of just printing it automatically. This is usually the better practice.
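As a brief sketch of what "using the output" can look like, the example below takes the returned string and works with it in two ways that would not be possible if the function only printed. The variable names shouted and compliments are made up for this example.

```python
def complimenter_function(user_name):
    """Takes in a name string, returns a compliment with the name"""
    return f'You are looking great today, {user_name}!'

compliment = complimenter_function('Sam')

# Because the compliment is a returned string, we can change it...
shouted = compliment.upper()
print(shouted)

# ...or collect several compliments in a list for later use
compliments = []
for name in ['Jenny', 'Pierre']:
    compliments.append(complimenter_function(name))
print(compliments)
```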
We can also offer multiple return statements with flow control. Let’s write a function for telling fortunes. We can call it fortune_picker and it will accept a number (1-6) then return a string for the fortune.
# A fortune-teller program that contains a function `fortune_picker`
# `fortune_picker` accepts an integer (1-6) and returns a fortune string
def fortune_picker(fortune_number): # A function definition statement that has a parameter `fortune_number`
    """takes an integer (1-6) and returns a fortune string"""
    if fortune_number == 1:
        return 'You will have six children.'
    elif fortune_number == 2:
        return 'You will become very wise.'
    elif fortune_number == 3:
        return 'A new friend will help you find yourself.'
    elif fortune_number == 4:
        return 'A great fortune is coming to you.'
    elif fortune_number == 5:
        return 'That promising venture... it is a trap.'
    elif fortune_number == 6:
        return 'Sort yourself out then find love.'
fortune = fortune_picker(3) # return a fortune string and store it in fortune
print(fortune)
A new friend will help you find yourself.
In our example, we passed the argument 3, which returned the string 'A new friend will help you find yourself.' To change the fortune, we would have to pass a different integer into the function. To make our fortune-teller random, we could import the function randint(), which chooses a random integer between two integers (both endpoints are included). We pass the two integers as arguments separated by a comma.
# A fortune-teller program that uses a random integer
from random import randint # import the randint() function from the random module
def fortune_picker(fortune_number): # A function definition statement that has a parameter `fortune_number`
    if fortune_number == 1:
        return 'You will have six children.'
    elif fortune_number == 2:
        return 'You will become very wise.'
    elif fortune_number == 3:
        return 'A new friend will help you find yourself.'
    elif fortune_number == 4:
        return 'A great fortune is coming to you.'
    elif fortune_number == 5:
        return 'That promising venture... it is a trap.'
    elif fortune_number == 6:
        return 'Sort yourself out then find love.'
random_number = randint(1, 6) # Choose a random number between 1 and 6 and assign it to a new variable `random_number`
fortune = fortune_picker(random_number) # Return a fortune string
print('Your fortune is: ')
print(fortune)
while True:
    print('Would you like another fortune?')
    repeat_fortune = input()
    if repeat_fortune == 'yes' or repeat_fortune == 'Yes':
        random_number = randint(1, 6)
        print(fortune_picker(random_number))
        continue
    else:
        print('I have no more fortunes to share.')
        break
Your fortune is:
Sort yourself out then find love.
Would you like another fortune?
I have no more fortunes to share.
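A function with several return statements still returns None when none of them is reached. The sketch below uses a shortened, hypothetical version of the fortune picker (short_fortune_picker, with only two fortunes) to show what happens when the number matches no branch.

```python
def short_fortune_picker(fortune_number):
    """takes an integer (1-2) and returns a fortune string"""
    if fortune_number == 1:
        return 'You will become very wise.'
    elif fortune_number == 2:
        return 'A great fortune is coming to you.'

result = short_fortune_picker(7)   # 7 does not match any branch
print(result)                      # None
print(result is None)              # True
```

One possible way to handle this is to add a final else block that returns a message such as 'No fortune for that number.'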
Coding Challenge! < / >
Try writing a function that accepts a name entered by the user and returns that person’s occupation. You can use the .get() method to retrieve the relevant occupation, such as:
contacts.get('Amanda', 'No contact with that name')
Remember, the second string will be returned if the name ‘Amanda’ is not in our dictionary.
# A program that returns the occupation when users supply a given name
# A dictionary of names and occupations
contacts = {
    'Amanda': 'Engineer, electrical',
    'Bryan': 'Radiation protection practitioner',
    'Christopher': 'Planning and development surveyor',
    'Debra': 'Intelligence analyst'}
# The function definition and program
Local and Global Scope#
We have seen that functions make maintaining code easier by avoiding duplication. One of the most dangerous areas for duplication is variable names. As programming projects become larger, the possibility that a variable will be re-used goes up. This can cause weird errors in our programs that are hard to track down. We can alleviate the problem of duplicate variable names through the concepts of local scope and global scope.
We use the phrase local scope to describe what happens within a function. The local scope of a function may contain local variables, but once that function has completed, the local variables and their contents are erased.
On the other hand, we can also create global variables that persist at the top-level of the program and also within the local scope of a function.
In the global scope, Python does not recognize any local variable from within the program’s functions
In the local scope of a function, Python can recognize any global variables
It is possible for there to be a global variable and a local variable with the same name
Ideally, Python programs should limit the number of global variables and create most variables in a local scope. This keeps confounding variables localized in functions where they are used and then discarded.
# Demonstration of global variable being used in a local scope
# The program crashes when a local variable is used in a global scope
global_string = 'global'
def print_strings():
    print('We are in the local context:')
    local_string = 'local'
    print(global_string)
    print(local_string)
print_strings()
We are in the local context:
global
local
The code above defines a global variable global_string with the value of ‘global’. A function, called print_strings, then defines a local variable local_string with a value of ‘local’. When we call the print_strings() function, it prints the local variable and the global variable.
# The function has closed, now the local string has been discarded
print('We are now in the global context: ')
print(global_string)
print(local_string)
We are now in the global context:
global
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[21], line 4
2 print('We are now in the global context: ')
3 print(global_string)
----> 4 print(local_string)
NameError: name 'local_string' is not defined
After the print_strings() function completes, we try to print both variables in a global scope. The program prints global_string but crashes when trying to print local_string in a global scope.
It’s a good practice not to name a local variable the same thing as a global variable. If we define a variable with the same name in a local scope, it becomes a local variable within that scope. Once the function is closed, the global variable retains its original value.
# A demonstration of global and local scope using the same variable name
# print(string) returns two different results
string = 'global'
def share_strings():
    string = 'local'
    print(string)
share_strings()
local
# print the string variable in the global context
print(string)
global
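One common way to limit reliance on global variables is to pass values into a function as arguments and get results back with return. The sketch below illustrates that idea. The function add_greeting and the variable new_string are hypothetical names chosen for this example.

```python
def add_greeting(text):
    """Takes in a string, returns a new string with a greeting added"""
    greeted = 'Hello, ' + text    # `greeted` is local to this function
    return greeted

string = 'global'
new_string = add_greeting(string)  # pass the value in, store what comes out

print(string)       # global  (the global variable is unchanged)
print(new_string)   # Hello, global
```

Everything the function needs arrives through its parameter, and everything it produces leaves through return, so it does not depend on any variable name outside itself.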
Popular Python Packages#
Modules containing functions for a similar type of task are often grouped together into a package. Here are some of the most popular packages used in Python:
Processing and cleaning data#
Visualizing data#
matplotlib- Creates static, animated, and interactive visualizations.
Seaborn- An expansion of matplotlib that provides a “high-level interface for drawing attractive and informative statistical graphics”
Plotly- Create graphs, analytics, and statistics visualizations
Dash- Create interactive web applications and dashboards
Text Analysis#
Artificial Intelligence and Machine Learning#
scikit-learn- Implement machine learning in areas such as classification, predictive analytics, regression, and clustering
Keras- Implement deep learning using neural networks
TensorFlow- Implement machine learning with a particular focus on training and deep neural networks
🤗 Transformers- Easily work with a variety of models based on Hugging Face 🤗
Data Gathering#
Requests- An HTTP client that helps connect to websites and download files
urllib3- Another HTTP client that helps connect to websites and download files
Beautiful Soup- Pull data out of HTML or XML files, helpful for scraping information from websites
Scrapy- Helps extract data from websites
Textual Digitization#
Tesseract- Use optical character recognition to convert images into plaintext
Pillow- Read and manipulate images with Python
Packages are generally installed from PyPI, the official Python Package Index. As of April 2022, there were over 350,000 packages available.
Installing a Python Package in Constellate#
If you would like to install a package that is not in Constellate, we recommend using the pip installer with packages from the Python Package Index. In a code cell insert the following code:
!pip install package_name
Replace package_name with the name of the package you would like to install. The exclamation point indicates the line should be run as a terminal command.
Refer to the package’s documentation for guidance.
# Install Scrapy
!pip install scrapy
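Before installing, it can be handy to check whether Python can already find a package. The sketch below is one possible way to do that with find_spec() from the standard library's importlib.util module. The helper name is_installed is made up for this example. Note that the name you import is sometimes different from the name you install with pip (for example, Pillow is imported as PIL).

```python
from importlib.util import find_spec

def is_installed(import_name):
    """Takes in an import name string, returns True if Python can find it"""
    return find_spec(import_name) is not None

# `json` ships with Python, so it should be found
print(is_installed('json'))

# A made-up name that should not be found
print(is_installed('a_package_that_does_not_exist_12345'))
```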
Lesson Complete#
Congratulations! You have completed Python Basics 4. There is one more lesson in Python Basics:
Python Basics 5
Start Next Lesson: Python Basics 5#
Exercise Solutions#
Here are a few solutions for exercises in this lesson.
# A dictionary of names and occupations
contacts = {
    'Amanda Bennett': 'Engineer, electrical',
    'Bryan Miller': 'Radiation protection practitioner',
    'Chris Garrison': 'Planning and development surveyor',
    'Debra Allen': 'Intelligence analyst'}
# Define and then call your function here
def print_contacts(contacts_names):
    """Prints out all the contacts in a contacts dictionary"""
    # Loop over the dictionary that was passed in, not the global `contacts`
    for name, occupation in contacts_names.items():
        print(name.ljust(15), '|', occupation)
print_contacts(contacts)
Amanda Bennett | Engineer, electrical
Bryan Miller | Radiation protection practitioner
Chris Garrison | Planning and development surveyor
Debra Allen | Intelligence analyst
# A dictionary of names and occupations
contacts = {
    'Amanda': 'Engineer, electrical',
    'Bryan': 'Radiation protection practitioner',
    'Christopher': 'Planning and development surveyor',
    'Debra': 'Intelligence analyst'}
def occupation_finder(name):
    """Allows a user to find the occupation of a particular contact"""
    return contacts.get(name, 'No contact with that name')
while True:
    print('Enter a name to look up an occupation (or enter quit):')
    name = input()
    if name == 'quit':
        print('Shutting down..')
        break
    else:
        print(occupation_finder(name))
        continue
Enter a name to look up an occupation (or enter quit):
Engineer, electrical
Enter a name to look up an occupation (or enter quit):
Shutting down..