Working with file paths in Python
What this guide covers
This guide gives an overview of some issues you may encounter when working with file paths in this IEP minor.
It introduces the concepts you need to consider and points to ways of writing your code that avoid these issues.
In summary:
- prefer the `pathlib` library over `os.path`
- prefer file paths relative to the project to those specific to your computer (e.g. `data/file.csv`, not `Users/Jack/myproject/data/file.csv`)
- take care with file paths relative to another file (e.g. `../data/datafile.csv`); they may not behave as you expect when you have code in packages that are included elsewhere
- use the `pathlib` `joinpath` method rather than writing `\` or `/` separators yourself
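As a minimal sketch of the last point, the snippet below builds the same path in both styles with `joinpath`. `PurePosixPath` and `PureWindowsPath` are `pathlib` classes that model each style without touching the file system; they are used here only so the difference is visible on any computer. The folder and file names are illustrative assumptions.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical folder and file names, used only for illustration.
parts = ("data", "file.csv")

# The same joinpath call produces the separator that suits each path style.
posix_path = PurePosixPath("py_project").joinpath(*parts)
windows_path = PureWindowsPath("py_project").joinpath(*parts)

print(posix_path)    # py_project/data/file.csv
print(windows_path)  # py_project\data\file.csv
```

In everyday code you would normally use `pathlib.Path`, which picks the style of the operating system the code is running on.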
The issues
The main points to consider are:
- Windows and Unix/Mac file paths are different.
  Consider a Mac/Unix-style file path `/Users/jojo/py_project/test.py` and a Windows file path `C:\Documents\py_project\test.py`. The syntax and structure are different. Furthermore, the location of a project's files on your computer is likely to differ from their location on someone else's, so shared code whose file paths use one operating system's format and one person's directory structure quickly becomes problematic.
- The root folder can differ depending on your code editor.
  If you are writing code in VS Code or PyCharm, file paths are typically resolved from the project root; that is, you can write paths relative to your project root and ignore the preceding local directory structure. For example, a file in the data folder would be `data/file.csv` and not `C:\Documents\py_project\data\file.csv`. Different environments and editors may set the project root differently.
- The relative path can differ depending on where the code is called from.
  If you write a file path in a code file so that it is relative to that file, you are likely to get issues if you then import that file and execute it from another. For example, suppose you have the following directories and files: `/data/datafile.csv`, `/module_a/code_a.py` and `/module_a/module_b/code_b.py`. If, in `code_a.py`, you reference the data file using `../data/datafile.csv` in a function that you then import and use inside `code_b.py`, you might get an issue because, relative to `code_b.py`, the data file is not at `../data/datafile.csv` but at `../../data/datafile.csv`. It's a little more complex than this: Python resolves a relative path against the current working directory of the running program, not against the code file in which the path is written. Either way, relative file paths can lead to problems.
- Referencing files in web apps introduces further complexity.
  In a web app, you also need to consider that files on a development platform are likely to be in a different location from those in the deployed version. This isn't covered here, but in COMP0034 we will consider using configuration parameters that tell the web app where to look for certain files. For example, you can configure the project root, the folder where static files are stored, etc.
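The point about relative paths can be illustrated with a minimal sketch. It recreates the `module_a` and `module_a/module_b` folders from the example in a temporary directory (an assumption made so the snippet is self-contained) and shows that the same relative path resolves to a different location depending on the current working directory.

```python
import os
import tempfile
from pathlib import Path

# A relative path, as it might be written inside code_a.py.
relative = Path("../data/datafile.csv")

original_cwd = Path.cwd()
resolved = {}
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    folders = ("module_a", "module_a/module_b")
    # Recreate the example folders inside a temporary directory.
    for folder in folders:
        (root / folder).mkdir(parents=True)
    try:
        for folder in folders:
            os.chdir(root / folder)
            # resolve() interprets the relative path against the
            # current working directory, not against any code file.
            resolved[folder] = relative.resolve().relative_to(root).as_posix()
    finally:
        os.chdir(original_cwd)  # always restore the working directory

print(resolved["module_a"])           # data/datafile.csv
print(resolved["module_a/module_b"])  # module_a/data/datafile.csv
```

Run from `module_a`, the path points at the data folder; run from `module_a/module_b`, it points at a folder that does not exist in the example layout.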
Solutions using the Python pathlib library
Some solutions suggest using `os.path`; however, the `pathlib` module was introduced in Python 3.4, so you should try to use it, as it addresses some of the above-mentioned issues.
- Avoids the `\` versus `/` issue by using the `pathlib` `joinpath` method.
- Has methods that let you determine the current working directory, e.g. `pathlib.Path.cwd()`. For example: `my_file_path = pathlib.Path.cwd().joinpath('data', 'datafile.csv')` instead of `data/datafile.csv` or `data\datafile.csv`.
- Allows you to code relative to the current code file, whatever that file is, e.g. `pathlib.Path(__file__).resolve().parent` would give the directory that is the parent of the current file.
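The two approaches in the list above can be compared in a short sketch. The helper `data_file_for` is hypothetical and assumes the layout from the earlier example, where `code_a.py` lives in `module_a/` and the `data/` folder sits next to `module_a/`. A stand-in path is used in place of `__file__` so that the snippet runs on its own.

```python
from pathlib import Path


def data_file_for(module_file):
    """Return <project root>/data/datafile.csv for a module one folder below the root.

    Hypothetical helper: assumes code_a.py is in module_a/ and that
    data/ sits next to module_a/.
    """
    code_dir = Path(module_file).resolve().parent  # folder containing the code file
    project_root = code_dir.parent                 # one level up from that folder
    return project_root.joinpath("data", "datafile.csv")


# Relative to the current working directory: the result changes with
# where the program is run from.
cwd_based = Path.cwd().joinpath("data", "datafile.csv")

# Relative to a code file. Inside code_a.py you would pass __file__;
# the path below is only a stand-in for illustration.
file_based = data_file_for(Path.cwd() / "module_a" / "code_a.py")

print(cwd_based)
print(file_based)
```

Anchoring on the code file keeps the path stable when the function is imported and called from elsewhere, at the cost of tying the code to the project's folder layout.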