Dealing with files
This chapter will discuss the open() built-in function and some of the built-in modules for file processing.
open and close
The open() built-in function is one of the ways to read and write files. The first argument to this function is the filename to be processed, given as a relative or absolute path to the location of the file. The remaining arguments, such as mode and encoding, are optional and can be passed as keyword arguments. In text mode, the output is a TextIOWrapper object (i.e. a filehandle), which you can use as an iterator. Here's an example:
# default mode is rt i.e. read text
>>> fh = open('ip.txt')
>>> fh
<_io.TextIOWrapper name='ip.txt' mode='r' encoding='UTF-8'>
>>> next(fh)
'hi there\n'
>>> next(fh)
'today is sunny\n'
>>> next(fh)
'have a nice day\n'
>>> next(fh)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
# check if the filehandle is active or closed
>>> fh.closed
False
# close the filehandle
>>> fh.close()
>>> fh.closed
True
The mode argument specifies what kind of processing you want. Only text mode, which is the default, will be covered in this chapter. You can combine options; for example, rb means read in binary mode. Here are the relevant details from the documentation:
'r'  open for reading (default)
'w'  open for writing, truncating the file first
'x'  open for exclusive creation, failing if the file already exists
'a'  open for writing, appending to the end of the file if it exists
'b'  binary mode
't'  text mode (default)
'+'  open for updating (reading and writing)
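The difference between the 'x', 'a' and 'w' modes is easier to see side by side. Here's a minimal sketch; the temporary directory and the modes.txt filename are assumptions made only so that the example doesn't touch any of your existing files.

```python
import os
import tempfile

# work inside a temporary directory (an assumption of this sketch)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'modes.txt')

    # 'x' creates the file and fails if it already exists
    with open(path, 'x') as f:
        f.write('first line\n')

    try:
        open(path, 'x')
    except FileExistsError:
        exclusive_failed = True
    else:
        exclusive_failed = False

    # 'a' keeps the existing content and adds to the end
    with open(path, 'a') as f:
        f.write('second line\n')
    with open(path) as f:
        after_append = f.read()

    # 'w' truncates the file before writing
    with open(path, 'w') as f:
        f.write('replaced\n')
    with open(path) as f:
        after_write = f.read()

print(exclusive_failed)    # True
print(repr(after_append))  # 'first line\nsecond line\n'
print(repr(after_write))   # 'replaced\n'
```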
The encoding argument is meaningful only in text mode. You can check the default encoding for your environment using the locale module as shown below. See docs.python: standard encodings and docs.python HOWTOs: Unicode for more details.
>>> import locale
>>> locale.getpreferredencoding()
'UTF-8'
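Since the default encoding depends on the environment, passing the encoding argument explicitly tends to make a program more predictable. The sketch below is one way to see the effect of a mismatch; the temporary directory, the enc.txt filename and the sample text are assumptions for illustration.

```python
import os
import tempfile

# 'caf\u00e9' is the word cafe with an accented e, i.e. one non-ASCII character
text = 'caf\u00e9\n'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'enc.txt')

    # write and read back with the same explicit encoding
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    with open(path, encoding='utf-8') as f:
        round_trip = f.read()

    # reading the same bytes as ASCII fails on the non-ASCII character
    try:
        with open(path, encoding='ascii') as f:
            f.read()
    except UnicodeDecodeError:
        ascii_failed = True
    else:
        ascii_failed = False

print(round_trip == text)  # True
print(ascii_failed)        # True
```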
Here's how Python handles line separation by default (see the documentation for more details):
On input, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller.
On output, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep.
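The input side of this behavior can be checked with a small experiment. In the sketch below, the file is written in binary mode (not otherwise covered in this chapter) so that the three kinds of line endings land on disk exactly as given; the temporary directory and the eol.txt filename are assumptions.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'eol.txt')

    # binary mode writes the bytes as is, without any newline translation
    with open(path, 'wb') as f:
        f.write(b'one\r\ntwo\rthree\n')

    # newline=None (the default): every line ending is translated to '\n'
    with open(path, encoding='ascii') as f:
        translated = f.readlines()

    # newline='': lines are still split at all three endings,
    # but the endings are returned untranslated
    with open(path, encoding='ascii', newline='') as f:
        untranslated = f.readlines()

print(translated)    # ['one\n', 'two\n', 'three\n']
print(untranslated)  # ['one\r\n', 'two\r', 'three\n']
```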
If the given filename doesn't exist, you'll get a FileNotFoundError exception.
>>> open('xyz.txt', 'r', encoding='ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'xyz.txt'
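In a program, you might prefer to handle this exception instead of letting it end the script. Here's one possible sketch; read_or_default() is a hypothetical helper written for this example, not something provided by Python.

```python
import os
import tempfile

def read_or_default(path, default=''):
    """Hypothetical helper: return file contents, or default if the file is missing."""
    try:
        with open(path, encoding='utf-8') as f:
            return f.read()
    except FileNotFoundError:
        return default

# a path inside a fresh temporary directory is guaranteed not to exist yet
with tempfile.TemporaryDirectory() as tmp_dir:
    missing = os.path.join(tmp_dir, 'xyz.txt')
    result = read_or_default(missing, default='no such file')

print(result)  # no such file
```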
Context manager
Quoting from docs.python: Reading and Writing Files:
It is good practice to use the with keyword when dealing with file objects. The advantage is that the file is properly closed after its suite finishes, even if an exception is raised at some point. Using with is also much shorter than writing equivalent try-finally blocks.
# read_file.py
with open('ip.txt', 'r', encoding='ascii') as f:
    for ip_line in f:
        op_line = ip_line.rstrip('\n').capitalize() + '.'
        print(op_line)
Recall that the as keyword was seen before in the Different ways of importing and try-except sections. Here's the output of the above program:
$ python3.9 read_file.py
Hi there.
Today is sunny.
Have a nice day.
See The Magic of Python Context Managers for more examples and details.
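To make the comparison with try-finally concrete, here's a rough sketch of both forms doing the same work. The temporary directory and the sample file created in it are assumptions so that the example runs on its own.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'ip.txt')
    with open(path, 'w', encoding='ascii') as f:
        f.write('hi there\ntoday is sunny\n')

    # manual version: close() is called even if the processing raises an exception
    fh = open(path, 'r', encoding='ascii')
    try:
        lines_manual = [line.rstrip('\n').capitalize() + '.' for line in fh]
    finally:
        fh.close()

    # context manager version: the file is closed when the block ends
    with open(path, 'r', encoding='ascii') as f:
        lines_with = [line.rstrip('\n').capitalize() + '.' for line in f]

print(lines_manual)         # ['Hi there.', 'Today is sunny.']
print(fh.closed, f.closed)  # True True
```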
read, readline and readlines
The read() method gives you the entire remaining contents of the file as a single string. The readline() method gives the next line of text and readlines() gives all the remaining lines as a list of strings.
>>> open('ip.txt').read()
'hi there\ntoday is sunny\nhave a nice day\n'
>>> fh = open('ip.txt')
# readline() is similar to next()
# but returns an empty string instead of raising a StopIteration exception
>>> fh.readline()
'hi there\n'
>>> fh.readlines()
['today is sunny\n', 'have a nice day\n']
>>> fh.readline()
''
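Because readline() returns an empty string only at the end of the file, it can be used as a loop condition. An empty line in the file is still returned as '\n', so it isn't mistaken for the end. Here's a small sketch; the temporary file and its contents are assumptions for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'blank.txt')
    # the second line of this sample file is empty
    with open(path, 'w', encoding='ascii') as f:
        f.write('hi there\n\nhave a nice day\n')

    lines = []
    with open(path, encoding='ascii') as f:
        while True:
            line = f.readline()
            if line == '':  # end of file, not an empty line
                break
            lines.append(line)

print(lines)  # ['hi there\n', '\n', 'have a nice day\n']
```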
write
# write_file.py
with open('op.txt', 'w', encoding='ascii') as f:
    f.write('this is a sample line of text\n')
    f.write('yet another line\n')
You can call the write() method on a filehandle to add contents to that file (provided the mode you have set supports writing). Unlike print(), the write() method doesn't automatically add newline characters.
$ python3.9 write_file.py
$ cat op.txt
this is a sample line of text
yet another line
$ file op.txt
op.txt: ASCII text
If the file already exists, the w mode will overwrite the contents (i.e. existing content will be lost).
You can also use the print() function for writing by passing the filehandle to the file argument. The fileinput module supports in-place editing and other features (see the In-place editing with fileinput section for examples).
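Here's a short sketch of writing with print(). The newline is added automatically, and arguments like sep work the same way as they do for printing to the terminal. The temporary directory is an assumption so that no existing op.txt gets overwritten.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'op.txt')

    with open(path, 'w', encoding='ascii') as f:
        # print() adds the trailing newline, unlike write()
        print('this is a sample line of text', file=f)
        print('yet', 'another', 'line', sep='-', file=f)

    with open(path, encoding='ascii') as f:
        contents = f.read()

print(repr(contents))  # 'this is a sample line of text\nyet-another-line\n'
```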
File processing modules
This section gives introductory examples for some of the built-in modules that are handy for file processing. Quoting from docs.python: os:
This module provides a portable way of using operating system dependent functionality.
>>> import os
# current working directory
>>> os.getcwd()
'/home/learnbyexample/Python/programs/'
# value of an environment variable
>>> os.getenv('SHELL')
'/bin/bash'
# file size
>>> os.stat('ip.txt').st_size
40
# check if given path is a file
>>> os.path.isfile('ip.txt')
True
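A few more functions from the os module are sketched below, using a temporary directory (an assumption of this example) instead of the directory shown above. The os.path.join(), os.path.getsize(), os.listdir(), os.remove() and os.path.exists() functions are part of the standard library, though they weren't covered in the examples so far.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # build a path using the separator of the current operating system
    path = os.path.join(tmp_dir, 'ip.txt')

    # newline='' avoids newline translation, so the size is the same everywhere
    with open(path, 'w', encoding='ascii', newline='') as f:
        f.write('hi there\n')

    is_file = os.path.isfile(path)
    is_dir = os.path.isdir(tmp_dir)
    size = os.path.getsize(path)  # same value as os.stat(path).st_size
    names = os.listdir(tmp_dir)

    # delete the file and check that it is gone
    os.remove(path)
    exists_after = os.path.exists(path)

print(is_file, is_dir)  # True True
print(size)             # 9
print(names)            # ['ip.txt']
print(exists_after)     # False
```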
Quoting from docs.python: glob:
The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order. No tilde expansion is done, but *, ?, and character ranges expressed with [] will be correctly matched.
>>> import glob
# list of files (including directories) containing '_file' in their name
>>> glob.glob('*_file*')
['read_file.py', 'write_file.py']
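The ? and [] wildcards mentioned in the quote can be tried out as sketched below. The filenames and the temporary directory are assumptions for illustration. Since the results are returned in arbitrary order, the sketch sorts them before printing.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # create a few empty sample files
    for name in ('a1.txt', 'a2.txt', 'a10.txt', 'b1.txt'):
        with open(os.path.join(tmp_dir, name), 'w'):
            pass

    # escape the directory part in case it contains wildcard characters
    base = glob.escape(tmp_dir)

    # ? matches exactly one character, so a10.txt is not matched
    single = sorted(os.path.basename(p)
                    for p in glob.glob(os.path.join(base, 'a?.txt')))

    # [ab] matches one character from the given set
    char_set = sorted(os.path.basename(p)
                      for p in glob.glob(os.path.join(base, '[ab]1.txt')))

print(single)    # ['a1.txt', 'a2.txt']
print(char_set)  # ['a1.txt', 'b1.txt']
```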
Quoting from docs.python: shutil:
The shutil module offers a number of high-level operations on files and collections of files. In particular, functions are provided which support file copying and removal.
>>> import shutil
>>> shutil.copy('ip.txt', 'ip_file.txt')
'ip_file.txt'
>>> glob.glob('*_file*')
['read_file.py', 'ip_file.txt', 'write_file.py']
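Apart from shutil.copy(), the module also has functions for moving files and removing whole directory trees. The sketch below uses shutil.move() and shutil.rmtree() inside a temporary directory; the directory layout and the filenames are assumptions for illustration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, 'ip.txt')
    with open(src, 'w', encoding='ascii') as f:
        f.write('hi there\n')

    backup_dir = os.path.join(tmp_dir, 'backup')
    os.mkdir(backup_dir)

    # if the destination is a directory, the file keeps its name inside it
    copied = shutil.copy(src, backup_dir)

    # move (here, effectively a rename) returns the destination path
    moved = shutil.move(src, os.path.join(tmp_dir, 'renamed.txt'))
    names = sorted(os.listdir(tmp_dir))

    # remove the backup directory along with everything inside it
    shutil.rmtree(backup_dir)
    names_after = os.listdir(tmp_dir)

print(os.path.basename(copied))  # ip.txt
print(names)                     # ['backup', 'renamed.txt']
print(names_after)               # ['renamed.txt']
```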
Quoting from docs.python: pathlib:
This module offers classes representing filesystem paths with semantics appropriate for different operating systems. Path classes are divided between pure paths, which provide purely computational operations without I/O, and concrete paths, which inherit from pure paths but also provide I/O operations.
>>> from pathlib import Path
# use 'rglob' instead of 'glob' if you want to match names recursively
>>> list(Path('programs').glob('*file.py'))
[PosixPath('programs/read_file.py'), PosixPath('programs/write_file.py')]
See pathlib module: taming the file system and stackoverflow: How can I iterate over files in a given directory? for more details and examples.
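As a small taste of what Path objects offer, the sketch below builds paths with the / operator, reads and writes text without an explicit open() call and compares glob() with rglob(). The temporary directory and the files created in it are assumptions for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    base = Path(tmp_dir)

    # the / operator joins path components
    sub = base / 'programs'
    sub.mkdir()
    script = sub / 'read_file.py'

    # write_text() and read_text() open and close the file for you
    script.write_text('print("hi")\n', encoding='utf-8')
    contents = script.read_text(encoding='utf-8')

    # parts of the filename are available as attributes
    name, stem, suffix = script.name, script.stem, script.suffix

    # glob() checks only the given directory, rglob() also searches sub-directories
    top_level = sorted(p.name for p in base.glob('*.py'))
    recursive = sorted(p.name for p in base.rglob('*.py'))

print(name, stem, suffix)  # read_file.py read_file .py
print(repr(contents))      # 'print("hi")\n'
print(top_level)           # []
print(recursive)           # ['read_file.py']
```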
There are specialized modules for structured data processing as well, for example:
Exercises
Write a program that reads a known filename f1.txt which contains a single column of numbers in Python syntax. Your task is to display the sum of these numbers, which is 10485.14 for the given example.
$ cat f1.txt
8
53
3.14
84
73e2
100
2937
Read the documentation for glob.glob() and write a program to list all files ending with .txt in the current directory as well as sub-directories, recursively.