22. filetools: A collection of file utilities.
22.1. Classes defined in module filetools
- class filetools.File(filename, mode, compr=None, level=5, delete_temp=True)
Read/write files with transparent file compression.
This class is a context manager providing transparent file compression and decompression. It is commonly used in a with statement, as follows:
    with File('filename.ext','w') as f:
        f.write('something')
        f.write('something more')
This will create an uncompressed file with the specified name, write some things to the file, and close it. The file can be read back similarly:
    with File('filename.ext','r') as f:
        for line in f:
            print(line)
Because File is a context manager, the file is automatically closed when leaving the with block.
So far this doesn’t look very different from using open(). But when the filename ends in ‘.gz’ or ‘.bz2’, the File class automatically compresses (on writing) or decompresses (on reading) the file. Your code can stay the same as above: just use an appropriate filename.
- Parameters:
filename (path_like) – Path of the file to open. If the filename ends with ‘.gz’ or ‘.bz2’, transparent (de)compression will be used, with gzip or bzip2 compression algorithms respectively. For other file names, it can be forced with the compr argument.
mode (str) – File open mode: ‘r’ for read, ‘w’ for write or ‘a’ for append mode. See also the Python documentation for the open() builtin function. For compressed files, append mode is not yet available.
compr ('gz' | 'bz2') – The compression algorithm to be used: gzip or bzip2. If not provided and the file name ends with ‘.gz’ or ‘.bz2’, compr is set automatically from the extension.
level (int (1..9)) – Compression level for gzip/bzip2. Higher values result in smaller files, but require longer compression times. The default of 5 already gives a fairly good compression ratio.
delete_temp (bool) – If True (default), the temporary files needed to do the (de)compression are deleted when the File instance is closed. This can be set to False to keep the files (mainly intended for debugging).
The File class can also be used outside a with statement. In that case the user has to open and close the File explicitly. The following are more or less equivalent to the above examples (though the with statement is better at handling exceptions):
    fil = File('filename.ext','w')
    f = fil.open()
    f.write('something')
    f.write('something more')
    fil.close()
This will create an uncompressed file with the specified name, write some things to the file, and close it. The file can be read back similarly:
    fil = File('filename.ext','r')
    f = fil.open()
    for line in f:
        print(line)
    fil.close()
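The idea of choosing compression from the file name can be illustrated with the standard library alone. This is only a sketch of the concept, not the way File is implemented (File works through temporary files, as described for open() and close()); the helper name open_by_extension and the OPENERS table are hypothetical.

```python
import bz2
import gzip
from pathlib import Path

# Hypothetical lookup table, not part of filetools: suffix -> opener function
OPENERS = {'.gz': gzip.open, '.bz2': bz2.open}


def open_by_extension(filename, mode='r'):
    """Open a text file, (de)compressing according to the file suffix."""
    opener = OPENERS.get(Path(filename).suffix, open)
    # gzip.open and bz2.open default to binary mode, so ask for text mode
    return opener(filename, mode + 't')
```

With such a helper the calling code is identical for 'data.txt', 'data.txt.gz' and 'data.txt.bz2'; only the name changes.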
- open()
Open the File in the requested mode.
This can be used to open a File object outside a with statement. It returns a Python file object that can be used to read from or write to the File. It performs the following:
If no compression is used, open the file in the requested mode.
For reading a compressed file, decompress the file to a temporary file and open the temporary file for reading.
For writing a compressed file, open a temporary file for writing.
See the documentation for the File class for an example of its use.
- close()
Close the File.
This can be used to close the File if it was not opened using a with statement. It performs the following:
The underlying file object is closed.
If the file was opened in write or append mode and compression is requested, the file is compressed.
If a temporary file was in use and delete_temp is True, the temporary file is deleted.
See the documentation for the File class for an example of its use.
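The open/close steps listed above for writing a compressed file can be sketched as follows. This is a minimal illustration using only the standard library and gzip; the class name TempFileWriter and its attributes are hypothetical and do not reproduce the actual File code.

```python
import gzip
import os
import shutil
import tempfile


class TempFileWriter:
    """Hypothetical sketch: write to a temporary file, gzip it on close."""

    def __init__(self, filename, level=5, delete_temp=True):
        self.filename = filename
        self.level = level
        self.delete_temp = delete_temp
        self.file = None
        self.tmpname = None

    def open(self):
        # Writing a compressed file: open a temporary file for writing
        fd, self.tmpname = tempfile.mkstemp()
        self.file = os.fdopen(fd, 'w')
        return self.file

    def close(self):
        # 1. Close the underlying file object
        self.file.close()
        # 2. Compress the temporary file into the target file
        with open(self.tmpname, 'rb') as src:
            with gzip.open(self.filename, 'wb',
                           compresslevel=self.level) as dst:
                shutil.copyfileobj(src, dst)
        # 3. Delete the temporary file unless asked to keep it
        if self.delete_temp:
            os.remove(self.tmpname)
```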
- class filetools.TempDir(suffix=None, prefix='pyf_', dir=None, chdir=False, keep=False)
A temporary directory that can be used as a context manager.
This is a wrapper around Python’s tempfile.TemporaryDirectory, with the following differences:
the default value for prefix is set to pyf_,
it has an extra attribute ‘.path’ returning the directory name as a Path,
the context manager returns a Path instead of a str,
the context manager can automatically change into the tempdir,
the context manager automatically changes back to the original workdir.
- class filetools.ChDir(dirname=None, create=True)
A context manager to temporarily change the working directory.
The context manager changes the current working directory and guarantees a return to the previous one, even if an exception occurs.
- Parameters:
dirname (path_like | None) – The relative or absolute path name of the directory to change into. If the directory does not exist, it will be created, unless create=False was specified. If None, a temporary working directory will be created and used, and deleted with all its contents on leaving the context.
create (bool) – If True (default), the directory (including missing parents) will be created if it does not exist. If False, and a path was specified for dirname, the directory should exist and be accessible.
- Returns:
context – A context manager object that can be used in a with statement. On entry, it changes into the specified or temporary directory, and on exit it changes back to the previous working directory.
- Raises:
OSError or subclass – If the specified path cannot be changed into or cannot be created.
Examples
>>> olddir = os.getcwd()
>>> with ChDir() as newdir:
...     print(os.getcwd()==newdir, newdir!=olddir)
True True
>>> os.getcwd()==olddir
True
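The guarantee of returning to the previous directory, even after an exception, is the classic try/finally pattern. The following is a rough sketch of that pattern with the standard library; chdir_sketch is a hypothetical name and the temporary-directory case (dirname=None) is left out.

```python
import os
from contextlib import contextmanager


@contextmanager
def chdir_sketch(dirname, create=True):
    """Temporarily change the working directory (illustrative only)."""
    olddir = os.getcwd()
    if create:
        # Create the directory, including missing parents, if needed
        os.makedirs(dirname, exist_ok=True)
    os.chdir(dirname)  # raises OSError if the directory is not accessible
    try:
        yield os.getcwd()
    finally:
        # Runs on normal exit and when an exception propagates
        os.chdir(olddir)
```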
- class filetools.NameSequence(template, ext='', start=0, step=1)
A class for autogenerating sequences of names.
Sequences of names are autogenerated by combining a fixed string with a numeric part. The latter is incremented at each creation of a new name (by using the next() function or by calling the NameSequence).
- Parameters:
template (str) –
Either a template to generate the names, or an example name from which the template can be derived. If the string contains a ‘%’ character, it is considered a template and will be used as such. It must be a valid template to format a single int value. For example, a template ‘point-%d’ with a value 5 will generate a name ‘point-5’.
If the string does not contain a ‘%’ character, a template is generated as follows. The string is split in three parts (prefix, numeric, suffix), where numeric only contains digits and suffix does not contain any digits. Thus, numeric is the last numeric part in the string. Use ext if the variable part is not the last numeric part of the names. If the string does not contain any numeric part, it is split as a file name in stem and suffix, and ‘-0’ is appended to the stem. Thus, ‘point.png’ will be treated like ‘point-0.png’. Finally, if the string is empty, it is replaced with ‘0’. To create the template, the numeric part is replaced with a ‘%0#d’ format (where # is the length of the numeric part), concatenated again with prefix and suffix, and ext is appended. Also, the start value is set to the numeric part (unless a nonzero start value is provided).
ext (str, optional) – If provided, this is an invariable string appended to the template. It is mostly useful when providing a full name as template and the variable numeric part is not the last numeric part in the name. For example, NameSequence('x1', '.5a') will generate names ‘x1.5a’, ‘x2.5a’, …
start (int, optional) – Starting value for the numerical part. If template contains a full name, it will only be acknowledged if nonzero.
step (int, optional) – Step for incrementing the numerical value.
Notes
If N is a NameSequence, then next(N) and N() are equivalent.
Examples
>>> N = NameSequence('obj')
>>> next(N)
'obj-0'
>>> N()
'obj-1'
>>> [N() for i in range(3)]
['obj-2', 'obj-3', 'obj-4']
>>> N.peek()
'obj-5'
>>> N()
'obj-5'
>>> N.template
'obj-%d'
>>> N = NameSequence('obj-%03d', start=5)
>>> [next(N) for i in range(3)]
['obj-005', 'obj-006', 'obj-007']
>>> N = NameSequence('obj-005')
>>> [next(N) for i in range(3)]
['obj-005', 'obj-006', 'obj-007']
>>> N = NameSequence('abc.98', step=2)
>>> [next(N) for i in range(3)]
['abc.98', 'abc.100', 'abc.102']
>>> N = NameSequence('abc-8x.png')
>>> [next(N) for i in range(3)]
['abc-8x.png', 'abc-9x.png', 'abc-10x.png']
>>> N.template
'abc-%01dx.png'
>>> N.glob()
'abc-*x.png'
>>> next(NameSequence('abc','.png'))
'abc-0.png'
>>> next(NameSequence('abc.png'))
'abc-0.png'
>>> N = NameSequence('/home/user/abc23','5.png')
>>> [next(N) for i in range(2)]
['/home/user/abc235.png', '/home/user/abc245.png']
>>> N = NameSequence('')
>>> next(N), next(N)
('0', '1')
>>> N = NameSequence('12')
>>> next(N), next(N)
('12', '13')
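The core mechanism (a template with one integer field, a counter, and next()/call equivalence) can be sketched in a few lines. NameSequenceSketch is a hypothetical stand-in that only accepts an explicit ‘%’ template; it does not derive templates from example names and is not the actual class.

```python
class NameSequenceSketch:
    """Generate names from a '%'-style template with one int field."""

    def __init__(self, template, start=0, step=1):
        self.template = template
        self.nr = start
        self.step = step

    def peek(self):
        """Return the next name without advancing the counter."""
        return self.template % self.nr

    def __next__(self):
        name = self.peek()
        self.nr += self.step
        return name

    # Calling the object is made equivalent to next()
    __call__ = __next__

    def __iter__(self):
        return self
```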
22.2. Functions defined in module filetools
- filetools.TempFile(*args, **kargs)
Return a temporary file that can be used as a context manager.
This is a wrapper around Python’s tempfile.NamedTemporaryFile, with the difference that the returned object has an extra attribute ‘.path’, returning the file name as a Path.
- filetools.gzip(filename, gzipped=None, remove=True, level=5, compr='gz')
Compress a file in gzip/bzip2 format.
- Parameters:
filename (path_like) – The input file name.
gzipped (path_like, optional) – The output file name. If not specified, it will be set to the input file name + ‘.’ + compr. An existing output file will be overwritten.
remove (bool) – If True (default), the input file is removed after successful compression.
level (int 1..9) – The gzip/bzip2 compression level. Higher values result in smaller files, but require longer compression times. The default of 5 already gives a fairly good compression ratio.
compr ('gz' | 'bz2') – The compression algorithm to be used. The default is ‘gz’ for gzip compression. Setting to ‘bz2’ will use bzip2 compression.
- Returns:
Path – The path of the compressed file.
Examples
>>> f = Path('./test_gzip.out')
>>> f.write_text('This is a test\n'*100)
1500
>>> print(f.size)
1500
>>> g = gzip(f)
>>> print(g)
test_gzip.out.gz
>>> print(g.size)
60
>>> f.exists()
False
>>> f = gunzip(g)
>>> f.exists()
True
>>> print(f.read_text().split('\n')[50])
This is a test
>>> g.exists()
False
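For readers who want to see what such a compress/remove round trip involves, here is a rough standard-library sketch of the gzip case and of the reverse operation documented for gunzip. The function names gzip_sketch and gunzip_sketch are hypothetical; bzip2 support, explicit output names and temporary files are omitted.

```python
import gzip
import os
import shutil


def gzip_sketch(filename, remove=True, level=5):
    """Compress filename to filename + '.gz' and return the new name."""
    gzipped = f"{filename}.gz"
    with open(filename, 'rb') as src:
        with gzip.open(gzipped, 'wb', compresslevel=level) as dst:
            shutil.copyfileobj(src, dst)
    if remove:
        # Only reached if compression finished without an exception
        os.remove(filename)
    return gzipped


def gunzip_sketch(filename, remove=True):
    """Decompress a '.gz' file to the same name without the suffix."""
    name = str(filename)
    if not name.endswith('.gz'):
        raise ValueError("this sketch only handles names ending in '.gz'")
    unzipped = name[:-len('.gz')]
    with gzip.open(name, 'rb') as src:
        with open(unzipped, 'wb') as dst:
            shutil.copyfileobj(src, dst)
    if remove:
        os.remove(name)
    return unzipped
```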
- filetools.gunzip(filename, unzipped=None, remove=True, compr='gz')
Uncompress a file in gzip/bzip2 format.
- Parameters:
filename (path_like) – The compressed input file name (usually ending in ‘.gz’ or ‘.bz2’).
unzipped (path_like, optional) – The output file name. If not provided and filename ends with ‘.gz’ or ‘.bz2’, it will be set to the filename with the ‘.gz’ or ‘.bz2’ removed. If not provided and filename does not end in ‘.gz’ or ‘.bz2’, or if an empty string is provided, the name of a temporary file is generated. Since you will normally want to read something from the decompressed file, this temporary file is not deleted after closing. It is up to the user to delete it (using the returned file name) when the file has been dealt with.
remove (bool) – If True (default), the input file is removed after successful decompression. You probably want to set this to False when decompressing to a temporary file.
compr ('gz' | 'bz2') – The compression algorithm used in the input file. If not provided, it is automatically set from the extension of the filename if that is either ‘.gz’ or ‘.bz2’, or else the default ‘gz’ is used.
- Returns:
Path – The name of the uncompressed file.
Examples
See gzip.
- filetools.zipExtract(filename, members=None)
Extract the specified member(s) from the zip file.
The default extracts all.
- filetools.hsorted(l)
Sort a list of strings in human order.
When humans sort a list of strings, they tend to interpret the numerical fields as numbers and sort these parts numerically, instead of the lexicographic sorting done by the computer.
Returns the list of strings sorted in human order.
Example:
>>> hsorted(['a1b','a11b','a1.1b','a2b','a1'])
['a1', 'a1.1b', 'a1b', 'a2b', 'a11b']
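One common way to obtain this ordering is to split each string into digit and non-digit runs and compare the digit runs as integers. The sketch below uses that approach with re.split; the names human_key and hsorted_sketch are hypothetical and the actual hsorted may be implemented differently.

```python
import re


def human_key(s):
    """Split s into text and digit runs, with digit runs as integers."""
    parts = re.split(r'(\d+)', s)
    # With a capturing group, the odd indices hold the digit runs
    parts[1::2] = [int(p) for p in parts[1::2]]
    return parts


def hsorted_sketch(strings):
    """Return the strings sorted in human order (illustrative only)."""
    return sorted(strings, key=human_key)
```

Because text and numbers always alternate at the same positions, the keys of two strings can be compared element by element without mixing str and int.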
- filetools.numsplit(s)
Split a string in numerical and non-numerical parts.
Returns a series of substrings of s. The first, third, fifth, ... items do not contain any digits. The second, fourth, ... items only contain digits. Joined together, the substrings restore the original.
The number of items is always odd: if the string starts or ends with a digit, the first or last item, respectively, is an empty string.
Example:
>>> print(numsplit("aa11.22bb"))
['aa', '11', '.', '22', 'bb']
>>> print(numsplit("11.22bb"))
['', '11', '.', '22', 'bb']
>>> print(numsplit("aa11.22"))
['aa', '11', '.', '22', '']
- filetools.splitDigits(s, pos=-1)
Split a string at a sequence of digits.
The input string is split in three parts, where the second part is a contiguous series of digits. The second argument specifies at which numerical substring the splitting is done. By default (pos=-1) this is the last one.
Returns a tuple of three strings, any of which can be empty. The second string, if non-empty, is a series of digits. The first and last items are the parts of the string before and after that series. If the string does not contain any digits, or if the specified splitting position exceeds the number of numerical substrings, the second and third items are empty strings.
Example:
>>> splitDigits('abc123')
('abc', '123', '')
>>> splitDigits('123')
('', '123', '')
>>> splitDigits('abc')
('abc', '', '')
>>> splitDigits('abc123def456fghi')
('abc123def', '456', 'fghi')
>>> splitDigits('abc123def456fghi',0)
('abc', '123', 'def456fghi')
>>> splitDigits('123-456')
('123-', '456', '')
>>> splitDigits('123-456',2)
('123-456', '', '')
>>> splitDigits('')
('', '', '')
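The behaviour described above can be reproduced with a short regular-expression sketch: collect all digit runs, pick the one at index pos, and slice around it. The name split_digits_sketch is hypothetical and this is not necessarily how splitDigits itself is written.

```python
import re


def split_digits_sketch(s, pos=-1):
    """Split s around its pos-th run of digits (illustrative only)."""
    runs = list(re.finditer(r'\d+', s))
    try:
        m = runs[pos]
    except IndexError:
        # No digits at all, or pos beyond the number of digit runs
        return s, '', ''
    return s[:m.start()], m.group(), s[m.end():]
```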
- filetools.template_from_name(name, ext='')
Return template and current number from a given name.
Return a template for generating a family of names with an increasing numeric part.
- Parameters:
name (str) – The intended name format. The name is split in three parts (prefix, numeric, suffix), where numeric only contains digits and suffix does not contain any digits. Thus, numeric is the last numeric part in the name. If the name does not contain any numeric part, it is split as a file name in stem and suffix, and ‘-0’ is appended to the stem. Thus, ‘point.png’ will be treated like ‘point-0.png’. Finally, if name is an empty string, it is replaced with ‘0’.
ext (str, optional) – An extra string to be appended to the returned template string. This can be used to make the variable part not the last numeric part in the name.
- Returns:
template (str) – A template that can be used to generate names like the input but with another numeric part. It is the concatenation of (prefix, ‘%0#d’, suffix, ext), where # is the length of the numeric part.
number (int) – The integer value of the numeric part or 0 if there wasn’t one.
Notes
If the input name contained a numeric part, and ext is empty, the result of template % number is the input name.
Examples
>>> t, n = template_from_name('abc-8x.png')
>>> (t, n)
('abc-%01dx.png', 8)
>>> t % n
'abc-8x.png'
>>> template_from_name('abc-000.png')
('abc-%03d.png', 0)
>>> template_from_name('abc.png')
('abc-%d.png', 0)
>>> template_from_name('abc', ext='-1.png')
('abc-%d-1.png', 0)
>>> template_from_name('abc')
('abc-%d', 0)
>>> template_from_name('')
('%d', 0)
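A compact way to see the splitting rule at work is the sketch below, which locates the last run of digits with a regular expression and falls back to a stem/suffix split when there is none. The name template_sketch is hypothetical; the handling of names without digits (a plain ‘%d’ field) is chosen to match the example outputs above and is an assumption about the real function.

```python
import os
import re


def template_sketch(name, ext=''):
    """Return (template, number) derived from an example name."""
    # Last run of digits, followed only by non-digits up to the end
    m = re.search(r'(\d+)(\D*)$', name)
    if m:
        digits, suffix = m.groups()
        prefix = name[:m.start()]
        return f"{prefix}%0{len(digits)}d{suffix}{ext}", int(digits)
    # No numeric part: insert the number between stem and suffix
    stem, suffix = os.path.splitext(name)
    sep = '-' if stem else ''
    return f"{stem}{sep}%d{suffix}{ext}", 0
```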
- filetools.autoName(clas)
Return the autoname class instance for objects of type clas.
This allows for objects of a certain class to be automatically named throughout pyFormex.
- Parameters:
clas (str or class or object) – The object class name. If a str, it is the class name. If a class, the name is found from it. If an object, the name is taken from the object’s class. In all cases the name is converted to lower case.
- Returns:
NameSequence instance – A NameSequence that will generate subsequent names corresponding with the specified class.
Examples
>>> from pyformex.formex import Formex
>>> F = Formex()
>>> print(next(autoName(Formex)))
formex-0
>>> print(next(autoName(F)))
formex-1
>>> print(next(autoName('Formex')))
formex-2
- filetools.listFonts(pattern='', include=None, exclude=None)
List the fonts known to the system.
This uses the ‘fc-list’ command from the fontconfig package to find a list of font files installed on the user’s system. The list of files can be restricted by three parameters: a pattern to be passed to the fc-list command, an include regexp specifying which of the matching font files should be retained, and an exclude regexp specifying which files should be removed from the remaining list.
- Parameters:
pattern (str) – A pattern string to pass to the fc-list command. For example, a pattern ‘mono’ will only list monospaced fonts. Multiple elements can be combined with a colon as separator. Example: pattern=’family=DejaVuSans:style=Bold’. An empty string selects all font files.
include (str) – Regex for grep to select the font files to include in the result. If not specified, the pattern from the configuration variable ‘fonts/include’ is used. Example: the default configured include=’.ttf$’ will only return font files with a .ttf suffix. An empty string will include all files selected by the pattern.
exclude (str) – Regex for grep to select the font files to exclude from the result. If not specified, the pattern from the configuration variable ‘fonts/exclude’ is used. Example: the default configured exclude=’Emoji’ will exclude font files that have ‘Emoji’ in their name. An empty string will exclude no files.
- Returns:
list of Path – A list of the font files found on the system. If fontconfig is not installed, produces a warning and returns an empty list.
Examples
>>> fonts = listFonts('mono')
>>> print(len(fonts) > 0 and fonts[0].is_file())
True
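The include/exclude step can be illustrated separately from the fc-list call. The sketch below applies the two regular expressions to a plain list of file names using Python's re module rather than grep; filter_fonts_sketch is a hypothetical helper and the font paths in the usage example are made up.

```python
import re


def filter_fonts_sketch(files, include='', exclude=''):
    """Keep names matching include, then drop names matching exclude."""
    if include:
        files = [f for f in files if re.search(include, f)]
    if exclude:
        files = [f for f in files if not re.search(exclude, f)]
    return files


# Made-up paths, only to show the effect of the documented defaults
candidates = [
    '/fonts/DejaVuSansMono.ttf',
    '/fonts/NotoColorEmoji.ttf',
    '/fonts/Lato-Regular.otf',
]
selected = filter_fonts_sketch(candidates, include='.ttf$', exclude='Emoji')
```

Here selected contains only the DejaVuSansMono file: the .otf file fails the include pattern and the Emoji file is removed by the exclude pattern.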
- filetools.listMonoFonts()
List the monospace fonts found on the system.
This is equivalent to listFonts('mono').
- filetools.defaultMonoFont()
Return a default monospace font for the system.
- Returns:
Path – If the configured ‘fonts/default’ has a matching font file on the system, that Path is returned. Else, the first file from listFonts('mono') is returned.
- Raises:
ValueError – If no monospace font was found on the system.
Examples
>>> print(defaultMonoFont())
/...DejaVuSansMono.ttf
- filetools.diskSpace(path, units=None, ndigits=2)
Return the total, used and available disk space of a file system.
- Parameters:
path (path_like) – A path name inside the file system to be probed.
units (str) – If provided, results are reported in these units. See humanSize() for possible values. The default is to return the number of bytes.
ndigits (int) – If provided, and units is also provided, specifies the number of decimal digits to report. See humanSize() for details.
- Returns:
total (int | float) – The total disk space of the file system containing path.
used (int | float) – The used disk space on the file system containing path.
available (int | float) – The available disk space on the file system containing path.
Notes
The sum used + available does not necessarily equal total, because a file system may (and usually does) have reserved blocks.
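Comparable numbers can be obtained from the standard library with shutil.disk_usage, which returns total, used and free space in bytes. The wrapper below is only a sketch under the assumption that "free" there corresponds to "available" here; disk_space_sketch is a hypothetical name and unit conversion is left out.

```python
import shutil


def disk_space_sketch(path):
    """Return (total, used, available) in bytes for path's file system."""
    usage = shutil.disk_usage(path)
    return usage.total, usage.used, usage.free


total, used, available = disk_space_sketch('.')
# Space reserved by the file system is counted in neither used nor available
reserved = total - used - available
```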
- filetools.humanSize(size, units, ndigits=-1)
Convert a number to a human size.
Large numbers are often represented in a more human readable form using k, M, G prefixes. This function returns the input size as a number with the specified prefix.
- Parameters:
size (int or float) – A number to be converted to human readable form.
units (str) – A string specifying the target units. The first character should be one of k,K,M,G,T,P,E,Z,Y. ‘k’ and ‘K’ are equivalent. A second character ‘i’ can be added to use binary (K=1024) prefixes instead of decimal (k=1000).
ndigits (int, optional) – If provided and >=0, the result will be rounded to this number of decimal digits.
- Returns:
float – The input value in the specified units and possibly rounded to ndigits.
Examples
>>> humanSize(1234567890,'k')
1234567.89
>>> humanSize(1234567890,'M',0)
1235.0
>>> humanSize(1234567890,'G',3)
1.235
>>> humanSize(1234567890,'Gi',3)
1.15
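The conversion amounts to dividing by a power of 1000 (or 1024 for the ‘i’ variants) selected by the prefix letter. The following sketch shows that arithmetic; human_size_sketch is a hypothetical name and the error handling of the real function is not reproduced.

```python
def human_size_sketch(size, units, ndigits=-1):
    """Express size in the given prefixed units (illustrative only)."""
    prefixes = 'KMGTPEZY'
    # 'k' and 'K' are equivalent; other letters must be upper case
    first = 'K' if units[0] == 'k' else units[0]
    exponent = prefixes.index(first) + 1  # ValueError for unknown prefix
    base = 1024 if units[1:2] == 'i' else 1000
    value = size / base**exponent
    return round(value, ndigits) if ndigits >= 0 else value
```

For example, 'Gi' selects 1024**3, so 1234567890 bytes is about 1.15 GiB, matching the last example above.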