Adaptive Filtering Wizard
Welcome to the Adaptive Filtering Wizard
Features
AFwizard is a Python package to enhance the productivity of ground point filtering workflows in archaeology and beyond. It provides a Jupyter-based environment for “human-in-the-loop” tuned, spatially heterogeneous ground point filterings. Core features:
Working with Lidar datasets directly in Jupyter notebooks
Loading/Storing of LAS/LAZ files
Visualization using hillshade models and slope maps
Application of ground point filtering algorithms
Cropping with a map-based user interface
Accessibility of existing filtering algorithms under a unified data model:
Access to predefined filter pipeline settings
Crowd-sourced library of filter pipelines at https://github.com/ssciwr/afwizard-library/
Filter definitions can be shared with colleagues as files
Spatially heterogeneous application of filter pipelines
Assignment of filter pipeline settings to spatial subregions in a map-based user interface
Command Line Interface for large scale application of filter pipelines
Documentation
The documentation of AFwizard can be found here: https://afwizard.readthedocs.io/en/latest
Prerequisites
In order to work with AFwizard, you need the following software:
If you want to use the respective backends, you also need to install the following pieces of software:
Installing and using
Using Conda
Having a local installation of Conda, the following sequence of commands sets up a new Conda environment and installs afwizard into it:
conda create -n afwizard
conda activate afwizard
conda install -c conda-forge afwizard
You can start the JupyterLab frontend by doing:
conda activate afwizard
jupyter lab
If you need some example notebooks to get started, you can copy them into the current working directory like this:
conda activate afwizard
copy_afwizard_notebooks
Development Build
If you are intending to contribute to the development of the library, we recommend the following setup:
git clone https://github.com/ssciwr/afwizard.git
cd afwizard
conda env create -f environment-dev.yml --force
conda run -n afwizard-dev python -m pip install --no-deps .
Using Binder
You can try AFwizard without prior installation by using Binder, which is a free cloud-hosted service to run Jupyter notebooks. This will give you an impression of the library’s capabilities, but you will want to work on a local setup when using the library productively: On Binder, you might experience very long startup times, slow user experience and limitations to disk space and memory.
Using Docker
Having set up Docker, you can use AFwizard directly from a provided Docker image:
docker run -t -p 8888:8888 ssciwr/afwizard:latest
Having executed the above command, paste the URL given on the command line into your browser and start using AFwizard by looking at the provided Jupyter notebooks. This image is limited to working with non-proprietary filtering backends (PDAL only).
Using Pip
We advise you to use Conda, as AFwizard depends on a lot of other Python packages, some of which have external C/C++ dependencies. Using Conda, you get all of these installed automatically; using pip, you might need to do a lot of manual work to get the same result.
That being said, afwizard can be installed from PyPI:
python -m pip install afwizard
Citation - How to cite AFwizard
The following scientific article can be referenced when using AFwizard in your research.
Doneus, M., Höfle, B., Kempf, D., Daskalakis, G. & Shinoto, M. (2022): Human-in-the-loop development of spatially adaptive ground point filtering pipelines — An archaeological case study. Archaeological Prospection. Vol. 29 (4), pp. 503-524. DOI: https://doi.org/10.1002/arp.1873
Related Bibtex entry:
@Article{Doneus_2022,
author = {Michael Doneus and Bernhard H\"ofle and Dominic Kempf and Gwydion Daskalakis and Maria Shinoto},
title = {Human-in-the-loop development of spatially adaptive ground point filtering pipelines {\textemdash} An archaeological case study},
journal = {Archaeological Prospection},
year = {2022},
volume = {29},
number = {4},
pages = {503--524},
doi = {10.1002/arp.1873},
url = {https://doi.org/10.1002/arp.1873}
}
The data from the Nakadake Sanroku Kiln Site Center in Japan used in the above article is also accessible under CC-BY-SA 4.0 in the data repository of the 3D Spatial Data Processing Group:
@data{data/TJNQZG_2022,
author = {Shinoto, Maria and Doneus, Michael and Haijima, Hideyuki and Weiser, Hannah and Zahs, Vivien and Kempf, Dominic and Daskalakis, Gwydion and Höfle, Bernhard and Nakamura, Naoko},
publisher = {heiDATA},
title = {{3D Point Cloud from Nakadake Sanroku Kiln Site Center, Japan: Sample Data for the Application of Adaptive Filtering with the AFwizard}},
year = {2022},
version = {V2},
doi = {10.11588/data/TJNQZG},
url = {https://doi.org/10.11588/data/TJNQZG}
}
Troubleshooting
If you run into problems using AFwizard, we kindly ask you to do the following in this order:
Have a look at the list of our Frequently Asked Questions for a solution
Search through the GitHub issue tracker
Open a new issue on the GitHub issue tracker, providing:
The version of afwizard used
Information about your OS
The output of conda list on your machine
As much information as possible about how to reproduce the bug
If you can share the data that produced the error, it is much appreciated.
Overview of the workflows in AFwizard
The goal of AFwizard is to provide a high-productivity environment for archaeologists and other researchers working with Lidar data to produce high-precision ground point filterings. To that end, it implements an adaptive approach that allows human-in-the-loop optimization of ground point filtering algorithms. By adaptive we mean two things:
Parameter Adaptivity: We allow interactive, human-in-the-loop finetuning of filtering algorithm parameters. AFwizard does not implement its own filtering algorithm, but instead provides access to a variety of established backends through a common interface (PDAL, OPALS, LASTools).
Spatial Adaptivity: In order to produce high-precision ground point filterings, filter parameters need to be varied spatially so that the best algorithm is chosen for each terrain type. AFwizard manages this process in an interactive way.
AFwizard is a Python library with deep integration into the Jupyter ecosystem. Some familiarity with Python and Jupyter is useful when using it, but the necessary skills can also be developed while using the library. In that case you should start by reading our short Introduction to Python + Jupyter.
The overall procedure when working with AFwizard is described in the following. This is at the same time an outline of the rest of this documentation. Users are expected to have their own point cloud datasets acquired by airborne laser scanning in LAS/LAZ format.
As Lidar datasets are typically very large, the first important step is to trim down the dataset to a region of interest which is expected to be suitable for filtering with one parameter set (because it e.g. contains the same terrain type) and which is of a size that allows a truly interactive process (we recommend e.g. 500k points). The handling and spatial restriction of datasets is described in Working with datasets.
Given such a dataset sample for a terrain type, the next step would be to choose and customize a suitable filter pipeline from a number of available filter libraries. The concept of filter libraries is explained in Working with filter libraries. If you are new to AFwizard, you can leverage the existing crowd-sourced filter pipelines provided with AFwizard. The process of selecting and customizing the best filter pipeline for your particular dataset sample is described in Selecting a filter pipeline for a dataset.
If none of the provided filter pipelines matches your needs or you want to tune the process even more, you can read about Creating filter pipelines yourself. You will learn about the interactive process to define your own pipelines that combine the strengths of the available filtering backends. If you have found a filter pipeline that works well for you, consider adding the required metadata and contributing it to the library of crowd-sourced filters!
Once you know which filter pipelines you want to apply in which subregion of your dataset, you should look into Mapping filter pipelines to segmentations, which will explain how to add filter pipeline information to a GeoJSON file such that we can later generate a digital elevation model for the entire dataset. You are expected to have created the segmentation in e.g. a GIS and exported it as a GeoJSON file.
In a final step, the digital elevation model is generated. This is done from the command line instead of through Jupyter, because you might want to consider doing this on a bigger machine. The details about that are described in Executing adaptive filter pipelines.
The dataset used in this example notebook is from the Nakadake Sanroku Kiln Site Center in Japan. The data set is provided by Shinoto et al. under the CC-BY-4.0 license: DOI
Working with Lidar datasets in afwizard
This notebook will explain how Lidar datasets are treated in AFwizard by showcasing the most common use cases. If you are not yet familiar with Jupyter, check the Introduction to Python+Jupyter notebook first.
The first thing to do in a Jupyter notebook that uses AFwizard is to import the afwizard
library:
[1]:
import afwizard
Loading datasets
AFwizard handles Lidar data sets in LAS/LAZ format. To load a data set, we construct a DataSet object given its filename and assign it to a variable ds:
[2]:
ds = afwizard.DataSet(filename="nkd_pcl_epsg6670.laz")
In the above example, we are using the Nakadake example data that is downloaded on demand from the Heidelberg University data repository. You can also load your own data set by providing its filename. AFwizard currently only supports datasets in LAS and LAZ format. The dataset filename is expected to be an absolute path, a path relative to the current working directory, or a path relative to a data directory that you have previously specified using the set_data_directory function:
[3]:
afwizard.set_data_directory("some/directory", create_dir=True)
Here, the create_dir parameter specifies whether AFwizard should create non-existing directories for you.
Spatial Reference Systems
By default, AFwizard inspects the dataset's metadata to determine the correct spatial reference system. If it is not specified in the metadata, or if you want to force interpretation as a certain spatial reference system, you can pass its Well-known Text (WKT) representation or EPSG code to the data set:
[4]:
ds = afwizard.DataSet(filename="nkd_pcl_epsg6670.laz", spatial_reference="EPSG:6670")
Note that specifying a specific spatial reference system does not reproject the dataset, but reinterprets the given data. If you want to reproject your data, have a look at afwizard.reproject_dataset below.
Visualizing datasets
With the dataset loaded as the object ds, we have several ways of visualizing the data set directly in Jupyter. By default, a hillshade model with a configurable spatial resolution in meters is used.
AFwizard supports three different visualization methods, namely Hillshade Model, Slope Map and Hillshade Model + Slope Map. These can best be explored using an interactive user interface:
[5]:
ds.show_interactive()
[5]:
If you already know exactly what visualization type and parameters you want, you can pass them directly to show.
[6]:
ds.show(
visualization_type="hillshade", resolution=1.0, classification="high_vegetation"
)
[6]:
The full list of options is available in the online documentation or can be accessed directly in Jupyter by using the ? operator:
[7]:
?ds.show
Restricting datasets
If your Lidar dataset is very large, handling the entire data set becomes unwieldy, especially if we want to interactively tune ground point filtering pipelines. It is therefore important to crop the dataset to a subset that we can easily work on. We do so by showing an interactive map, adding a polygon with the polygon selector tool and hitting the Finalize button:
[8]:
rds = ds.restrict()
In the above, the restricted dataset is assigned to a new object rds. This follows a design principle of AFwizard: All objects (datasets, filter pipelines etc.) are immutable - operations that work on datasets never implicitly modify an object. Instead, the provided input (ds in the above) is left untouched, and a modified copy is returned. This results in an increased memory consumption, but makes the interactive exploration of ground point filtering with AFwizard easier to handle.
It is also possible to load a segmentation as a GeoJSON file and overlay it on top of the satellite image.
[9]:
segmentation_overlay = afwizard.load_segmentation(
"nkd_sgm_assigned_TF.geojson", spatial_reference="EPSG:6670"
)
[10]:
rds = ds.restrict(segmentation_overlay=segmentation_overlay)
Manipulating datasets
The above principle of immutability is also followed by all other functions that manipulate datasets. The most prominent such data manipulation is the application of ground point filter pipelines. It is of such importance that it is covered in detail in Selecting a filter pipeline and Creating filter pipelines. Other data manipulations are e.g. remove_classification, which removes any existing classification data from a dataset:
[11]:
ds = afwizard.remove_classification(ds)
Here, we have chosen to assign the manipulated dataset to the same name as the original dataset. This does not violate the principle of immutability, because we explicitly chose to do so.
Another dataset manipulation operation that was already mentioned is the reprojection into a different spatial reference system:
[12]:
reprojected = afwizard.reproject_dataset(ds, "EPSG:4326")
If your dataset's metadata does not specify a spatial reference system, you need to specify it additionally using the in_srs= parameter to afwizard.reproject_dataset.
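As a quick sketch (assuming the file was loaded without CRS metadata and actually uses EPSG:6670), forcing the input reference system could look like this:

# Hedged example: force EPSG:6670 as the source CRS while reprojecting to EPSG:4326
reprojected = afwizard.reproject_dataset(ds, "EPSG:4326", in_srs="EPSG:6670")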
Saving datasets
Once we have achieved a result that is worth storing, we can save the dataset to a LAS/LAZ file by calling its save method:
[13]:
saved = ds.save("without_classification.las", overwrite=False)
---------------------------------------------------------------------------
AFwizardError Traceback (most recent call last)
Cell In[13], line 1
----> 1 saved = ds.save("without_classification.las", overwrite=False)
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/latest/lib/python3.11/site-packages/afwizard/pdal.py:208, in PDALInMemoryDataSet.save(self, filename, overwrite)
205 compress = "laszip"
207 if not overwrite and os.path.exists(filename):
--> 208 raise AFwizardError(
209 f"Would overwrite file '{filename}'. Set overwrite=True to proceed"
210 )
212 # Exectute writer pipeline
213 execute_pdal_pipeline(
214 dataset=self,
215 config={
(...)
220 },
221 )
AFwizardError: Would overwrite file 'without_classification.las'. Set overwrite=True to proceed
In the above, the first argument is the filename to save to (relative paths are interpreted w.r.t. the current working directory). Optionally, LAZ compression can be activated by setting compress=True. If an existing file would be overwritten, explicit permission to do so needs to be granted by setting overwrite=True. The returned object saved is again an afwizard dataset object that represents the LAS/LAZ file on the disk.
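For example, granting permission to overwrite the file from the failed call above looks like this (and, per the prose above, compress=True would additionally request LAZ output):

# Overwrite the existing file created in an earlier run
saved = ds.save("without_classification.las", overwrite=True)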
Working with filter libraries
[1]:
import afwizard
AFwizard stores filter pipelines in .json files for later reuse. JSON stands for JavaScript Object Notation and is a widely used format for storing custom data structures. We will see in Creating filter pipelines how these files are created from scratch. Now, we will learn how to use these files. This will enable you to leverage a library of community-contributed filter pipelines and allow you to organize your filter pipelines locally.
Adding filter libraries
Filter libraries are directories containing a number of .json files with filter pipeline definitions. AFwizard internally keeps a per-session list of known filter libraries, which by default contains the current working directory and the path to the library of community-contributed filters, which has been installed as a separate Python package. We can manually register directories as filter libraries like this:
[2]:
afwizard.add_filter_library(path="/home/user/somepath", recursive=False)
The recursive parameter specifies whether filter pipelines in subdirectories of the given directory should also be taken into account (defaults to False).
Browsing filters in filter libraries
We can search our filter libraries for filter pipelines matching certain criteria by using the select_pipeline_from_library and select_pipelines_from_library functions. The former allows you to select exactly one pipeline, whereas the latter allows you to select multiple pipelines by holding CTRL pressed while clicking on the filters. Clicking the Finalize button will make the user interface vanish, but you can also use the returned pipeline object before that:
[3]:
pipeline = afwizard.select_pipeline_from_library()
In the leftmost column, we see filtering criteria we can use to access the filter pipelines in our filtering libraries. The middle column contains a list of filter pipelines that match the given criteria. Clicking one of these will select it and show its metadata in the third column. The returned pipeline object can then be passed to other functions from AFwizard, e.g. to select the best filter pipeline for a given dataset.
Proprietary filtering backends
AFwizard does not implement its own ground point filtering algorithms. Instead, algorithms from existing packages are accessible through a common interface. Currently, the following backends are available:
PDAL: The Point Data Abstraction Library is an open source library for point cloud processing.
OPALS is a proprietary library for processing Lidar data. It can be tested freely for datasets <1M points.
LASTools has a proprietary tool called lasground_new that can be used for ground point filtering.
PDAL is always available when using AFwizard and is used internally for many tasks that are not directly related to ground point filtering. In order to enable the OPALS backend, AFwizard needs to be given the information where your OPALS installation (potentially including your license key) is located. This can either be done by setting the environment variable OPALS_DIR or by setting the path at runtime:
[4]:
?afwizard.set_opals_directory
Similarly, you can set the path to your installation of LASTools either through the environment variable LASTOOLS_DIR or at runtime:
[5]:
?afwizard.set_lastools_directory
Please note that LASTools only ships Windows binaries. Therefore, you will need Wine installed on your system to successfully use the LASTools backend on Linux.
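For example, the backend locations can be set at runtime as shown below; the directories are placeholders for your own installation paths:

# Placeholder paths - adjust to where OPALS and LASTools are installed on your system
afwizard.set_opals_directory("/path/to/opals")
afwizard.set_lastools_directory("/path/to/lastools")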
Creating new filter libraries
As filter libraries are just directories on the file system, you are free to organize them yourself by moving and copying files and then making the respective directories known to AFwizard using add_filter_library as seen above. However, there is also additional functionality available that eases the management process. If you are editing filter pipelines a lot, you might want to set the current library path like this:
[6]:
afwizard.set_current_filter_library(
"mylibrary", create_dirs=True, name="My testing library"
)
The library defined by set_current_filter_library will be used as the default path to store filter pipelines using save_filter or select_best_pipeline. Here, the name parameter is used to define a display name for the filter library. It is stored in a JSON file called library.json in the directory. Such a file could also be added manually to filter library directories. If you do not want to generate library.json or do not want to override an existing one, you may set name=None.
Sharing filter pipelines with others
As each filter pipeline is stored as a separate file, filter pipelines can be shared easily by sharing these files via your favorite method. If you want to share your filter pipeline with a wider community, you should consider contributing it to AFwizard’s community contribution library. This will make your filter pipeline accessible to all users of AFwizard. You can find the details of the process on GitHub.
Resetting filter libraries
If, for some reason, you want to reset the currently registered filter libraries to the default ones, you can do so like this:
[7]:
afwizard.reset_filter_libraries()
The dataset used in this example notebook is from the Nakadake Sanroku Kiln Site Center in Japan. The data set is provided by Shinoto et al. under the CC-BY-4.0 license: DOI
Creating filter pipelines
[1]:
import afwizard
This Jupyter notebook explains the workflow of creating a ground point filtering pipeline from scratch. This is an advanced workflow for users that want to define their own filtering workflows. For basic use, try choosing a pre-configured, community-contributed pipeline as described in the notebook on selecting filter pipelines.
For all of the examples below, we need to load at least one data set which we will use to interactively preview our filter settings. Note that for a good interactive experience with no downtimes, you should restrict your datasets to a reasonable size (see the Working with datasets notebook for how to do it). Loading multiple datasets might be beneficial to avoid overfitting the filtering pipeline to one given dataset.
[2]:
dataset = afwizard.DataSet(
filename="nkd_pcl_epsg6670.laz", spatial_reference="EPSG:6670"
)
Creating from scratch
The main pipeline configuration is done by calling the pipeline_tuning function with your dataset as the parameter. This will open the interactive user interface which allows you to tune the filter pipeline itself in the left column and the visualization and rasterization options in the right column. Whenever you hit the Preview button, a new tab will be added to the center column. Switching between these tabs allows you to switch between different versions of your filter. The returned object pipeline is updated on the fly until you hit the Finalize button to freeze the currently displayed filter.
[3]:
pipeline = afwizard.pipeline_tuning(dataset)
If you want to inspect multiple data sets in parallel while tuning a pipeline, you can do so by passing a list of datasets to the pipeline_tuning function. Note that AFwizard does not currently parallelize filter pipeline execution, which may have a negative impact on wait times while tuning with multiple datasets. A new tab in the center column will be created for each dataset when clicking Preview:
[4]:
pipeline2 = afwizard.pipeline_tuning(datasets=[dataset, dataset])
Storing and reloading filter pipelines
Pipeline objects can be stored on disk with the save_filter function from AFwizard. The filename passed here can either be an absolute path or a relative one. Relative paths are interpreted w.r.t. the current working directory unless a current filter library has been declared with set_current_filter_library:
[5]:
afwizard.save_filter(pipeline, "myfilter.json")
WARNING: This filter has insufficient metadata. Please consider adding in af.pipeline_tuning!
The appropriate counterpart is load_filter, which restores the pipeline object from a file. Relative paths are interpreted w.r.t. the filter libraries known to AFwizard:
[6]:
old_pipeline = afwizard.load_filter("myfilter.json")
A filter pipeline loaded from a file can be edited using the pipeline_tuning command by passing it to the function. As always, the pipeline object returned by pipeline_tuning will be a new object - no implicit changes of the loaded pipeline object will occur:
[7]:
edited_pipeline = afwizard.pipeline_tuning(dataset, pipeline=old_pipeline)
Batch processing in filter creation
The pipeline_tuning user interface has some additional powerful features that allow you to very quickly explore parameter ranges for a filter. You can use this feature by clicking the symbol next to a parameter. This will open a flyout where you can specify a range of parameters to generate previews for. Ranges can either be a discrete comma-separated list, e.g. 1, 2, 3, a range of parameters like 4:6, or a mixture thereof. Ranges are only available for numeric inputs and can be given an optional increment after a second colon, e.g. 1:5:2. In the absence of an explicit increment, integer ranges use an increment of 1 and float ranges sample the range with a total of 5 sample points. When clicking Preview, the batch processing information is resolved into individual previews and then discarded.
Filter pipelines with end user configuration
The goal in creation of filter pipelines in AFwizard is to provide pipelines that are on the one hand specialized to a given terrain type and on the other hand generalize well to other datasets of similar terrain. In order to achieve this it is sometimes necessary to define some configuration values that are meant to be finetuned by the end user. We can do by clicking the symbol next to a parameter. Like in batch processing, a flyout opens where we can enter values, a display name for the
parameter and a description. Values can either be a comma-separated list of values or a single range of parameters with a :
. These parameters are displayed to the end user when selecting a fitting filter pipeline as described in Selecting a filter pipeline for a dataset. This end user configuration interface can also be manually invoked by using the filter pipeline’s execute_interactive
method:
[8]:
tuned = pipeline.execute_interactive(dataset)
Applying filter pipelines to data
Pipeline objects can also be used to manipulate data sets by applying the ground point filtering algorithms in a non-interactive fashion. This is one of the core tasks of the afwizard library, but this will rarely be done in this manual fashion, as we provide additional interfaces for (locally adaptive) application of filter pipelines:
[9]:
filtered = pipeline.execute(dataset)
The returned object is itself a dataset object that can again be treated as described in Working with datasets:
[10]:
filtered.show_interactive()
[10]:
The dataset used in this example notebook is from the Nakadake Sanroku Kiln Site Center in Japan. The data set is provided by Shinoto et al. under the CC-BY-4.0 license: DOI
Selecting a filter pipeline for a dataset
[1]:
import afwizard
The goal of this notebook is to find the best fitting filter pipeline for a given dataset (or rather a suitably cropped subset thereof). If you want to manually create the filter pipeline from scratch, you should read the notebook on pipeline creation instead. Here, we assume that we want to leverage existing, community-contributed filter pipelines. In a first step, we select a number of filter pipeline candidates from filter libraries. Details about how to register additional filter libraries can be found in the notebook on filter libraries. We use the select_pipelines_from_library function, which allows us to select any number of filters by keeping CTRL pressed while clicking additional filters. In the leftmost column, we see filtering criteria we can use to access the filter pipelines in our filtering libraries. The middle column contains a list of filter pipelines that match the given criteria. Clicking one of these will select it and show its metadata in the third column.
[2]:
pipelines = afwizard.select_pipelines_from_library()
The next thing to do is to load a dataset as described in Working with datasets. Again, it is best to restrict the dataset to a reasonably sized sample (e.g. by using the restrict method on the dataset) to allow a truly interactive exploration process.
[3]:
dataset = afwizard.DataSet(
filename="nkd_pcl_epsg6670.laz", spatial_reference="EPSG:6670"
)
We can then create a comparison of ground point filtering results with the goal of choosing the most fitting pipeline. Each tab shows the filtering results of one of the selected pipelines. In the left column, we can fine-tune visualization and rasterization options, while on the right-hand side, we can fine-tune the filter. The configuration options shown here have been introduced by the author of the filter pipeline for you to fine-tune the results:
[4]:
best = afwizard.select_best_pipeline(dataset, pipelines=pipelines)
The newly created filter pipeline object best has the end user configuration specified in the right column baked into the filter. This means that the resulting filter is to some extent a dataset-specific specialization of the general purpose filter pipeline. To distinguish such specialized filters from more general ones, it is useful to save them into a separate filtering library as outlined in the notebook on filtering libraries, e.g. by using set_current_filter_library to set up a filtering library for your current project before saving any filters:
[5]:
afwizard.set_current_filter_library(
"projectx", create_dirs=True, name="Filters for projext X"
)
[6]:
afwizard.save_filter(best, "bestfilter.json")
WARNING: This filter has insufficient metadata. Please consider adding in af.pipeline_tuning!
The dataset used in this example notebook is from the Nakadake Sanroku Kiln Site Center in Japan. The data set is provided by Shinoto et al. under the CC-BY-4.0 license: DOI
Mapping segmentations to filter pipelines
When talking about adaptive ground point filtering in AFwizard we have two types of adaptivity in mind: Parameter adaptivity and spatial adaptivity. This notebook describes the details of how spatial adaptivity is implemented in AFwizard. It assumes that you have already created a suitable segmentation of your dataset into spatial features (e.g. in a GIS). We will then see how we can attach filter pipeline information to that segmentation file.
[1]:
import afwizard
We again work on our demonstrator dataset:
[2]:
ds = afwizard.DataSet(filename="nkd_pcl_epsg6670.laz", spatial_reference="EPSG:6670")
Next, we import the segmentation from a GeoJSON file. It is assumed to contain a FeatureCollection in the sense of the GeoJSON standard, where each feature combines the geometric information of the segment (Polygon or MultiPolygon) and a number of properties. One of these properties should contain your custom classification of the segments into classes.
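To illustrate the expected structure, the following sketch writes a minimal FeatureCollection with a single polygon feature; the property name class, the class value and the coordinates are made up for illustration:

import json

# Minimal illustrative segmentation: one polygon with a custom classification property
minimal_segmentation = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Polygon",
                # Made-up coordinates in the dataset's coordinate reference system
                "coordinates": [[[0, 0], [0, 100], [100, 100], [100, 0], [0, 0]]],
            },
            "properties": {"class": "paddy field"},
        }
    ],
}

with open("minimal_segmentation.geojson", "w") as f:
    json.dump(minimal_segmentation, f)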
[3]:
segmentation = afwizard.load_segmentation(
"nkd_sgm_assigned_TF.geojson", spatial_reference="EPSG:6670"
)
As we are trying to map features to filter pipelines, we also need to load some filter pipelines. Here, we are directly opening these using load_filter. In practice, these should be selected from available filter libraries using e.g. the tools described in Working with filter libraries.
[4]:
pipelines = [
afwizard.load_filter("nkd_fpl_paddy_LT.json"),
afwizard.load_filter("nkd_fpl_slope_LT.json"),
afwizard.load_filter("nkd_fpl_valley_TF.json"),
]
The core task of assigning filter pipelines is done by the assign_pipeline function, which allows us to interactively set filter pipelines. On the right-hand side, we can choose which property of our GeoJSON file contains the classification information. A feature can be highlighted on the map by clicking the button. For each class of segments, a pipeline can be selected from the dropdown menu:
[5]:
assigned_segmentation = afwizard.assign_pipeline(
ds, segmentation=segmentation, pipelines=pipelines
)
Once the pipelines are assigned, the returned segmentation object has a new property called "pipeline" that will direct the afwizard command line interface to the corresponding filter pipeline file. The modified file can be saved to disk by using the save method:
[6]:
assigned_segmentation.save("assigned_segmentation.geojson")
It is worth noting that afwizard does not store the entire filter configuration in the GeoJSON object. This is done to allow further refinement of the used filter pipelines after the segmentation is created. Also, it does not store absolute or relative paths to your filter pipelines, because these could always change when moving to new hardware or when reorganizing your project. Instead it stores the metadata of your filter (in hashed form) and compares it against the metadata of the filter pipelines in your currently loaded filter libraries. If metadata is ambiguous across the given filter libraries, an error will be thrown.
Executing adaptive filter pipelines
Once you have completed the work of creating a segmentation for your dataset and choosing the appropriate filter settings for your terrain type, you might want to apply your filter to the entire dataset. This step can be done in two ways: Either through the Python API or more conveniently through a command line interface.
afwizard
Command Line Interface for AFwizard
This CLI is used once you have finished the interactive exploration work with the AFwizard Jupyter UI. The CLI takes your dataset and the segmentation file created in Jupyter and executes the ground point filtering on the entire dataset.
afwizard [OPTIONS]
Options
- --dataset <dataset>
Required The LAS/LAZ data file to work on.
- --dataset-crs <dataset_crs>
Required The CRS of the data
- --segmentation <segmentation>
Required The GeoJSON file that describes the segmentation of the dataset. This is expected to be generated either by the Jupyter UI or otherwise provide the necessary information about what filter pipelines to apply.
- --segmentation-crs <segmentation_crs>
Required The CRS used in the segmentation
- --library <library>
A filter library location that AFwizard should take into account. Can be given multiple times.
- --output-dir <output_dir>
The directory to place output files (both LAS/LAZ and GeoTiff).
- Default:
output
- --resolution <FLOAT>
The meshing resolution to use for generating GeoTiff files
- Default:
0.5
- --compress
Whether LAZ files should be written instead of LAS.
- --suffix <suffix>
The suffix to add to filtered datasets.
- Default:
filtered
- --opals-dir <opals_dir>
The directory where to find an OPALS installation
- --lastools-dir <lastools_dir>
The directory where to find a LASTools installation
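For example, an invocation using the file names from the earlier notebooks (adapt them to your own data) could look like this:

afwizard --dataset nkd_pcl_epsg6670.laz --dataset-crs EPSG:6670 \
         --segmentation assigned_segmentation.geojson --segmentation-crs EPSG:6670 \
         --output-dir output --resolution 0.5 --compress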
Python API
- afwizard.execute.apply_adaptive_pipeline(dataset=None, segmentation=None, pipelines=None, output_dir='output', resolution=0.5, compress=False, suffix='filtered')
Python API to apply a fully configured adaptive pipeline
This function implements the large scale application of a spatially adaptive filter pipeline to a potentially huge dataset. This can either be used from Python or through AFwizard’s command line interface.
- Parameters:
datasets (list) – One or more datasets of type afwizard.DataSet.
segmentation (afwizard.segmentation.Segmentation) – The segmentation that provides the geometric information about the spatial segmentation of the dataset and what filter pipelines to apply in which segments.
output_dir (str) – The output directory to place the generated output in. Defaults to a subdirectory 'output' within the current working directory.
resolution (float) – The resolution in meters to use when generating GeoTiff files.
compress (bool) – Whether to write LAZ files instead of LAS.
suffix (str) – A suffix to use for files after applying filtering.
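As a sketch, the equivalent call from Python might look like this; the file names are assumptions carried over from the earlier notebooks:

import afwizard

ds = afwizard.DataSet(filename="nkd_pcl_epsg6670.laz", spatial_reference="EPSG:6670")
segmentation = afwizard.load_segmentation(
    "assigned_segmentation.geojson", spatial_reference="EPSG:6670"
)

# Apply the filter pipelines referenced by the segmentation to the full dataset
afwizard.apply_adaptive_pipeline(
    dataset=ds,
    segmentation=segmentation,
    output_dir="output",
    resolution=0.5,
    compress=True,
)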
Introduction to Python + Jupyter
This notebook explains some fundamentals of the Python programming language and of Jupyter notebooks. It is intended for audiences without programming backgrounds who would still like to work with the AFwizard library. It is by no means a thorough introduction to programming with Python or Jupyter itself, as it only explains those parts necessary to work with AFwizard.
Python - an interpreted language
Python is an interpreted language: a program, the Python interpreter, takes a piece of source code and executes (or interprets) it line by line. This is different from compiled languages, where the source code is translated into an executable program before running. In a normal setting, one would often invoke the Python interpreter manually with a .py file to interpret that Python file's content.
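For example, a script saved in a file called my_script.py (a made-up name) would be run from a terminal like this:
python my_script.py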
What is a Jupyter notebook?
Jupyter Notebooks are a way of organizing Python code such that it can be easily combined with documentation, visualization and user interface controls. Jupyter notebooks are stored as .ipynb files and they are displayed in a web browser using an interactive frontend such as JupyterLab. You are currently looking at a Jupyter Notebook.
How to use a Jupyter notebook?
Jupyter notebooks are organized as cells. So far, all cells contained documentation (in the Markdown format) that was automatically rendered in the web frontend. The following cell, however, is our first code cell:
[1]:
print("Hello World")
Hello World
In order to execute the Python code in the cell, we click the cell (so that it becomes the currently active cell) and press Shift+Enter. Try it with the above cell! When doing so, Jupyter will send the code in the cell to the Python interpreter, which will execute it and send any results back to Jupyter, which may then display that information to us in the web interface. We can repeat this process with other code cells. Note that we do not necessarily have to execute code cells in a Jupyter notebook top to bottom, although it makes sense most of the time.
Importing Python libraries
Most of the time, developers do not write Python code from scratch, but build on existing Python libraries. Libraries are Python code that is bundled for reuse in other projects. Libraries are either released as part of the Python standard library or provided by third parties, e.g. through the Python Package Index PyPI. If installed, a library is imported into a Python project using the import statement. We will typically import the afwizard library at the beginning of our notebooks:
[2]:
import afwizard
Having done so allows us to access any functions or classes that library provides, e.g. to print the version of the afwizard library that we are using:
[3]:
afwizard.print_version()
1.0.0
Python variables, functions and classes
TODO: Continue
Troubleshooting
The above should give you enough knowledge to continue by looking at the other notebooks to learn about AFwizard's capabilities. If you experience problems, here are a few things that might be helpful:
Try restarting the kernel (e.g. using the Kernel menu tab) and rerun your notebook
Check the API documentation of afwizard (https://afwizard.readthedocs.io/en/latest/index.html#document-user_API) or the demonstrator notebooks to verify that you are providing the correct arguments to your function calls.
Open an issue at our GitHub issue tracker
Extending AFwizard with custom backends
In this section, we will describe how the afwizard data model can be extended with custom backends. Such extensions can be done from your project that depends on afwizard - you do not necessarily need to contribute your custom backend to afwizard for it to integrate with the rest of afwizard.
In this documentation we will treat the following use case: you have an executable myfilter that performs ground point filtering. It accepts an LAS input filename, an output filename and a floating point finetuning value as command line arguments. You want to expose this filtering backend in afwizard.
Note
This is an advanced topic. A certain familiarity with object-oriented Python programming is required to understand and productively use this feature.
The filter backend class
Custom backends are created by inheriting from the afwizard.filter.Filter class. When inheriting, your derived class needs to specify an identifier that will be used to register your derived class with the base class. Having done that, you only need to implement two methods on the derived class: schema describes the configuration space of your custom backend and execute contains the execution logic of your backend:
import afwizard
import shutil
import subprocess


class MyBackend(afwizard.filter.Filter, identifier="mybackend"):
    @classmethod
    def schema(cls):
        # The configuration schema here follows the JSONSchema standard.
        return {
            "anyOf": [
                {
                    "type": "object",
                    "title": "My Filtering Backend",
                    "properties": {
                        "_backend": {
                            "type": "string",
                            "const": "mybackend",
                        },
                        "myparameter": {
                            "type": "number",
                            "default": 0.5,
                            "title": "My tuning parameter",
                        },
                    },
                }
            ]
        }

    def execute(self, dataset):
        # Ensure that the dataset is of type DataSet (maybe applying conversion)
        dataset = afwizard.DataSet.convert(dataset)

        # Create a temporary filename for the output
        filename = afwizard.paths.get_temporary_filename("las")

        # Run the filter program as a subprocess (command line arguments must be strings)
        subprocess.run(
            ["myfilter", dataset.filename, filename, str(self.config["myparameter"])],
            check=True,
        )

        # Construct a new DataSet object with the result
        return afwizard.DataSet(filename, spatial_reference=dataset.spatial_reference)

    @classmethod
    def enabled(cls):
        # We only enable this backend if the executable 'myfilter' is in our path
        return shutil.which("myfilter") is not None
The implementation of schema needs to return a dictionary that follows the JSONSchema specification. If you do not know JSONSchema, you might want to read this introduction guide: Understanding JSONSchema. We require the schema for your filter to be wrapped into an anyOf rule that allows schema composition between backends. This anyOf rule also allows you to expose multiple filters per backend class (e.g. because they share the same execution logic). Each of the schemas contained in the anyOf rule must be of type object and define at least the _backend property as shown in the code example.
The execute method implements the core functionality of your filter. It is passed a dataset and returns a filtered dataset. We first assert that we are dealing with a dataset that is represented by a LAS file by converting it to afwizard.DataSet. The actual execution is done using subprocess.run.
The enabled method in the above can be used to exclude the custom backend if some condition is not met, e.g. the necessary executable was not found. This method defaults to True.
Using a custom backend class
As backend classes register themselves with the base class, it is only necessary to ensure that the module that contains the class has been imported before other functionality of afwizard is used. This can e.g. be done from __init__.py.
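A minimal sketch of such an __init__.py, assuming your backend class lives in a module called mybackend.py inside your own package, could look like this:

# mypackage/__init__.py (hypothetical package layout)
# Importing the module executes the class definition, which registers
# MyBackend with afwizard's Filter base class.
from . import mybackend  # noqa: F401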
Backends that operate on custom data representations
In the above example, the ground point filtering algorithm operated directly on LAS files from the file system. Other backends might operate on other data representations, e.g. OPALS works with its own OPALS Data Manager object. If your backend should work on a different representation, you can inherit from afwizard.DataSet and implement the following methods, which are shown as no-ops here:
class CustomDataSet(afwizard.DataSet):
    @classmethod
    def convert(cls, dataset):
        # Make sure that conversion is idempotent
        if isinstance(dataset, CustomDataSet):
            return dataset

        # Here, you can do custom things
        return CustomDataSet(dataset.filename, dataset.spatial_reference)

    def save(self, filename, overwrite=False):
        # Save the dataset as LAS - using the base DataSet implementation here
        return afwizard.DataSet.convert(self).save(filename, overwrite=overwrite)
The convert method will be used by filters to ensure the correct dataset representation as shown in the above example.
User API
- class afwizard.DataSet(filename=None, spatial_reference=None)
The main class that represents a Lidar data set.
The DataSet class performs lazy loading - instantiating an object of this type does not trigger memory-intensive operations until you do something with the dataset that requires such an operation.
- Parameters:
filename (str) – Filename to load the dataset from. The dataset is expected to be in LAS/LAZ 1.2-1.4 format. If an absolute filename is given, the dataset is loaded from that location. Relative paths are interpreted (in this order) with respect to the directory set with set_data_directory(), the current working directory, XDG data directories (Unix only) and the Python package installation directory.
spatial_reference (str) – A spatial reference as WKT or EPSG code. This will override the reference system found in the metadata and is required if no reference system is present in the metadata of the LAS/LAZ file. If this parameter is not provided, this information is extracted from the metadata.
- classmethod convert(dataset)
Convert this dataset to an instance of DataSet
This is used internally to convert datasets between different representations.
- Returns:
A dataset with transformed datapoints.
- Return type:
- rasterize(resolution=0.5, classification=None)
Create a digital terrain model from the dataset
It is important to note that for archaeological applications, the mesh is not a traditional DEM/DTM (Digital Elevation/Terrain Model), but rather a DFM (Digital Feature Model) which consists of ground and all potentially relevant structures like buildings etc. but always excludes vegetation.
- Parameters:
resolution (float) – The mesh resolution in meters. Adapt this depending on the scale of the features you are looking for and the point density of your Lidar data.
classification (tuple) – The classification values to include into the written mesh file.
- restrict(segmentation=None, segmentation_overlay=None)
Restrict the data set to a spatial subset
This is of vital importance when working with large Lidar datasets in AFwizard. The interactive exploration process for filtering pipelines requires a reasonably sized subset to allow fast previews.
- Parameters:
segmentation – A segmentation object that provides the geometric information for the cropping. If omitted, an interactive selection tool is shown in Jupyter.
segmentation_overlay – A segmentation object that will be overlayed on the map for easier use of the restrict app.
- Type:
- Type:
- save(filename, overwrite=False)
Store the dataset as a new LAS/LAZ file
This method writes the Lidar dataset represented by this data structure to an LAS/LAZ file. This includes the classification values, which may have been overridden by a filter pipeline.
- Parameters:
filename (str) – Where to store the new LAS/LAZ file. You can either specify an absolute path or a relative path. Relative paths are interpreted w.r.t. the current working directory.
overwrite (bool) – If this parameter is false and the specified filename already exists, an error is thrown. This is done in order to prevent accidental corruption of valuable data files.
- Returns:
A dataset object wrapping the written file
- Return type:
- show(visualization_type='hillshade', **kwargs)
Visualize the dataset in JupyterLab
Several visualization options can be chosen via the visualization_type parameter. Some of the arguments given below are only available for specific visualization types. To explore the visualization capabilities, you can also use the interactive user interface with show_interactive().
- Parameters:
visualization_type (str) – Which visualization to use. Currently implemented values are hillshade for a greyscale 2D map, slopemap for a 2D map color-coded by the slope, and blended_hillshade_slope which allows to blend the former two into each other.
classification (tuple) – Which classification values to include into the visualization. By default, all classes are considered. The best interface to provide this information is using afwizard.asprs.
resolution (float) – The spatial resolution in meters.
azimuth (float) – The angle in the xy plane where the sun is from [0, 360] (hillshade and blended_hillshade_slope only)
angle_altitude – The angle altitude of the sun from [0, 90] (hillshade and blended_hillshade_slope only)
alg (str) – The hillshade algorithm to use. Can be one of Horn and ZevenbergenThorne. (hillshade and blended_hillshade_slope only)
blending_factor (float) – The blending ratio used between hillshade and slope map from [0, 1]. (blended_hillshade_slope only)
- show_interactive()
Visualize the dataset with interactive visualization controls in Jupyter
- afwizard.add_filter_library(path=None, package=None, recursive=False, name=None)
Add a custom filter library to this session
AFwizard keeps a list of filter libraries that it browses for filter pipeline definitions. This function adds a new directory to that list. You can use this to organize filter files on your hard disk.
- Parameters:
path (str) – The filesystem path where the filter library is located. The filter library is a directory containing a number of filter files and potentially a library.json file containing metadata.
package (str) – Alternatively, you can specify a Python package that is installed on the system and that contains the relevant JSON files. This is used for afwizard's library of community-contributed filter pipelines.
recursive (bool) – Whether the file system should be traversed recursively from the given directory to find filter pipeline definitions.
name (str) – A display name to override the name provided by library metadata
- afwizard.apply_adaptive_pipeline(dataset=None, segmentation=None, pipelines=None, output_dir='output', resolution=0.5, compress=False, suffix='filtered')
Python API to apply a fully configured adaptive pipeline
This function implements the large scale application of a spatially adaptive filter pipeline to a potentially huge dataset. This can either be used from Python or through AFwizard’s command line interface.
- Parameters:
datasets (list) – One or more datasets of type afwizard.DataSet.
segmentation (afwizard.segmentation.Segmentation) – The segmentation that provides the geometric information about the spatial segmentation of the dataset and what filter pipelines to apply in which segments.
output_dir (str) – The output directory to place the generated output in. Defaults to a subdirectory 'output' within the current working directory.
resolution (float) – The resolution in meters to use when generating GeoTiff files.
compress (bool) – Whether to write LAZ files instead of LAS.
suffix (str) – A suffix to use for files after applying filtering.
- afwizard.assign_pipeline(dataset, segmentation, pipelines)
Load a segmentation object with one or more multipolygons and a list of pipelines. Each multipolygon can be assigned to one pipeline.
- Parameters:
segmentation – This segmentation object needs to have one multipolygon for every type of ground class (dense forest, steep hill, etc.).
pipelines – All pipelines that one wants to link with the given segmentations.
- Type:
- Type:
list of afwizard.filter.Pipeline
- Returns:
A segmentation object with added pipeline information
- Return type:
- afwizard.execute_interactive(dataset, pipeline)
Interactively apply a filter pipeline to a given dataset in Jupyter
This allows you to interactively explore the effects of end user configuration values specified by the filtering pipeline.
- Parameters:
dataset (afwizard.DataSet) – The dataset to work on
pipeline – The pipeline to execute.
- Returns:
The pipeline with the end user configuration baked in
- Return type:
- afwizard.load_filter(filename)
Load a filter from a file
This function restores filters that were previously saved to disk using the save_filter() function.
- Parameters:
filename (str) – The filename to load the filter from. Relative paths are interpreted w.r.t. the current working directory.
- afwizard.load_segmentation(filename, spatial_reference=None)
Load a GeoJSON segmentation from a file
- Parameters:
filename (str) – The filename to load the GeoJSON file from.
spatial_reference – The WKT or EPSG code of the segmentation file.
- afwizard.pipeline_tuning(datasets=[], pipeline=None)
The Jupyter UI to create a filtering pipeline from scratch.
The use of this UI is described in detail in the notebook on creating filter pipelines.
- Parameters:
datasets (list) – One or more instances of Lidar datasets to work on
pipeline (afwizard.filter.Pipeline) – A pipeline to use as a starting point. If omitted, a new pipeline object will be created.
- Returns:
Returns the created pipeline object
- Return type:
- afwizard.print_version()
Print the current version of AFwizard
- afwizard.remove_classification(dataset)
Remove the classification values from a Lidar dataset
Instead, all points will be classified as 1 (unclassified). This is useful to drop an automatic preclassification in order to create an archaeologically relevant classification from scratch.
- Parameters:
dataset (afwizard.Dataset) – The dataset to remove the classification from
- Returns:
A transformed dataset with unclassified points
- Return type:
- afwizard.reproject_dataset(dataset, out_srs, in_srs=None)
Standalone function to reproject a given dataset with the option of forcing an input reference system
- Parameters:
out_srs (str) – The desired output format in WKT.
in_srs (str) – The input format in WKT from which to convert. The default is the dataset’s current reference system.
- Returns:
A reprojected dataset
- Return type:
- afwizard.reset_filter_libraries()
Reset registered filter libraries to the default ones
The default libraries are the current working directory and the library of community-contributed filter pipelines provided by afwizard.
- afwizard.save_filter(filter_, filename)
Save a filter to a file
Filters saved to disk with this function can be reconstructed with the load_filter() function.
- Parameters:
filter (Filter) – The filter object to write to disk
filename – The filename where to write the filter. Relative paths are interpreted w.r.t. the current working directory.
- afwizard.select_best_pipeline(dataset=None, pipelines=None)
Select the best pipeline for a given dataset.
The use of this UI is described in detail in the notebook on selecting filter pipelines.
- Parameters:
dataset (afwizard.DataSet) – The dataset to use for visualization of ground point filtering results
pipelines (list) – The tentative list of pipelines to try. May e.g. have been selected using the select_pipelines_from_library tool.
- Returns:
The selected pipeline with end user configuration baked in
- Return type:
- afwizard.select_pipeline_from_library(multiple=False)
The Jupyter UI to select filtering pipelines from libraries.
The use of this UI is described in detail in the notebook on filtering libraries.
- Parameters:
multiple (bool) – Whether or not it should be possible to select multiple filter pipelines.
- Returns:
Returns the selected pipeline object(s)
- Return type:
- afwizard.select_pipelines_from_library()
The Jupyter UI to select filtering pipelines from libraries.
The use of this UI is described in detail in the notebook on filtering libraries.
- Returns:
Returns the selected pipeline object(s)
- Return type:
- afwizard.set_current_filter_library(path, create_dirs=False, name='My filter library')
Set a library path that will be used to store filters in
- Parameters:
path (str) – The path to store filters in. Might be an absolute path or a relative path that will be interpreted with respect to the current working directory.
create_dirs (bool) – Whether afwizard should create this directory (and potentially some parent directories) for you
name (str) – The display name of the library (e.g. in the selection UI)
- afwizard.set_data_directory(directory, create_dir=False)
Set a custom root directory to locate data files
- Parameters:
directory (str) – The name of the custom data directory.
create_dir – Whether AFwizard should create the directory if it does not already exist.
- afwizard.set_lastools_directory(dir)
Set custom LASTools installation directory
Use this function at the beginning of your code to point AFwizard to a custom LASTools installation directory. Alternatively, you can use the environment variable LASTOOLS_DIR to do so.
- Parameters:
dir (str) – The LASTools installation directory to use
- afwizard.set_opals_directory(dir)
Set custom OPALS installation directory
Use this function at the beginning of your code to point AFwizard to a custom OPALS installation directory. Alternatively, you can use the environment variable OPALS_DIR to do so.
- Parameters:
dir (str) – The OPALS installation directory to use
Developer API
Frequently Asked Questions (FAQ)
My dataset produces Global encoding WKT flag not set for point format 6 - 10
This means that your dataset does not conform to the LAS 1.4 specification, which requires datasets that use the latest point formats (6 - 10) to also specify the CRS in WKT. Previous versions of PDAL processed these files nevertheless, but throwing an error is correct according to the format specification.
If you are affected by this error, you should look into fixing your dataset. This can e.g. be done using this LASTools command:
las2las -i <input> -o <output> -epsg <code>
Additionally, you might want to report bugs to tools that produce this type of non-conforming LAS files.
I changed the pipeline_title field in my GeoJSON, but it has no effect
The pipeline_title key is added to the GeoJSON file when using afwizard.assign_pipeline purely for informational purposes. AFwizard draws all required information from the hash stored in the pipeline key. This hash is determined from the filter's metadata. This allows you to move your filter pipelines freely without invalidating segmentation GeoJSONs (which would be the case when storing paths) and to finetune filter pipelines after creating the segmentation GeoJSON (which would not be possible if the GeoJSON stored the full filter configuration).