The dataset used in this example notebook is from the Nakadake Sanroku Kiln Site Center in Japan. The data set is provided by Shinoto et al. under the CC-BY-4.0 license: DOI
Working with LIDAR datasets in afwizard
This notebook will explain how Lidar datasets are treated in AFwizard by showcasing the most common use cases. If you are not yet familiar with Jupyter, check the Introduction to Python+Jupyter notebook first.
The first thing to do in a Jupyter notebook that uses AFwizard is to import the afwizard library:
[1]:
import afwizard
Loading datasets
AFwizard handles Lidar data sets in LAS/LAZ format. To load a data set, we construct a DataSet object given its filename and assign it to a variable ds:
[2]:
ds = afwizard.DataSet(filename="nkd_pcl_epsg6670.laz")
In above example, we are using the Nakadake example data that is downloaded on-demand from the Heidelberg University data repository. You can also load your own data set by providing its filename. AFwizard currently only supports datasets in LAS and LAZ format. The dataset filename is assumed to either be an absolute path, be located in the current working directory or that you first specified its location using the set_data_directory function:
[3]:
afwizard.set_data_directory("some/directory", create_dir=True)
Here, the create_dir directory specifies whether AFwizard should create non-existing directories for you.
Spatial Reference Systems
By default, AFwizard will try to determine the dataset’s metadata to determine the correct spatial reference system. If it is not specified in the metadata or if you want to force interpretation as a certain spatial reference system, you can pass its Well-known Text (WKT) representation or EPSG code to the data set:
[4]:
ds = afwizard.DataSet(filename="nkd_pcl_epsg6670.laz", spatial_reference="EPSG:6670")
Note that specifying a specific spatial reference system does not reproject the dataset, but reinterprets the given data. If you want to reproject your data, have a look at afwizard.reproject_dataset below.
Visualizing datasets
With the dataset loaded as the object ds, we have several ways of visualizing the data set directly in Jupyter. By default, a hillshade model with a configurable spatial resolution in meters is used.
AFwizard supports three different visualization methods, namly Hillshade Model, Slope Map and Hillshade Model + Slope Map. These can best be explored using an interactive user interface:
[5]:
ds.show_interactive()
ERROR 1: PROJ: proj_create_from_database: Cannot find proj.db
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[5], line 1
----> 1 ds.show_interactive()
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/dataset.py:138, in DataSet.show_interactive(self)
135 """Visualize the dataset with interactive visualization controls in Jupyter"""
136 from afwizard.apps import show_interactive
--> 138 return show_interactive(self)
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/apps.py:966, in show_interactive(dataset, filtering_callback, update_classification)
964 # If dataset is not rasterized already, do it now
965 if not isinstance(dataset, DigitalSurfaceModel):
--> 966 dataset = dataset.rasterize()
968 (
969 controls,
970 form,
(...)
973 load_raster_button,
974 ) = setup_overlay_control(dataset)
976 # Get a container widget for the visualization itself
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/pytools/__init__.py:777, in memoize_on_first_arg.<locals>.wrapper(obj, *args, **kwargs)
774 except KeyError:
775 attribute_error = False
--> 777 result = function(obj, *args, **kwargs)
778 if attribute_error:
779 object.__setattr__(obj, cache_dict_name, {key: result})
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/dataset.py:81, in DataSet.rasterize(self, resolution, classification)
78 if resolution <= 0:
79 raise Warning("Negative Resolutions are not possible for rasterization.")
---> 81 return DigitalSurfaceModel(
82 dataset=self, resolution=resolution, classification=classification
83 )
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/dataset.py:258, in DigitalSurfaceModel.__init__(self, dataset, **rasterization_options)
255 from afwizard.pdal import PDALInMemoryDataSet, execute_pdal_pipeline
257 # Store a reference to the generating dataset
--> 258 self.dataset = PDALInMemoryDataSet.convert(dataset)
260 # Validate the provided options
261 schema = load_schema("rasterize.json")
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/pdal.py:184, in PDALInMemoryDataSet.convert(cls, dataset)
181 config["override_srs"] = spatial_reference
182 config["nosrs"] = True
--> 184 pipeline = execute_pdal_pipeline(config=[config])
186 if spatial_reference is None:
187 spatial_reference = pipeline.metadata["metadata"]["readers.las"][
188 "comp_spatialreference"
189 ]
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/pdal.py:69, in execute_pdal_pipeline(dataset, config)
66 pipeline = pdal.Pipeline(config_str, arrays=arrays)
68 # Execute the filter and suppress spurious file output
---> 69 _ = pipeline.execute()
71 # We are currently only handling situations with one output array
72 assert len(pipeline.arrays) == 1
RuntimeError: Could not import coordinate system 'EPSG:6670': PROJ: proj_create_from_database: Cannot find proj.db.
If you already know exactly what visualization type and paramters you want, you can pass them directly to show.
[6]:
ds.show(
visualization_type="hillshade", resolution=1.0, classification="high_vegetation"
)
ERROR 1: PROJ: proj_create_from_database: Cannot find proj.db
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[6], line 1
----> 1 ds.show(
2 visualization_type="hillshade", resolution=1.0, classification="high_vegetation"
3 )
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/dataset.py:130, in DataSet.show(self, visualization_type, **kwargs)
127 rasterize_options[key] = kwargs.pop(key)
129 # Defer visualization to the rastered dataset
--> 130 return self.rasterize(**rasterize_options).show(
131 visualization_type=visualization_type, **kwargs
132 )
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/pytools/__init__.py:777, in memoize_on_first_arg.<locals>.wrapper(obj, *args, **kwargs)
774 except KeyError:
775 attribute_error = False
--> 777 result = function(obj, *args, **kwargs)
778 if attribute_error:
779 object.__setattr__(obj, cache_dict_name, {key: result})
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/dataset.py:81, in DataSet.rasterize(self, resolution, classification)
78 if resolution <= 0:
79 raise Warning("Negative Resolutions are not possible for rasterization.")
---> 81 return DigitalSurfaceModel(
82 dataset=self, resolution=resolution, classification=classification
83 )
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/dataset.py:258, in DigitalSurfaceModel.__init__(self, dataset, **rasterization_options)
255 from afwizard.pdal import PDALInMemoryDataSet, execute_pdal_pipeline
257 # Store a reference to the generating dataset
--> 258 self.dataset = PDALInMemoryDataSet.convert(dataset)
260 # Validate the provided options
261 schema = load_schema("rasterize.json")
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/pdal.py:184, in PDALInMemoryDataSet.convert(cls, dataset)
181 config["override_srs"] = spatial_reference
182 config["nosrs"] = True
--> 184 pipeline = execute_pdal_pipeline(config=[config])
186 if spatial_reference is None:
187 spatial_reference = pipeline.metadata["metadata"]["readers.las"][
188 "comp_spatialreference"
189 ]
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/pdal.py:69, in execute_pdal_pipeline(dataset, config)
66 pipeline = pdal.Pipeline(config_str, arrays=arrays)
68 # Execute the filter and suppress spurious file output
---> 69 _ = pipeline.execute()
71 # We are currently only handling situations with one output array
72 assert len(pipeline.arrays) == 1
RuntimeError: Could not import coordinate system 'EPSG:6670': PROJ: proj_create_from_database: Cannot find proj.db.
The full list of options is available in the online documentation or can be accessed directly in Jupyter by using the ? operator:
[7]:
?ds.show
Restricting datasets
If your Lidar dataset is very large, handling the entire data set becomes unwieldy, especially if we want to interactively tune ground point filtering pipelines. It is therefore important to crop the dataset to a subset that we can easily work on. We do so by showing an interactive map, adding a polygon with the polygon selector tool and hitting the Finalize button:
[8]:
rds = ds.restrict()
ERROR 1: PROJ: proj_create_from_database: Cannot find proj.db
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[8], line 1
----> 1 rds = ds.restrict()
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/dataset.py:231, in DataSet.restrict(self, segmentation, segmentation_overlay)
209 """Restrict the data set to a spatial subset
210
211 This is of vital importance when working with large Lidar datasets
(...)
226
227 """
229 from afwizard.apps import apply_restriction
--> 231 return apply_restriction(self, segmentation, segmentation_overlay)
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/apps.py:865, in apply_restriction(dataset, segmentation, segmentation_overlay)
859 def apply_restriction(dataset, segmentation=None, segmentation_overlay=None):
860 """The Jupyter UI to create a segmentation object from scratch.
861
862 The use of this UI will soon be described in detail.
863 """
--> 865 dataset = as_pdal(dataset)
867 def apply_restriction(seg):
868 # convert the segmentation from EPSG:4326 to the spatial reference of the dataset
869 seg = convert_segmentation(seg, dataset.spatial_reference)
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/apps.py:97, in as_pdal(dataset)
95 if isinstance(dataset, DigitalSurfaceModel):
96 return as_pdal(dataset.dataset)
---> 97 return PDALInMemoryDataSet.convert(dataset)
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/pdal.py:184, in PDALInMemoryDataSet.convert(cls, dataset)
181 config["override_srs"] = spatial_reference
182 config["nosrs"] = True
--> 184 pipeline = execute_pdal_pipeline(config=[config])
186 if spatial_reference is None:
187 spatial_reference = pipeline.metadata["metadata"]["readers.las"][
188 "comp_spatialreference"
189 ]
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/pdal.py:69, in execute_pdal_pipeline(dataset, config)
66 pipeline = pdal.Pipeline(config_str, arrays=arrays)
68 # Execute the filter and suppress spurious file output
---> 69 _ = pipeline.execute()
71 # We are currently only handling situations with one output array
72 assert len(pipeline.arrays) == 1
RuntimeError: Could not import coordinate system 'EPSG:6670': PROJ: proj_create_from_database: Cannot find proj.db.
In the above, the restricted dataset is assigned to a new object rds. This follows a design principle of AFwizard: All objects (datasets, filter pipelines etc.) are immutable - operations that work on datasets never implicitly modify an object. Instead the, provided input (ds in the above) is left untouched, and a modified copy is returned. This results in an increased memory consumption, but makes the interactive exploration of ground point filtering with AFwizard easier to handle.
It is also possible to load a segmentation as a geojson file and overlay it ontop of the satellite image.
[9]:
segmentation_overlay = afwizard.load_segmentation(
"nkd_sgm_assigned_TF.geojson", spatial_reference="EPSG:6670"
)
[10]:
rds = ds.restrict(segmentation_overlay=segmentation_overlay)
ERROR 1: PROJ: proj_create_from_database: Cannot find proj.db
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[10], line 1
----> 1 rds = ds.restrict(segmentation_overlay=segmentation_overlay)
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/dataset.py:231, in DataSet.restrict(self, segmentation, segmentation_overlay)
209 """Restrict the data set to a spatial subset
210
211 This is of vital importance when working with large Lidar datasets
(...)
226
227 """
229 from afwizard.apps import apply_restriction
--> 231 return apply_restriction(self, segmentation, segmentation_overlay)
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/apps.py:865, in apply_restriction(dataset, segmentation, segmentation_overlay)
859 def apply_restriction(dataset, segmentation=None, segmentation_overlay=None):
860 """The Jupyter UI to create a segmentation object from scratch.
861
862 The use of this UI will soon be described in detail.
863 """
--> 865 dataset = as_pdal(dataset)
867 def apply_restriction(seg):
868 # convert the segmentation from EPSG:4326 to the spatial reference of the dataset
869 seg = convert_segmentation(seg, dataset.spatial_reference)
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/apps.py:97, in as_pdal(dataset)
95 if isinstance(dataset, DigitalSurfaceModel):
96 return as_pdal(dataset.dataset)
---> 97 return PDALInMemoryDataSet.convert(dataset)
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/pdal.py:184, in PDALInMemoryDataSet.convert(cls, dataset)
181 config["override_srs"] = spatial_reference
182 config["nosrs"] = True
--> 184 pipeline = execute_pdal_pipeline(config=[config])
186 if spatial_reference is None:
187 spatial_reference = pipeline.metadata["metadata"]["readers.las"][
188 "comp_spatialreference"
189 ]
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/pdal.py:69, in execute_pdal_pipeline(dataset, config)
66 pipeline = pdal.Pipeline(config_str, arrays=arrays)
68 # Execute the filter and suppress spurious file output
---> 69 _ = pipeline.execute()
71 # We are currently only handling situations with one output array
72 assert len(pipeline.arrays) == 1
RuntimeError: Could not import coordinate system 'EPSG:6670': PROJ: proj_create_from_database: Cannot find proj.db.
Manipulating datasets
The above principle of immutability is also followed by all other functions that manipulate datasets. The most prominent such data manipulation is the application of ground point filter pipelines. It is of such importance, that it is covered in in detail in Selecting a filter pipeline and Creating filter pipelines. Other data manipulations are e.g. remove_classification which removes any existing classification data from a dataset:
[11]:
ds = afwizard.remove_classification(ds)
ERROR 1: PROJ: proj_create_from_database: Cannot find proj.db
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[11], line 1
----> 1 ds = afwizard.remove_classification(ds)
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/dataset.py:425, in remove_classification(dataset)
410 """Remove the classification values from a Lidar dataset
411
412 Instead, all points will be classified as 1 (unclassified). This is useful
(...)
421 :rtype: afwizard.DataSet
422 """
423 from afwizard.pdal import PDALInMemoryDataSet, execute_pdal_pipeline
--> 425 dataset = PDALInMemoryDataSet.convert(dataset)
426 pipeline = execute_pdal_pipeline(
427 dataset=dataset,
428 config={"type": "filters.assign", "value": ["Classification = 1"]},
429 )
430 return PDALInMemoryDataSet(
431 pipeline=pipeline,
432 spatial_reference=dataset.spatial_reference,
433 )
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/pdal.py:184, in PDALInMemoryDataSet.convert(cls, dataset)
181 config["override_srs"] = spatial_reference
182 config["nosrs"] = True
--> 184 pipeline = execute_pdal_pipeline(config=[config])
186 if spatial_reference is None:
187 spatial_reference = pipeline.metadata["metadata"]["readers.las"][
188 "comp_spatialreference"
189 ]
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/pdal.py:69, in execute_pdal_pipeline(dataset, config)
66 pipeline = pdal.Pipeline(config_str, arrays=arrays)
68 # Execute the filter and suppress spurious file output
---> 69 _ = pipeline.execute()
71 # We are currently only handling situations with one output array
72 assert len(pipeline.arrays) == 1
RuntimeError: Could not import coordinate system 'EPSG:6670': PROJ: proj_create_from_database: Cannot find proj.db.
Here, we have chosen to assign the manipulated dataset to the same name as the original dataset. This is not violating the principle of immutability, because we explicitly chose to do so.
Another dataset manipulation operation that was already mentioned is the reprojection into a different spatial reference system:
[12]:
reprojected = afwizard.reproject_dataset(ds, "EPSG:4326")
ERROR 1: PROJ: proj_create_from_database: Cannot find proj.db
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[12], line 1
----> 1 reprojected = afwizard.reproject_dataset(ds, "EPSG:4326")
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/dataset.py:451, in reproject_dataset(dataset, out_srs, in_srs)
437 """Standalone function to reproject a given dataset with the option of forcing an input reference system
438
439 :param out_srs:
(...)
447 :rtype: afwizard.DataSet
448 """
449 from afwizard.pdal import execute_pdal_pipeline, PDALInMemoryDataSet
--> 451 dataset = PDALInMemoryDataSet.convert(dataset)
452 if in_srs is None:
453 in_srs = dataset.spatial_reference
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/pdal.py:184, in PDALInMemoryDataSet.convert(cls, dataset)
181 config["override_srs"] = spatial_reference
182 config["nosrs"] = True
--> 184 pipeline = execute_pdal_pipeline(config=[config])
186 if spatial_reference is None:
187 spatial_reference = pipeline.metadata["metadata"]["readers.las"][
188 "comp_spatialreference"
189 ]
File ~/checkouts/readthedocs.org/user_builds/afwizard/conda/stable/lib/python3.11/site-packages/afwizard/pdal.py:69, in execute_pdal_pipeline(dataset, config)
66 pipeline = pdal.Pipeline(config_str, arrays=arrays)
68 # Execute the filter and suppress spurious file output
---> 69 _ = pipeline.execute()
71 # We are currently only handling situations with one output array
72 assert len(pipeline.arrays) == 1
RuntimeError: Could not import coordinate system 'EPSG:6670': PROJ: proj_create_from_database: Cannot find proj.db.
If your dataset’s metadata does not specify a spatial reference system, you need specify it additionally using the in_srs= parameter to afwizard.reproject_dataset.
Saving datasets
Once we have achieved a result that is worth storing, we can save the dataset to a LAS/LAZ file by calling its save method:
[13]:
saved = ds.save("without_classification.las", overwrite=False)
In the above, the first argument is the filename to save to (relative paths are interpreted w.r.t. the current working directory). Optionally, LAZ compression can be activated by setting compress=True. If an existing file would be overwritten, explicit permission needs to do that needs to be granted by setting overwrite=True. The return object saved is again an adaptivefiltering dataset object that represents the LAS/LAZ file on the disk.