Skip to content

feat: add Gdal process pool#1192

Open
jdroenner wants to merge 49 commits into
mainfrom
jd_1
Open

feat: add Gdal process pool#1192
jdroenner wants to merge 49 commits into
mainfrom
jd_1

Conversation

@jdroenner

Copy link
Copy Markdown
Member
  • I added an entry to CHANGELOG.md if knowledge of this change could be valuable to users.

Here is a brief summary of what I did:

I created a shared pool of gdal workers using independent processes

Comment thread geoengine/justfile Outdated
Comment thread geoengine/datatypes/src/raster/data_type.rs
@@ -1,18 +1,22 @@
mod csv;
mod gdal_source;
pub mod gdal_in;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe call it gdal_worker_process or something like that?

}

#[derive(Clone)]
pub struct GdalPoolWorkerInstance {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ist this a worker instance or rather a dispatcher? maybe reflect it in the name, but maybe it's ok as it is.

repository.workspace = true

[features]
default = ["gdalpool_dedup_requests"]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we make it a feature, I guess we need to add a matrix entry in the github ci where the feature is disabled to test this as well.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe make it a config setting instead?


let res = match res {
Ok(t) => {
// Here we need to handle edges!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic is duplicated in gdal and multi band gdal source. maybe move it inside the gdal_in module

}

/*
#[test]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it commented out?

FromPrimitive,
integer::{div_ceil, div_floor},
};
use reader_mode::{GdalReaderMode, OverviewReaderState, ReaderState};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we merge gdal_source::reader_mode and multi_band_gdal_source::reader_mode? there seems to be a lot of overlap.

pub mod process_common;
pub mod process_impl;
mod process_pool;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated to this line: what do we do about the remaining occurences where gdal is used in the main process? e.g. where gdal_open_dataset_ex is called when validating tiles.

let mut s = rustc_hash::FxHasher::default();
request.full_hash(&mut s);
let tile_key = s.finish();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens to the followers if the leader's http request is cancelled?

),
("CPL_VSIL_CURL_CHUNK_SIZE", "1048576"), // 1mb TODO: need to tune this!
("VSI_CACHE", "TRUE"),
("VSI_CACHE_SIZE", "67108864 "), // 64mb per worker! TODO: need to tune this!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove trailing space

Comment on lines +81 to +82
),
("CPL_VSIL_CURL_CHUNK_SIZE", "1048576"), // 1mb TODO: need to tune this!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if it works to re set the VSIL CURL options at runtime. I think they are only read once.

fn main() {
let (token, _debug_lvl) = setup();
// TODO: add a logger?
//reroute_gdal_logging();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should either

  • propagate the logging to the main process (e.g. thread that reads from process stdout and emits lines as tracing debug events), or
  • send the process traces to a opentelemetry endpoint. this requires a tokio runtime in the processes I think. I did this on my edv-mvp-pool branch if you want to look at how it could be done. But I am not sure what's best.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fn init_telemetry(level: Level, otlp_endpoint: Option<&str>) -> Option<TelemetryGuard> {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants