Integration of hdf/parquet loadshapes #98
The API for external memory is exposed through a handful of functions in the DSS C-API header. As I mentioned on sf.net, here's a link to the current header docs:
For HDF5, that's how our internal implementation handles things. We have tens of thousands of loadshapes compressed in a few HDF5 files, at one-second resolution for a full year, partitioned into weeks. The process is basically as follows:
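Roughly, and only as a sketch with illustrative names (one compressed 1-D dataset per loadshape; the exact layout used at Unicamp isn't shown in this thread), the chunked weekly read looks like:

```python
import h5py
import numpy as np

POINTS_PER_WEEK = 7 * 24 * 3600  # one-second resolution

def read_week(path: str, shape_name: str, week: int) -> np.ndarray:
    """Read one week-sized chunk of a loadshape from an HDF5 file.

    The file layout (one compressed 1-D dataset per loadshape) and the
    names here are hypothetical, not the actual Unicamp implementation.
    """
    with h5py.File(path, "r") as f:
        dset = f[shape_name]
        start = week * POINTS_PER_WEEK
        return dset[start : start + POINTS_PER_WEEK].astype(np.float32)

# e.g. pmult = read_week("loadshapes.h5", "load_0001", week=3)
```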
For this chunked approach, we also manage the time externally, especially since we use a variable time-step and custom controls. For Parquet, we can leverage partitioned datasets. I've been using those via PyArrow (see the sketch below) and the performance has been great. I haven't used the S3 support though, only files on the local network. So our implementation (at Unicamp) is not general enough. But since both HDF5 and Parquet/Arrow support mechanisms for adding extra metadata, we can use that to complement any non-trivial info. @tarekelgindy, since you mentioned a 15-minute resolution, loading everything at once might not be a problem if you don't have too many different loadshapes in the simulation. Remember that using float32 would halve the memory requirements too.
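For reference, a minimal example of reading a single partition of a hive-partitioned Parquet dataset with PyArrow; the directory layout and the `week` partition key are hypothetical:

```python
import pyarrow.dataset as ds

# A dataset directory partitioned like: loadshapes/week=0/part-0.parquet, ...
dataset = ds.dataset("loadshapes/", format="parquet", partitioning="hive")

# Only the files under week=3 are touched; other partitions are skipped.
table = dataset.to_table(filter=ds.field("week") == 3)
pmult = table.column("pmult").to_numpy()
```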
I think we can keep this issue here for the moment. I can see two approaches. For the short term:
For the long term, a "LoadShape Manager" would be ideal.
Besides loadshapes, there are other reasons I want to integrate with Arrow. Well, those are my main ideas. Any thoughts? By the way, the presentation I mentioned is in this panel session: https://resourcecenter.ieee-pes.org/conferences/general-meeting/PES_CVS_GM20_0806_4668_SLD.html -- it's "free for Institutional Subscribers".
I plan to work a bit on this Thursday and Friday. Davis just added the memory-mapped version to the official code, so it will be good to compare them. For convenience: dss-extensions/dss_capi@3c0fa5f. I'll save my comments for later, when I have some actual performance numbers.
Interesting! Thanks for sharing! I'm also curious about the performance benchmarks. Is there anything I can do to help?
@kdheepak I can keep you posted. Initially I can reuse some code we use at Unicamp, so it won't be too much work. Depending on how things go, we can decide whether to explore the results further. EPRI's implementation doesn't handle the main topic of this issue (HDF and Parquet), but it's good for comparison. Besides the loadshapes, there are other aspects of HPC and distributed execution that I'd like to evaluate better in the future. We (including @tarekelgindy) could talk about that at a later date, if you're available.
Sorry for not getting back to this sooner! If you have a pre-release version, I could test it with a few of our larger datasets and provide feedback, if that helps. Definitely happy to discuss streamlining with other HPC workflows as well!
**On the performance issues**

The numbers from the document Davis linked surprised me, so I tried to reproduce them. For reference:
That 9388 s seemed like a lot.
Why are the first two lines different?
(For brevity, I'll omit the numbers for float64 files.) I didn't test the CSV/TXT variants of the methods since I firmly believe they shouldn't be used for large-scale circuits/simulations at all. With the changes, if it's a long simulation, it doesn't really matter which method is used for "legacy" loadshapes. Something like 9388 s would of course be inadvisable compared to 3 s.

For a final data point of interest, the time to fill in the loadshapes via the official OpenDSS COM is also very long (win32com or comtypes, both >40 min), while DSS-Python/ODD.py takes around 7.7 s. That 7.7 s was still large enough to justify further work.

**Current progress**

I merged (adapted/rewrote) most of the changes related to memory-mapped loadshapes from the official code, and started porting it to Linux. I decided to do this work in the 0.10.x branch, hopefully the last major change in this branch, so I had to backport the relevant changes. So far, the main change is that I added a new mechanism for this on the engine side.

The next step is running some long simulations to assess and document the performance across some variations; the results will guide the Parquet implementation.
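As a rough illustration of why memory mapping helps here (this is not the engine's actual implementation), a loadshape stored as a raw little-endian float32 file can be mapped instead of read eagerly; the file name and layout are hypothetical:

```python
import numpy as np

# Map a raw float32 loadshape file instead of reading it all into RAM.
# The OS pages data in on demand, so opening thousands of these files
# stays cheap until the values are actually touched.
pmult = np.memmap("load_0001_p.f32", dtype=np.float32, mode="r")

# Slicing only faults in the pages that back the requested range:
first_day = np.asarray(pmult[: 24 * 3600])
```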
Thanks for the update, Paulo. It's pretty interesting to see what a difference the memory mapping made in OpenDSS. I'll definitely be using this in future versions that require .csv or .txt file inputs.
@tarekelgindy This is expected for this first test -- the engine is not using the loadshape data at all, only getting file handles. But I also expect that it won't affect the simulation time as a whole that much; the main advantage is the reduced memory load (which is good, of course). It will be interesting to compare Windows vs. Linux too (as a whole, Linux I/O is much better and more versatile).
To add some initial info on the timings for loading the circuit using DSS C-API:
So the extra time for individual files is probably due to the high number of file handles, both on the Python side and in the DSS engine. And a reminder that these numbers include the Python overhead. I'll continue this Thursday or Friday.
Some other numbers (all based on DSS C-API):
(Times are relative to the first row.)

This was an older server that was free over the weekend (2x Xeon E5-2630 v4). It looks like the processors are starved in the 20-process case. I'll test on a newer machine when it's available next week (2x Xeon Gold 6230), and add numbers for some desktop machines as well.

The "DSS (MemoryMapping=Yes)" case is probably slower than "Memory-mapped, individual files" because I left it unoptimized on purpose -- there are some trivial optimizations that could be applied; in fact, I could remove its code and reuse the same mechanism as the other memory-mapped variant.

"Chunk per day, column-major" is better than "Chunk per day, row-major" here since the on-disk data for the latter is a dense row-major matrix, without partitions, so it's worse than thousands of files. Curiously, "Chunk per day, column-major" is slightly faster on average for the 20-process case, but we can see it doesn't really matter much which version is used (except that loading all the files 20 times wouldn't work). Since it's also in the middle of the pack for the single-process run, I'm basing the "LoadShape manager" prototype on it.
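One possible reading of the "chunk per day, column-major" layout, sketched with h5py; the sizes, names, and exact chunking are illustrative, not the prototype's actual code:

```python
import h5py
import numpy as np

POINTS_PER_DAY = 24 * 3600      # one-second resolution
N_DAYS, N_SHAPES = 365, 1_000   # illustrative sizes

with h5py.File("loadshapes_by_day.h5", "w") as f:
    # Rows are time steps, columns are loadshapes. Chunking by
    # (one day, one shape) means fetching a simulation day for a
    # given shape touches exactly one contiguous chunk on disk.
    f.create_dataset(
        "pmult",
        shape=(N_DAYS * POINTS_PER_DAY, N_SHAPES),
        dtype=np.float32,
        chunks=(POINTS_PER_DAY, 1),
        compression="gzip",
    )

with h5py.File("loadshapes_by_day.h5", "r") as f:
    d, s = 10, 42  # day 10 of loadshape 42
    day = f["pmult"][d * POINTS_PER_DAY : (d + 1) * POINTS_PER_DAY, s]
```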
Hi @PMeira, just thought I'd touch base on this. Did you need any help with the integration at all?
@tarekelgindy Just need to finalize the design. The very basic approach is easy to integrate, but a more versatile version would need more work. I'm probably overthinking it, so I'll try to provide a full implementation (and test results) of the basic approach with HDF/Parquet this week so that you can provide some feedback. Other news:
When running single processes, all column-major approaches are noticeably better. Even considering only the run time, they can be faster (by up to 20%) than the traditional approaches. That seems to extend to multiple processes on the Ryzen machine. I might add results for a Raspberry Pi 4 later for completeness, but the general observations across machines/OSes have been consistent so far.
Hi Paulo - thanks for all the work on this! Just checking: was this on a branch that you have active at the moment? I've been doing lots of runs with opendssdirect.py where I read base models with no loadshapes attached and then set the kW and kvar values in my own code. I use Python's multiprocessing to read Parquet load files into memory in parallel, and then set the values using the opendssdirect functions, which makes it very fast. If you like, I can do some time & memory comparisons of these to see how it compares. I'll be dropping some big datasets soon that might be good for testing some of this work, if that helps.
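A minimal single-process sketch of that pattern with opendssdirect.py and PyArrow; the Parquet schema (`load_name`/`kw`/`kvar` columns) and file names are hypothetical:

```python
import opendssdirect as dss
import pyarrow.parquet as pq

dss.Text.Command("Redirect base_model.dss")  # base model, no loadshapes

# Hypothetical schema: one row per load, columns load_name / kw / kvar.
table = pq.read_table("timestep_0001.parquet")

for name, kw, kvar in zip(
    table.column("load_name").to_pylist(),
    table.column("kw").to_pylist(),
    table.column("kvar").to_pylist(),
):
    dss.Loads.Name(name)   # activate the load by name
    dss.Loads.kW(kw)       # set kW on the active load
    dss.Loads.kvar(kvar)   # set kvar on the active load

dss.Solution.Solve()
```

As noted in the reply below, the per-load setter calls are where the Python API overhead accumulates.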
I had to leave this for a bit, but I'll probably be able to resume work this Friday. I think I pushed most of the code, but maybe not for DSS-Python.
If you use PyArrow, the load performance should be very close. Setting kW and kvar for each load is not ideal, though; the Python API overhead is probably significant. It seems a new OpenDSS version will finally be released, so I can also grab some of their more recent changes: https://sourceforge.net/p/electricdss/code/3160/
Feature Request
Following up on the sourceforge discussion here:
https://sourceforge.net/p/electricdss/discussion/861976/thread/59230a2c2d/
This is regarding support for reading loadshape information in HDF/Parquet format for the dss-extensions suite. My understanding is that the existing memory allocation mechanisms (such as those here: https://github.com/dss-extensions/dss_capi/blob/master/src/CAPI/CAPI_LoadShapes.pas#L453) could be leveraged to stream data from HDF and Parquet files, and that there is already some existing code which can be backported to support this.
I'd be very supportive of any efforts to integrate this workflow into opendssdirect.py and opendssdirect.jl.
Furthermore, let me know if there is any interest in allowing multiple loadshapes to be read from a single HDF or Parquet file. This could significantly improve the performance of any HDF/Parquet reader.
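For example, if many loadshapes were stored as columns of a single Parquet file (a hypothetical layout), a reader could fetch just the shapes a circuit needs in one pass:

```python
import pyarrow.parquet as pq

# Hypothetical layout: one column per loadshape, one row per time step.
needed = ["shape_0001", "shape_0042", "shape_1337"]
table = pq.read_table("all_loadshapes.parquet", columns=needed)

pmult = {name: table.column(name).to_numpy() for name in needed}
```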
Happy to move this issue to the dss_capi repo if it makes more sense for it to live there.