BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210527T134946Z
LOCATION:Digital
DTSTART;TZID=Europe/Stockholm:20200623T214500
DTEND;TZID=Europe/Stockholm:20200623T221000
UID:isc_hpc_ISC High Performance 2020_sess341_pap110@linklings.com
SUMMARY:Solving Acoustic Boundary Integral Equations Using High Performanc
e Tile Low-Rank LU Factorization
DESCRIPTION:Research Paper\n\nSolving Acoustic Boundary Integral Equations
Using High Performance Tile Low-Rank LU Factorization\n\nAlharthi, AlOmai
ry, Akbudak, Chen, Ltaief...\n\nWe design and develop a new high performan
ce implementation of a fast direct LU-based solver using low-rank approxim
ations on massively parallel systems. The LU factorization is the first an
d most time-consuming step toward solving systems of linear equations in t
he context of analyzing acoustic scattering from large 3D objects. The mat
rix equation is obtained by discretizing the boundary integral of the exte
rior Helmholtz problem using a higher-order Nyström scheme. The main i
dea is to exploit the inherent data sparsity of the matrix operator by per
forming local tile-centric approximations while still capturing the most s
ignificant information. In particular, the proposed LU-based solver levera
ges the Tile Low-Rank (TLR) data compression format as implemented in the
Hierarchical Computations on Manycore Architectures (HiCMA) library to dec
rease the complexity of "classical" dense direct solvers from cubic to q
uadratic order. We taskify the underlying boundary integral kernels to exp
ose fine-grained computations. We then employ the dynamic runtime system S
tarPU to orchestrate the scheduling of computational tasks on shared and d
istributed-memory systems. The resulting asynchronous execution helps
compensate for the load imbalance due to the heterogeneous ranks, while m
itigating the overhead of data motion. We assess the robustness of our TLR
LU-based solver and study the qualitative impact when using different num
erical accuracies. The new TLR LU factorization outperforms the state-of-t
he-art dense factorizations by up to an order of magnitude on various para
llel systems using large-scale 3D synthetic and real geometries.\n\nTag: P
re-Recorded
URL:https://2020.isc-program.com/presentation/?id=pap110&sess=sess341
END:VEVENT
END:VCALENDAR