Riadh Ben Abdelhamid, Yoshiki Yamaguchi, Taisuke Boku, "A scalable many-core overlay architecture on an HBM2-enabled multi-die FPGA", ACM Transactions on Reconfigurable Technology and Systems, June 2022.
Yuta Hirokawa, Atsushi Yamada, Shunsuke Yamada, Masashi Noda, Mitsuharu Uemoto, Taisuke Boku, Kazuhiro Yabana, "Large-scale ab initio simulation of light–matter interaction at the atomic scale in Fugaku", International Journal of High Performance Computing Applications, Vol. 36, Issue 2, pp 182–197, Mar. 2022, https://doi.org/10.1177/10943420211065723
Ryohei Kobayashi, Norihisa Fujita, Yoshiki Yamaguchi, Taisuke Boku, Kohji Yoshikawa, Makito Abe, Masayuki Umemura, "Multi-Hybrid Accelerated Simulation by GPU and FPGA on Radiative Transfer Simulation in Astrophysics", IPSJ Transactions on Advanced Computing Systems (ACS), Vol. 13, No. 3, 17 pages, Nov. 2020.
Masahiro Nakao, Tetsuya Odajima, Hitoshi Murai, Akihiro Tabuchi, Norihisa Fujita, Toshihiro Hanawa, Taisuke Boku, Mitsuhisa Sato, Michael Mascagni, "Evaluation of XcalableACC with tightly coupled accelerators/InfiniBand hybrid communication on accelerated cluster", International Journal of High Performance Computing Applications, Vol. 33, Issue 5, pp. 869-884, Sep. 2019 https://doi.org/10.1177/1094342018821163
Masahiro Nakao, Hitoshi Murai, Hidetoshi Iwashita, Taisuke Boku, Mitsuhisa Sato, "Implementation and evaluation of the HPC challenge benchmark in the XcalableMP PGAS language", Int. Journal of High Performance Computing Applications, doi:10.1177/1094342017698214, 14 pages, Mar. 2017.
Keisuke Tsugane, Taisuke Boku, Hitoshi Murai, Mitsuhisa Sato, William Tang, Bei Wang, "Hybrid-view programming of nuclear fusion simulation code in the PGAS parallel programming language XcalableMP", Parallel Computing, Vol. 57, Issue C, pp. 37-51, Sep. 2016. https://doi.org/10.1016/j.parco.2016.05.016
Y. Hasegawa, J. Iwata, M. Tsuji, D. Takahashi, A. Oshiyama, K. Minami, T. Boku, H. Inoue, Y. Kitazawa, I. Miyoshi, and M. Yokokawa: Performance evaluation of ultra-large-scale first-principles electronic structure calculation code on the K computer, International Journal of High Performance Computing Applications, Vol. 28, pp. 335-355, Aug. 2014. https://doi.org/10.1177/1094342013508163
M. Noda, K. Ishimura, K. Nobusada, K. Yabana, T. Boku: Massively-parallel electron dynamics calculations in real-time and real-space: Toward applications to nanostructures of more than ten-nanometers in size, Journal of Computational Physics, Vol.265, pp.145-155, 2014. https://doi.org/10.1016/j.jcp.2014.02.006
J. Iwata, D. Takahashi, A. Oshiyama, T. Boku, K. Shiraishi, S. Okada and K. Yabana: A massively-parallel electronic-structure calculations based on real-space density functional theory, Journal of Computational Physics, Vol. 229, No. 6, pp. 2339-2363, 2010.
K. Nakazawa, H. Nakamura, T. Boku, I. Nakata and Y. Yamashita, "CP-PACS: A massively parallel processor at the University of Tsukuba", Parallel Computing, Vol. 25, pp.1635-1661, 1999.
A. Murata, T. Boku, and H. Amano, "The MDX (Multi-Dimensional X'bar): A Class of Networks for Large Scale Multiprocessors", IEICE Trans. on Information and Systems, Vol.E79-D, No.8, 1996.
W. G. Hoover, A. J. De Groot, C. G. Hoover, I. F. Stowers, T. Kawai, B. L. Holian, T. Boku, S. Ihara, and J. Belak, "Large-scale elastic-plastic indentation simulations via nonequilibrium molecular dynamics", Physical Review A, Vol.42, No.10, pp.5844-5853, 1990.
H. Amano, T. Boku, and T. Kudoh, "(SM)^2: A Large-Scale Multiprocessor for Sparse Matrix Calculations", IEEE Transactions on Computers, Vol.39, No.7, pp.889-905, 1990.
Kohei Kikuchi, Norihisa Fujita, Ryohei Kobayashi, Taisuke Boku, "Implementation and Performance Evaluation of Collective Communications Using CIRCUS on Multiple FPGAs", Proc. of IXPUG Workshop 2023, Workshop of HPC Asia 2023, pp. 15-23, Singapore, Feb. 2023. https://doi.org/10.1145/3581576.3581602
Ryohei Kobayashi, Norihisa Fujita, Yoshiki Yamaguchi, Taisuke Boku, Kohji Yoshikawa, Makito Abe and Masayuki Umemura, "GPU–FPGA-accelerated Radiative Transfer Simulation with Inter-FPGA Communication", Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region 2023 (HPC Asia 2023), pp. 117-125, Singapore, Feb. 2023. https://doi.org/10.1145/3578178.3578231
Ryohei Kobayashi, Norihisa Fujita, Yoshiki Yamaguchi, Taisuke Boku, Kohji Yoshikawa, Makito Abe and Masayuki Umemura, "Accelerating Radiative Transfer Simulation on NVIDIA GPUs with OpenACC", Proc. of 23rd Int. Conf. on Parallel and Distributed Computing, Applications and Technologies (PDCAT2022), Sendai, Dec. 2022.
Yutaka Watanabe, Taisuke Boku, Mitsuhisa Sato, "Design and Performance Evaluation of UCX for Tofu-D Interconnect with OpenSHMEM-UCX on Fugaku", Proc. of 5th Annual Parallel Applications Workshop, Alternatives to MPI+X (PAW-ATM2022), with SC22, Dallas, Nov. 2022.
Taisuke Boku, Norihisa Fujita, Ryohei Kobayashi, Osamu Tatebe, "Cygnus - World First Multihybrid Accelerated Cluster with GPU and FPGA Coupling", Proc. of 2nd Int. Workshop on Deployment and Use of Accelerators (DUAC2022), with ICPP2022, Bordeaux (on-line), Aug. 2022. https://dl.acm.org/doi/10.1145/3547276.3548629
Norihisa Fujita, Ryohei Kobayashi, Yoshiki Yamaguchi, Taisuke Boku, "Implementation and Performance Evaluation of Memory System using Addressable Cache for HPC applications on HBM2 equipped FPGAs", Proc. of 20th Int. Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar2022), with EuroPar2022, Glasgow, Aug. 2022.
Yuka Sano, Ryohei Kobayashi, Norihisa Fujita, Taisuke Boku, "Performance Evaluation on GPU-FPGA Accelerated Computing Considering Interconnections between Accelerators", Proc. Int. Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies 2022, Tsukuba-shi, Jun. 2022.
Ryuta Kashino, Ryohei Kobayashi, Norihisa Fujita, Taisuke Boku, "Multi-hetero Acceleration by GPU and FPGA for Astrophysics Simulation on Intel oneAPI Environment", Proc. of HPC Asia 2022, Kobe (on-line), Jan. 2022. https://doi.org/10.1145/3492805.3492817
Kazuki Furukawa, Tomoya Yokono, Yoshiki Yamaguchi, Kohji Yoshikawa, Norihisa Fujita, Ryohei Kobayashi, Taisuke Boku, Masayuki Umemura, "An Efficient RTL Buffering Scheme for an FPGA-Accelerated Simulation of Diffuse Radiative Transfer", Proc. of FPT'21, Auckland (virtual), Dec. 2021.
Koei Watanabe, Kohei Kikuchi, Taisuke Boku, Takuto Sato, Hiroyuki Kusaka, "High Resolution of City-Level Climate Simulation by GPU with Multi-Physical Phenomena", Proc. of NPC2021, Paris (virtual), Nov. 2021. https://doi.org/10.1007/978-3-030-93571-9_1
Norihisa Fujita, Ryohei Kobayashi, Yoshiki Yamaguchi, Taisuke Boku, "HBM2 Memory System for HPC Applications on an FPGA", Proc. of Int. Workshop on FPGA for HPC 2021 (in conjunction with IEEE Cluster 2021), Portland (on-line), Sep. 2021.
Naoya Umezu, Yoshiki Yamaguchi, Taisuke Boku, "An FPGA-based storage control with load balancing", Proc. of Int. Workshop on FPGA for HPC 2021 (in conjunction with IEEE Cluster 2021), Portland (on-line), Sep. 2021.
Ryohei Kobayashi, Kento Miura, Norihisa Fujita, Taisuke Boku, Toshiyuki Amagasa, "A Sorting Library for FPGA Implementation in OpenCL Programming", Proc. of HEART2021, Paderborn (on-line), Jun. 2021. https://doi.org/10.1145/3468044.3468054
Ryuta Kashino, Ryohei Kobayashi, Norihisa Fujita, Taisuke Boku, "Performance Evaluation of OpenCL-Enabled Inter-FPGA Optical Link Communication Framework CIRCUS and SMI", Proc. of HPC Asia 2021, Jeju-Island (on-line), Jan. 2021. https://doi.org/10.1145/3432261.3432266
Norihisa Fujita, Ryohei Kobayashi, Yoshiki Yamaguchi, Taisuke Boku, Kohji Yoshikawa, Makito Abe, Masayuki Umemura, "OpenCL-enabled Parallel Raytracing for Astrophysical Application on Multiple FPGAs with Optical Links", 2020 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC) in conjunction with SC20, On-line, Nov. 2020.
Ryuta Tsunashima, Ryohei Kobayashi, Norihisa Fujita, Taisuke Boku, Seyong Lee, Jeffrey S. Vetter, Hitoshi Murai, Masahiro Nakao, Mitsuhisa Sato, “OpenACC unified programming environment for GPU and FPGA multi-hybrid acceleration”, Proc. of Int. Conference on High Level Parallel Programming (HLPP) 2020, Porto (virtual), Jun. 2020.
Daisuke Tsuji, Taisuke Boku, Ryosaku Ikeda, Takuto Sato, Hiroto Tadano, Hiroyuki Kusaka, “Parallelized GPU Code of City-Level Large Eddy Simulation”, Proc. of Int. Symposium on Parallel and Distributed Computing (ISPDC) 2020, Warsaw (virtual), Jun. 2020.
Ryohei Kobayashi, Norihisa Fujita, Yoshiki Yamaguchi, Taisuke Boku, Kohji Yoshikawa, Makito Abe, and Masayuki Umemura: “Accelerating Radiative Transfer Simulation with GPU-FPGA Cooperative Computation”, 2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP), Manchester, United Kingdom, 2020, pp. 9-16.
Norihisa Fujita, Ryohei Kobayashi, Yoshiki Yamaguchi, Tomohiro Ueno, Kentaro Sano, Taisuke Boku, “Performance Evaluation of Pipelined Communication Combined with Computation in OpenCL Programming on FPGA”, Proc. of AsHES2020 (Int. Workshop on Accelerators and Hybrid Exascale Systems) in IPDPS 2020. On-line, May 2020.
Ryohei Kobayashi, Norihisa Fujita, Yoshiki Yamaguchi, Ayumi Nakamichi, Taisuke Boku, "OpenCL-Enabled GPU–FPGA Accelerated Computing with Inter-FPGA Communication", Proc. of IXPUG Workshop HPC Asia 2020, Fukuoka, pp. 17-20, Jan. 2020. https://doi.org/10.1145/3373271.3373275
Yutaka Watanabe, Jinpil Lee, Kentaro Sano, Taisuke Boku, Mitsuhisa Sato, "Design and Preliminary Evaluation of OpenACC Compiler for FPGA with OpenCL and Stream Processing DSL", Proc. of IXPUG Workshop HPC Asia 2020, Fukuoka, pp. 17-20, Jan. 2020. https://doi.org/10.1145/3373271.3373274
Thomas Steinke, Estela Suarez, Taisuke Boku, Nalini Kumar, David E. Martin, "Using FPGAs to Accelerate HPC and Data Analytics on Intel-Based Systems", High Performance Computing: ISC High Performance 2019 International Workshops, Jun. 16-20, 2019, revised selected papers: Jun. 2019, pp. 561–566 https://doi.org/10.1007/978-3-030-34356-9_42
Norihisa Fujita, Ryohei Kobayashi, Yoshiki Yamaguchi and Taisuke Boku, "Parallel Processing on FPGA Combining Computation and Communication in OpenCL Programming", Proc. of AsHES2019 (Int. Workshop on Accelerators and Hybrid Exascale Systems) in IPDPS 2019, Rio de Janeiro, May 2019.
Ryohei Kobayashi, Norihisa Fujita, Yoshiki Yamaguchi, Ayumi Nakamichi and Taisuke Boku, "GPU-FPGA Heterogeneous Computing with OpenCL-enabled Direct Memory Access", Proc. of AsHES2019 (Int. Workshop on Accelerators and Hybrid Exascale Systems) in IPDPS 2019, Rio de Janeiro, May 2019.
Miwako Tsuji, Taisuke Boku, Mitsuhisa Sato, "Scalable Communication Performance Prediction Using Auto-Generated Pseudo MPI Event Trace", Proc. of HPC Asia 2019, Guangzhou, Jan. 15th, 2019. https://doi.org/10.1145/3293320.3293323
Ryohei Kobayashi, Norihisa Fujita, Yoshiki Yamaguchi, Taisuke Boku, “OpenCL-enabled high performance direct memory access for GPU-FPGA cooperative computation”, Proc. of IXPUG Workshop Asia 2019 (in HPC Asia 2019), Guangzhou, Jan. 14th, 2019. https://doi.org/10.1145/3317576.3317581
Yutaka Watanabe, Jinpil Lee, Taisuke Boku, Mitsuhisa Sato, "Trade-off of offloading to FPGA in OpenMP Task-based programming", Proc. of Int. Workshop on OpenMP 2018 (IWOMP2018), 12 pages, Barcelona, Sep. 2018.
Yuta Hirokawa, Taisuke Boku, Mitsuharu Uemoto, Shunsuke A. Sato, Kazuhiro Yabana, "Performance Optimization and Evaluation of Scalable Optoelectronics Application on Large Scale KNL Cluster", Proc. of ISC2018 (Int. Symposium on Supercomputing), 20 pages, Frankfurt, Jun. 26th 2018.
Norihisa Fujita, Ryohei Kobayashi, Taisuke Boku, Yuma Oobata, Yoshiki Yamaguchi, Kohji Yoshikawa, Makito Abe, Masayuki Umemura, "Accelerating Space Radiative Transfer on FPGA using OpenCL", Proc. of HEART2018 (Int. Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies), Toronto, Jun. 21st 2018. https://doi.org/10.1145/3241793.3241799
Larry Meadows, Ken-Ichi Ishikawa, Taisuke Boku, Masashi Horikoshi, "Multiple endpoints for improved MPI performance on a lattice QCD code", Proc. of IXPUG Workshop 2018 (in Conjunction with HPC Asia 2018), pp. 67-70, Jan. 2018. https://doi.org/10.1145/3176364.3176375
Masashi Horikoshi, Larry Meadows, Tom Elken, Pradeep Sivakumar, Edward Mascarenhas, James Erwin, Dmitry Durnov, Alexander Sannikov, Toshihiro Hanawa, Taisuke Boku, "Scaling collectives on large clusters using Intel(R) architecture processors and fabric", Proc. of IXPUG Workshop 2018 (in Conjunction with HPC Asia 2018), pp. 59-62, Jan. 2018. https://doi.org/10.1145/3176364.3176373
Masahiro Nakao, Hitoshi Murai, Taisuke Boku, Mitsuhisa Sato, "Performance evaluation for omni XcalableMP compiler on many-core cluster system based on knights landing", Proc. of Workshop on PGAS programming models: Experiences and Implementations (in Conjunction with HPC Asia 2018), pp. 52-58, Jan. 2018. https://doi.org/10.1145/3176364.3176372
Masahiro Nakao, Hitoshi Murai, Taisuke Boku, Mitsuhisa Sato, "Linkage of XcalableMP and Python languages for high productivity on HPC cluster system: application to graph order/degree problem", Proc. of Workshop on PGAS programming models: Experiences and Implementations (in Conjunction with HPC Asia 2018), pp. 39-47, Jan. 2018. https://doi.org/10.1145/3176364.3176369
Akihiro Tabuchi, Masahiro Nakao, Hitoshi Murai, Taisuke Boku, Mitsuhisa Sato, "Performance evaluation for a hydrodynamics application in XcalableACC PGAS language for accelerated clusters", Proc. of Workshop on PGAS programming models: Experiences and Implementations (in Conjunction with HPC Asia 2018), pp. 1-10, Jan. 2018. https://doi.org/10.1145/3176364.3176365
Ryohei Kobayashi, Yuma Oobata, Norihisa Fujita, Yoshiki Yamaguchi, Taisuke Boku, "OpenCL-ready High Speed FPGA Network for Reconfigurable High Performance Computing", Proc. of HPC Asia 2018, pp. 192-201, Jan. 2018. https://doi.org/10.1145/3149457.3149479
Yuta Hirokawa, Taisuke Boku, Shunsuke A. Sato, Kazuhiro Yabana, "Performance Evaluation of Large Scale Electron Dynamics Simulation under Many-core Cluster based on Knights Landing", Proc. of HPC Asia 2018, pp. 183-191, Jan. 2018. https://doi.org/10.1145/3149457.3149465
Joachim Protze, Christian Terboven, Matthias S. Müller, Serge Petiton, Nahid Emad, Hitoshi Murai, Taisuke Boku, "Runtime Correctness Checking for Emerging Programming Paradigms", Proc. of Int. Workshop Correctness'17 (in Conjunction with SC17), pp. 21-27, Nov. 2017. https://doi.org/10.1145/3145344.3145490
Masahiro Nakao, Hitoshi Murai, Hidetoshi Iwashita, Akihiro Tabuchi, Taisuke Boku, Mitsuhisa Sato, "Implementing Lattice QCD Application with XcalableACC Language on Accelerated Cluster", Proc. of IEEE Cluster2017, Hawaii, Sep. 2017.
Akihiro Tabuchi, Masahiro Nakao, Hitoshi Murai, Taisuke Boku, Mitsuhisa Sato, "Implementation and Evaluation of One-sided PGAS Communication in XcalableACC for Accelerated Clusters", Proc. of CCGrid2017, Madrid, May 15th 2017. https://doi.org/10.1109/CCGRID.2017.81
Kenta Sato, Norihisa Fujita, Toshihiro Hanawa, Taisuke Boku, Khaled Z. Ibrahim, "GPU-ready GASNet Implementation on the TCA Proprietary Interconnect Architecture", Proc. of CSCI2016 (Int. Conf. on Computational Science and Computational Intelligence 2016), 6 pages, Las Vegas, Dec. 2016.
Akihiro Tabuchi, Yasuyuki Kimura, Sunao Torii, Hideo Matsufuru, Tadashi Ishikawa, Taisuke Boku, Mitsuhisa Sato, "Design and Preliminary Evaluation of Omni OpenACC Compiler for Massive MIMD Processor PEZY-SC", Proc. of IWOMP2016 (International Workshop on OpenMP), LNCS 9903: OpenMP: Memory, Devices, and Tasks, pp.293-305, Nara, Oct. 2016.
Kazuya Matsumoto, Norihisa Fujita, Toshihiro Hanawa, Taisuke Boku, "Implementation and Evaluation of NAS Parallel CG Benchmark on GPU Cluster with Proprietary Interconnect TCA", Proc. of VECPAR2016, 8 pages, Porto, Jul. 2016.
Yuta Hirokawa, Taisuke Boku, Shunsuke Sato, Kazuhiro Yabana, "Electron Dynamics Simulation with Time-Dependent Density Functional Theory on Large Scale Symmetric Mode Xeon Phi Cluster", Proc. of PDSEC2016 (in IPDPS2016), 8 pages, Chicago, 2016.
Tetsuya Odajima, Taisuke Boku, Toshihiro Hanawa, Hitoshi Murai, Masahiro Nakao, Akihiro Tabuchi, Mitsuhisa Sato, "Hybrid Communication with TCA and InfiniBand on A Parallel Programming Language for Accelerators XcalableACC", Proc. of HUCAA2015 (in Cluster2015), 8 pages, Chicago, Sept. 2015. https://doi.org/10.1109/CLUSTER.2015.112
Toshihiro Hanawa, Norihisa Fujita, Tetsuya Odajima, Kazuya Matsumoto, Taisuke Boku, "Evaluation of FFT for GPU Cluster Using Tightly Coupled Accelerators Architecture", Proc. of HUCAA2015 (in Cluster2015), 8 pages, Chicago, Sept. 2015. https://doi.org/10.1109/CLUSTER.2015.113
Toshihiro Hanawa, Hisafumi Fujii, Norihisa Fujita, Tetsuya Odajima, Kazuya Matsumoto, Yuetsu Kodama, Taisuke Boku, "Improving Strong-Scaling on GPU Cluster Based on Tightly Coupled Accelerators Architecture", Proc. of IEEE Cluster2015, Chicago, Sept. 2015. https://doi.org/10.1109/CLUSTER.2015.154
Kazuya Matsumoto, Toshihiro Hanawa, Yuetsu Kodama, Hisafumi Fujii, Taisuke Boku, "Implementation of CG Method on GPU Cluster with Proprietary Interconnect TCA for GPU Direct Communication", Proc. of AsHES2015 in IPDPS2015, Hyderabad, May 2015. https://doi.org/10.1109/IPDPSW.2015.102
Takuya Kuhara, Takahiro Kaneda, Toshihiro Hanawa, Yuetsu Kodama, Taisuke Boku, Hideharu Amano, "A Preliminarily Evaluation of PEACH3: A Switching Hub for Tightly Coupled Accelerators", Proc. of CANDAR '14, pp. 377-381, Dec. 2014. https://doi.org/10.1109/CANDAR.2014.44
K. Tsugane, H. Nuga, T. Boku, H. Murai, M. Sato, W. Tang, B. Wang, "Hybrid-view Programming of Nuclear Fusion Simulation Code in the PGAS Parallel Programming Language XcalableMP", Proc. of ICPADS2014, Hsinchu, Dec. 2014.
Masahiro Nakao, Hitoshi Murai, Takenori Shimosaka, Akihiro Tabuchi, Toshihiro Hanawa, Yuetsu Kodama, Taisuke Boku, Mitsuhisa Sato, "XcalableACC: Extension of XcalableMP PGAS Language using OpenACC for Accelerator Clusters", Workshop on accelerator programming using directives (WACCPD), New Orleans, LA, USA, Nov. 2014.
N. Fujita, H. Fujii, T. Hanawa, Y. Kodama, T. Boku, Y. Kuramashi, M. Clark, "QCD Library for GPU Cluster with Proprietary Interconnect for GPU Direct Communication", Proc. of HeteroPar 2014 (with EuroPar 2014), Porto, Aug. 2014. https://doi.org/10.1007/978-3-319-14325-5_22
Y. Kodama, T. Hanawa, T. Boku, M. Sato, "PEACH2: FPGA based PCIe network device for Tightly Coupled Accelerators", Proc. of HEART2014, Sendai, Jun. 2014. (received the Best Paper Award of HEART2014) https://doi.org/10.1145/2693714.2693716
N. Fujita, H. Nuga, T. Boku, Y. Idomura, "Nuclear Fusion Simulation Code Optimization and Performance Evaluation on GPU Clusters", Proc. of PDSEC2014 (with IPDPS2014), Phoenix, May 2014. https://doi.org/10.1109/IPDPSW.2014.142
T. Odajima, T. Boku, M. Sato, T. Hanawa, Y. Kodama, R. Namyst, S. Thibault, O. Aumage, "Adaptive Task Size Control on High Level Programming for GPU/CPU Work Sharing", Proc. of Int. Workshop on Advances of Distributed and Parallel Processing 2013 (ADPC-2013, with ICA3PP-2013), Vietri sul Mare, LNCS-8286 Part II, pp.59-68, 2013. https://doi.org/10.1007/978-3-319-03889-6_7
Norihisa Fujita, Hideo Nuga, Taisuke Boku, Yasuhiro Idomura, "Nuclear Fusion Simulation Code Optimization on GPU Clusters", Proc. of ICPADS '13, pp. 420-421 Dec. 2013.
T. Hanawa, Y. Kodama, T. Boku, M. Sato, "Tightly Coupled Accelerators Architecture for Minimizing Communication Latency among Accelerators", Proc. of 3rd Int. Workshop on Accelerators and Hybrid Exascale Systems (AsHES 2013, with IPDPS2013), Boston, CD-ROM, 2013. https://doi.org/10.1109/IPDPSW.2013.226
T. Odajima, T. Boku, T. Hanawa, J. Lee, M. Sato, "GPU/CPU Work-Sharing with Parallel Language XcalableMP-dev for Parallelized Accelerated Computing", Proc. of P2S2-2012 (with ICPP2012), Pittsburgh, CD-ROM, 2012. https://doi.org/10.1109/ICPPW.2012.16
T. Nomizu, D. Takahashi, J. Lee, T. Boku, M. Sato, "Implementation of XcalableMP Device Acceleration Extention with OpenCL", Proc. of PLC2012 (with IPDPS2012), Shanghai, CD-ROM, 2012. https://doi.org/10.1109/IPDPSW.2012.296
M. Nakao, J. Lee, T. Boku, M. Sato, "Productivity and Performance of Global-View Programming with XcalableMP PGAS Language", Proc. of CCGrid2012, Ottawa, CD-ROM, 2012. https://doi.org/10.1109/CCGrid.2012.118
S. Otani, H. Kondo, I. Nonomura, A. Ikeya, M. Uemura, Y. Hayakawa, T. Oshita, S. Kaneko, K. Asahina, K. Arimoto, S. Miura, T. Hanawa, T. Boku, M. Sato, "An 80Gb/s Dependable Communication SoC with PCI Express I/F and 8 CPUs", Proc. of ISSCC2011, San Francisco, CD-ROM, 2011.
J. Lee, M. T. Tran, T. Odajima, T. Boku, M. Sato, "An Extension of XcalableMP PGAS Language for Multi-node GPU Clusters," Ninth International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar 2011), 2011.
T. Hanawa, T. Boku, S. Miura, M. Sato, K. Arimoto, "PEARL and PEACH: A Novel PCI Express Direct Link and Its Implementation," The Seventh Workshop on High-Performance, Power-Aware Computing (HPPAC 2011) in 25th International Parallel and Distributed Processing Symposium (IPDPS 2011), pp. 866-874, 2011.
S. Miura, T. Hanawa, T. Boku, M. Sato, "XMCAPI: Inter-core Communication Interface on Multi-chip Embedded Systems," Proc. of EUC 2011, pp.397-402.
T. Hanawa, T. Boku, S. Miura, M. Sato, K. Arimoto, "PEARL: Power-aware, Dependable, and High-Performance Communication Link Using PCI Express", Proc. of IEEE/ACM International Conference on Green Computing and Communications (GreenCom2010), pp. 284-291, Hangzhou, 2010.
T. Hanawa, T. Boku, S. Miura, M. Sato, and K. Arimoto, "Power-aware, Dependable, and High-Performance Communication Link Using PCI Express: PEARL," Proc. of IEEE International Conference on Cluster Computing (Cluster2010), poster, 4 pages, Crete, Sep. 2010.
M. Nakao, J. Lee, T. Boku, M. Sato, "XcalableMP Implementation and Performance of NAS Parallel Benchmarks", Proc. of PGAS10, New York, 2010.
T. Yonemoto, S. Miura, T. Hanawa, T. Boku, M. Sato, "Flexible Multi-link Ethernet Binding System for PC Clusters with Asymmetric Topology", Proc. of ICPADS2009, Memory-card, Shenzhen, 2009.
T. Hanawa, M. Sato, J. Lee, T. Imada, H. Kimura, T. Boku, "Evaluation of Multicore Processor for Embedded Systems by Parallel Benchmark Program using OpenMP", Proc. of IWOMP2009, Dresden, 2009.
S. Miura, T. Hanawa, T. Yonemoto, T. Boku, M. Sato, "RI2N/DRV: Multi-link Ethernet for High-Bandwidth and Fault-Tolerant Network on PC Cluster", Proc. of CAC2009 (included in Proc. of IPDPS2009), CD-ROM, Rome, 2009.
J. Lee, M. Sato, T. Boku, "OpenMPD: A Directive-Based Data Parallel Language Extension for Distributed Memory Systems", Proc. of 1st Int. Workshop on Parallel Programming Models and System Software for High-End Computing (P2S2) (included in Proc. of ICPP08), Portland, 2008.
S. Miura, T. Boku, T. Okamoto, T. Hanawa, "A Dynamic Routing Control System for High-Performance PC Cluster with Multi-path Ethernet Connection", Proc. of CAC2008 (included in Proc. of IPDPS2008), CD-ROM, Miami, 2008.
J. Lee, M. Sato, T. Boku, "Design and Implementation of OpenMPD: An OpenMP-like Programming Language for Distributed Memory Systems", Proc. of Int. Workshop on OpenMP (IWOMP2007), Beijing, 2007.
T. Okamoto, S. Miura, T. Boku, M. Sato, D. Takahashi, "RI2N/UDP: High bandwidth and fault-tolerant network for a PC-cluster based on multi-link Ethernet", Proc. of CAC2007 (included in Proc. of IPDPS2007), CD-ROM, Long Beach, 2007.
T. Okamoto, T. Boku, M. Sato, T. Osamu, "P2P Overlay Network for TCP Programming with UDP Hole Punching", Proc. of NPC2006, Tokyo, 2006.
H. Kimura, M. Sato, Y. Hotta, T. Boku, D. Takahashi, "Empirical Study on Reducing Energy of Parallel Programs using Slack Reclamation by DVFS", Proc. of Cluster2006, Barcelona, 2006.
S. Sumimoto, K. Ooe, K. Kumon, T. Boku, M. Sato, A. Ukawa, "Scalable Communication Layer for Multi-Dimensional Crossbar Network Using Multiple Gigabit Ethernet", Proc. of ICS2006, Cairns, Australia, 2006.
T. Boku, M. Sato, A. Ukawa, D. Takahashi, S. Sumimoto, K. Kumon, T. Moriyama, M. Shimizu, "PACS-CS: A large-scale bandwidth-aware PC cluster for scientific computations", Proc. of CCGrid2006, Singapore, 2006.
T. Boku, M. Sato, D. Takahashi, H. Nakashima, H. Nakamura, S. Matsuoka, Y. Hotta, "MegaProto/E: Power-Aware High-Performance Cluster with Commodity Technology", Proc. of HP-PAC06 (in IPDPS2006), Rhodes, Greece, 2006.
S. Miura, T. Okamoto, T. Boku, M. Sato, D. Takahashi, "Low-cost High-bandwidth Tree Network for PC Clusters based on Tagged-VLAN Technology", Proc. of ISPAN2005, pp.84-91, Las Vegas, USA, 2005.
H. Nakashima, H. Nakamura, M. Sato, T. Boku, S. Matsuoka, D. Takahashi, Y. Hotta, "MegaProto: 1 TFlops/10kW Rack Is Feasible Even with Only Commodity Technology", Proc. of SC05 (CD-ROM), Seattle, USA, 2005.
T. Boku, K. Onuma, M. Sato, Y. Nakajima, D. Takahashi, "Grid environment for computational astrophysics driven by GRAPE-6 with HMCS-G and OmniRPC", Proc. of Joint Workshop on High-Performance Grid Computing and High-Level Parallel Programming Models, IPDPS2005, Denver, USA, 2005.
H. Nakashima, H. Nakamura, M. Sato, T. Boku, S. Matsuoka, D. Takahashi, Y. Hotta, "MegaProto: A Low-Power and Compact Cluster for High-Performance Computing", Proc. of Workshop on High Performance Power Aware Computing, IPDPS2005, Denver, USA, 2005.
Y. Ojima, M. Sato, T. Boku, D. Takahashi, "Design of Software Distributed Shared Memory System using MPI communication layer", Proc. of 4th International Workshop on OpenMP Experiences and Implementations (WOMPEI2005), Tsukuba, Japan, 2005.
T. Boku, M. Sato, M. Matsubara, D. Takahashi, "OpenMPI - OpenMP like tool for easy programming in MPI", Proc. of 6th European Workshop on OpenMP (EWOMP'04), Stockholm, Sweden, 2004.
C. Takahashi, M. Kondo, T. Boku, D. Takahashi, H. Nakamura, "SCIMA-SMP: On-chip memory processor architecture for SMP", Proc. 3rd Workshop on Memory Performance Issues (WMPI-2004), pp.121-128, Munich, Germany, 2004.
Y. Ohtaki, D. Takahashi, T. Boku, M. Sato, "Parallel Implementation of Strassen's Matrix Multiplication Algorithm for Heterogeneous Clusters", Proceedings of Heterogeneous Computing Workshop 2004 in IPDPS2004, Santa Fe, USA, 2004.
Y. Hotta, M. Sato, T. Boku, D. Takahashi, C. Takahashi, "Measurement and Characterization of Power Consumption of Microprocessors for Power-aware Cluster", Proceedings of CoolChips VII, Yokohama, Japan, 2004.
K. Onuma, T. Boku, M. Sato, D. Takahashi, H. Susa, M. Umemura, "Heterogeneous Remote Computing System for Computational Astrophysics with OmniRPC", Proceedings of International Workshop on High Performance Grid Computing and Networking, 2004 International Symposium on Applications and Internet, Tokyo, Jan. 2004.
Y. Nakajima, M. Sato, T. Boku, D. Takahashi, H. Gotoh, "Performance Evaluation of OmniRPC in a Grid Environment", Proceedings of International Workshop on High Performance Grid Computing and Networking, 2004 International Symposium on Applications and Internet, Tokyo, Jan. 2004.
S. Miura, T. Boku, M. Sato, D. Takahashi, "RI2N - Interconnection network system for clusters with wide-bandwidth and fault-tolerancy based on multiple links", Proceedings of International Symposium on High Performance Computing 2003 (ISHPC-V), LNCS-2858, pp.342-351, Tokyo, Oct. 2003.
T. Boku, M. Sato, K. Onuma, J. Makino, H. Susa, D. Takahashi, M. Umemura, A. Ukawa, "HMCS-G : grid enabled hybrid computing system for computational astrophysics", Proceedings of Grid and Advanced Network (GAN'03) in CCGrid2003, pp.558-565, Tokyo, May 2003.
M. Sato, T. Boku, D. Takahashi, "OmniRPC: a Grid RPC System for Parallel Programming in Cluster and Grid Environment", Proceedings of CCGrid2003, Tokyo, May 2003.
T. Boku, J. Makino, H. Susa, M. Umemura, T. Fukushige and A. Ukawa, "Heterogeneous Multi-Computer System: A New Paradigm of Parallel Processing", Proceedings of 2002 International Conference on Parallel Processing in Electrical Engineering, Warsaw, Sep. 2002 (invited talk).
T. Boku, J. Makino, H. Susa, M. Umemura, T. Fukushige and A. Ukawa, "Heterogeneous Multi-Computer System: A New Platform for Multi-Paradigm Scientific Simulation", Proceedings of 2002 International Conference on Supercomputing, pp.26-34, New York City, Jun. 2002.
D. Takahashi, M. Sato and T. Boku, "Performance Evaluation of the Hitachi SR8000 Using OpenMP Benchmarks", Proceedings of 4th International Symposium on High Performance Computing (ISHPC 2002), Lecture Notes in Computer Science, No. 2327, pp. 390-400, 2002.
T. Boku, S. Yoshikawa, M. Sato, C. G. Hoover and W. G. Hoover, "Implementation and performance evaluation of SPAM particle code with OpenMP-MPI hybrid programming", Proceedings of European Workshop on OpenMP (EWOMP) 2001, Barcelona, Sep. 2001.
T. Boku, M. Matsubara and K. Itakura, "PIO: Parallel I/O System for Massively Parallel Processors", Proceedings of European High Performance Computing and Network Conference 2001 (LNCS-2110), pp.383-392, Amsterdam, Jun. 2001.
M. Kondo, H. Okawara, H. Nakamura, T. Boku and S. Sakai, "SCIMA: A Novel Processor Architecture for High Performance Computing", Proceedings of HPC Asia'2000, pp.355-360, Beijing, May 2000.
T. Boku, K. Itakura, S. Yoshikawa, M. Kondo and M. Sato, "Performance Analysis of PC-CLUMP based on SMP-Bus Utilization", Proceedings of WCBC'00 (Workshop on Cluster Based Computing 2000), Santa Fe, May 2000.
M. Matsubara, H. Numa, and T. Boku, "Commodity Network based Parallel I/O System for Massively Parallel Processors", Proceedings of PDPTA'99, pp.2424-2429, Las Vegas, Jun. 1999.
T. Boku, M. Mishima, K. Itakura, "VIPPES : A Virtual Parallel Processing System Simulation Environment", Proceedings of HPC Asia'98, pp.843-853, Singapore, Sep. 1998.
M. Matsubara, K. Itakura, T. Boku, "Large Scale Molecular Dynamics Simulations on CP-PACS", Proceedings of HPC Asia'98, pp.321-331, Singapore, Sep. 1998.
K. Kubota, M. Sato, K. Itakura, T. Boku, "Accuracy of fast performance prediction by instrumentation tool EXCIT", Proceedings of HPC Asia'98, pp.1031-1038, Singapore, Sep. 1998.
K. Kubota, K. Itakura, M. Sato, T. Boku, "Practical Simulation of Large-Scale Parallel Programs and Its Performance Analysis of the NAS Parallel Benchmarks", Proceedings of Euro-Par'98 (LNCS 1470), pp.244-254, Manchester, 1998.
H. Nakamura, K. Itakura, M. Matsubara, T. Boku, K. Nakazawa, "Effectiveness of Register Preloading on CP-PACS Node Processor", Proceedings of Innovative Architecture for Future Generation High-Performance Processors and Systems, pp.83-90, Maui, Oct. 1997.
T. Boku, K. Itakura, H. Nakamura, K. Nakazawa, "CP-PACS: A massively parallel processor for large scale scientific calculations", Proceedings of ACM International Conference on Supercomputing'97, pp.108-115, Vienna, Jul. 1997.
K. Itakura, T. Boku, H. Nakamura, K. Nakazawa, "Performance evaluation of CP-PACS on CG benchmark", Proceedings of HPC Asia'97, pp.678-683, Seoul, Apr. 1997.
Y. Abei, K. Itakura, T. Boku, H. Nakamura, K. Nakazawa, "Performance Improvement for Matrix Calculation on CP-PACS Node Processor", Proceedings of HPC Asia'97, pp.672-677, Seoul, Apr. 1997.
T. Boku, H. Nakamura, K. Nakazawa, Y. Iwasaki, "The Architecture of Massively Parallel Processor CP-PACS", Proceedings of 2nd pAs, pp.31-40, Aizu, Mar. 1997.
T. Morimoto, K. Saito, H. Nakamura, T. Boku, K. Nakazawa, "Advanced Processor Design Using Hardware Description Language AIDL", Proceedings of Asia and South Pacific Design Automation Conference 1997 (ASP-DAC'97), pp.387-390, Makuhari, Mar. 1997.
A. Murata, T. Boku, T. Harada, H. Amano, "The MDX (Multi-Dimensional X'bar): A class of networks for large scale multiprocessors", Proceedings of 9th International Conference on Parallel and Distributed Computing System (PDCS96), pp.296-303, 1996.
T. Boku, M. Mishima, K. Itakura, H. Nakamura, K. Nakazawa, "VIPPES: A performance pre-evaluation system for parallel processors", Presented at HPCN Europe'96, Brussels, Apr. 1996.
K. Itakura, M. Hattori, T. Boku, H. Nakamura, and K. Nakazawa, "Preliminary evaluation of NAS Parallel Benchmarks on CP-PACS", Proceedings of PERMEAN'95, pp.68-77, Beppu, Aug. 1995.
T. Boku, T. Harada, T. Sone, H. Nakamura, and K. Nakazawa, "INSPIRE : A general purpose network simulator generating system for massively parallel processors", Proceedings of PERMEAN'95, pp.24-33, Beppu, Aug. 1995.
H. Morimoto, K. Yamazaki, H. Nakamura, T. Boku, and K. Nakazawa, "Superscalar Processor Design with Hardware Description Language AIDL", Proceedings of 2nd Asia Pacific Conference on Hardware Description Language, pp.51-58, Nagoya, Oct. 1994.
H. Nakamura, T. Wakabayashi, K. Nakazawa, T. Boku, H. Wada, and Y. Inagami, "Pseudo Vector Processor for High-speed List Vector Computation with Hiding Memory Access Latency", Proceedings of IEEE TENCON'94, pp.338-342, Singapore, Sep. 1994.
H. Nakamura, K. Nakazawa, H. Li, H. Imori, T. Boku, I. Nakata, and Y. Yamashita, "Evaluation of Pseudo Vector Processor based on Slide-Windowed Registers", Proceedings of Hawaii International Conference on System Sciences 27, pp.368-377, Honolulu, Jan. 1994.
H. Nakamura, H. Imori, K. Nakazawa, T. Boku, I. Nakata, Y. Yamashita, H. Wada, and Y. Inagami, "A Scalar Architecture for Pseudo Vector Processing based on Slide-Windowed Registers", Proceedings of ACM International Conference on Supercomputing '93, pp.298-307, Tokyo, Jun. 1993.
T. Boku, A. Murata, and T. Kawai, "Why do experiments and theory disagree on the turbulence transition of the Poiseuille flow?", Proceedings of 4th International Symposium on Computational Fluid Dynamics, pp.121-126, Davis, Sep. 1991.
T. Kimura, T. Boku, T. Kudoh, and H. Amano, "A Concurrent Program Restructuring System for Scientific Calculations", Proceedings of 24th Hawaii International Conference on System Sciences (IEEE/ACM), pp.390-399, Honolulu, Jan. 1991.
T. Boku, S. Nomura, and H. Amano, "IMPULSE: A high performance processing unit for multiprocessors for scientific calculation", Proceedings of 15th International Symposium on Computer Architecture (IEEE/ACM), pp.365-372, Honolulu, Jun. 1988.
T. Boku, T. Kudoh, H. Amano, and H. Aiso, "DIPROS: A distributed processing system for NDL on (SM)^2-II", Proceedings of 20th Hawaii International Conference on System Sciences (IEEE/ACM), pp.208-217, Kona, Jan. 1987.
T. Kudoh, H. Amano, T. Boku, and H. Aiso, "NDL: A language for solving scientific problems on MIMD machines", Proceedings of 1st Super Computing Symposium (IEEE), pp.55-64, Miami, 1985.
H. Amano, T. Boku, T. Kudoh, and H. Aiso, "(SM)^2-II: The new version of the sparse matrix solving machine", Proceedings of 12th International Symposium on Computer Architecture (IEEE/ACM), pp.100-107, 1985.
Firmansyah Iman, Yoshiki Yamaguchi, Taisuke Boku, "Capability assessment of a multiple-FPGA system for high-performance computing", HPC in Asia Poster, ISC2016, Frankfurt, Jun. 2016.
Kazuya Matsumoto, Norihisa Fujita, Toshihiro Hanawa, Taisuke Boku, "Implementation and Performance Evaluation of NAS Parallel CG Benchmark on GPU Cluster with Proprietary Interconnect TCA", HPC in Asia Poster, ISC2016, Frankfurt, Jun. 2016.
T. Hanawa, Y. Kodama, T. Boku, M. Sato, "Tightly Coupled Accelerators Architecture for Low-Latency Inter-node Communication between Accelerators", SC14 Poster Session, New Orleans, 2014.
T. Hanawa, Y. Kodama, T. Boku, M. Sato, "Proprietary Interconnect with Low Latency for HA-PACS/TCA", HPC in Asia Session (poster) in Int. Supercomputing Conference (ISC) 2014, Leipzig, 2014. (received Best Poster Award in HPC in Asia Poster Session)
N. Fujita, H. Nuga, T. Boku, Y. Idomura, "Nuclear Fusion Simulation Code Optimization on GPU Clusters", ICPADS2013 (poster), Seoul, 2013.
T. Odajima, T. Boku, T. Hanawa, J. Lee, M. Sato, R. Namyst, S. Thibault, O. Aumage, "Task size control on high level programming for GPU/CPU work sharing", HPC in Asia Session (poster), ISC2013.
T. Hanawa, Y. Kodama, T. Boku, M. Sato, "HA-PACS/TCA: Tightly Coupled Architectures for low-latency communication among GPUs", HPC in Asia Session (poster), ISC2013.
H. Umeda, T. Hanawa, M. Shoji, T. Boku, "GPU accelerated Fock matrix preparation in OpenFMO", HPC in Asia Session (poster), ISC2013.
T. Hanawa, T. Boku, S. Miura, M. Sato, K. Arimoto, "PEACH: A Communication SoC for PCI Express Direct Link", IEEE Symposium on Low-Power and High-Speed Chips (COOL Chips XIV), 1 page, 2011. (received Best Feature Award)