Publications

1. Journal Papers

Yukimasa Sugizaki and Daisuke Takahashi: Improved Modular Multiplication Algorithms Using Solely IEEE 754 Binary Floating-Point Operations, IEEE Transactions on Emerging Topics in Computing, Vol. 13, No. 3, pp. 1259-1271 (2025).
Takuya Edamatsu and Daisuke Takahashi: Fast Multiple-Precision Integer Division Using Intel AVX-512, IEEE Transactions on Emerging Topics in Computing, Vol. 11, No. 1, pp. 224-236 (2023).
Daisuke Takahashi: On the use of Montgomery multiplication in the computation of binary BBP-type formulas for mathematical constants, The Ramanujan Journal, Vol. 59, No. 1, pp. 211-219 (2022).
Yukimasa Sugizaki and Daisuke Takahashi: A Fast Algorithm for Computing the Number of Magic Series, Annals of Combinatorics, Vol. 26, No. 2, pp. 511-532 (2022).
Kazuhiko Komatsu, Ayumu Gomi, Ryusuke Egawa, Daisuke Takahashi, Reiji Suda, and Hiroyuki Takizawa: Xevolver: A code transformation framework for separation of system-awareness from application codes, Concurrency and Computation: Practice and Experience, Vol. 32, No. 7, e5577 (2020).
Daisuke Takahashi: On the computation and verification of π using BBP-type formulas, The Ramanujan Journal, Vol. 51, No. 1, pp. 177-186 (2020).
Takahiro Katagiri and Daisuke Takahashi: Japanese Autotuning Research: Autotuning Languages and FFT, Proceedings of the IEEE, Vol. 106, No. 11, pp. 2056-2067 (2018). (invited paper)
Daisuke Takahashi: Computation of the 100 quadrillionth hexadecimal digit of π on a cluster of Intel Xeon Phi processors, Parallel Computing, Vol. 75, pp. 1-10 (2018).
Yukihiro Hasegawa, Jun-Ichi Iwata, Miwako Tsuji, Daisuke Takahashi, Atsushi Oshiyama, Kazuo Minami, Taisuke Boku, Hikaru Inoue, Yoshito Kitazawa, Ikuo Miyoshi, and Mitsuo Yokokawa: Performance evaluation of ultra-large-scale first-principles electronic structure calculation code on the K computer, International Journal of High Performance Computing Applications, Vol. 28, No. 3, pp. 335-355 (2014).
Yutaka Maruyama, Norio Yoshida, Hiroto Tadano, Daisuke Takahashi, Mitsuhisa Sato, and Fumio Hirata: Massively parallel implementation of 3D-RISM calculation with volumetric 3D-FFT, Journal of Computational Chemistry, Vol. 35, No. 18, pp. 1347-1355 (2014).
Yohei Miki, Daisuke Takahashi, and Masao Mori: Highly scalable implementation of an N-body code on a GPU cluster, Computer Physics Communications, Vol. 184, No. 9, pp. 2159-2168 (2013).
Daisuke Takahashi: Parallel implementation of multiple-precision arithmetic and 2,576,980,370,000 decimal digits of π calculation, Parallel Computing, Vol. 36, No. 8, pp. 439-448 (2010).
Yoshikuni Sato, Daisuke Takahashi, and Reijer Grimbergen: A Shogi Program Based on Monte-Carlo Tree Search, ICGA Journal, Vol. 33, No. 2, pp. 80-92 (2010).
Jun-Ichi Iwata, Daisuke Takahashi, Atsushi Oshiyama, Taisuke Boku, Kenji Shiraishi, Susumu Okada, and Kazuhiro Yabana: A massively-parallel electronic-structure calculations based on real-space density functional theory, Journal of Computational Physics, Vol. 229, No. 6, pp. 2339-2363 (2010).
Tetsuya Sakurai, Yoshihisa Kodaki, Hiroto Tadano, Daisuke Takahashi, Mitsuhisa Sato, and Umpei Nagashima: A parallel method for large sparse generalized eigenvalue problems using a GridRPC system, Future Generation Computer Systems, Vol. 24, No. 6, pp. 613-619 (2008).
Taisuke Boku, Hajime Susa, Kenji Onuma, Masayuki Umemura, Mitsuhisa Sato, and Daisuke Takahashi: Formation of Dwarf Galaxies in Reionized Universe with Heterogeneous Multicomputer System, International Journal for Multiscale Computational Engineering, Vol. 4, No. 2, pp. 281-289 (2006).
Daisuke Takahashi: An algorithm for multiple-precision floating-point multiplication, Applied Mathematics and Computation, Vol. 166, No. 2, pp. 291-298 (2005).
Daisuke Takahashi: A parallel 1-D FFT algorithm for the Hitachi SR8000, Parallel Computing, Vol. 29, No. 6, pp. 679-690 (2003).
Daisuke Takahashi, Mitsuhisa Sato, and Taisuke Boku: Performance Evaluation of the Hitachi SR8000 Using SPEC OMP2001 Benchmarks, International Journal of Parallel Programming, Vol. 31, No. 3, pp. 185-196 (2003).
Daisuke Takahashi: Efficient implementation of parallel three-dimensional FFT on clusters of PCs, Computer Physics Communications, Vol. 152, No. 2, pp. 144-150 (2003).
Daisuke Takahashi: An Extended Split-Radix FFT Algorithm, IEEE Signal Processing Letters, Vol. 8, No. 5, pp. 145-147 (2001).
Daisuke Takahashi: A fast algorithm for computing large Fibonacci numbers, Information Processing Letters, Vol. 75, No. 6, pp. 243-246 (2000).
Daisuke Takahashi and Yasumasa Kanada: High-Performance Radix-2, 3 and 5 Parallel 1-D Complex FFT Algorithms for Distributed-Memory Parallel Computers, The Journal of Supercomputing, Vol. 15, No. 2, pp. 207-228 (2000).

2. Conference Proceedings (with review)

Yukimasa Sugizaki and Daisuke Takahashi: Improved Implementation of Number Theoretic Transform on NVIDIA GPU with Tensor Cores, Proc. Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region (SCA/HPCAsia 2026), pp. 142-152 (2026).
Yukimasa Sugizaki and Daisuke Takahashi: An Improved Implementation of Multi-Threaded Number Theoretic Transform Using Arm Scalable Vector Extension Instruction Set, Proc. 24th International Symposium on Parallel and Distributed Computing (ISPDC 2025), pp. 68-75 (2025).
Tomoya Nagahashi and Daisuke Takahashi: Construction of Large Zero-Aware Pattern Databases for Sliding Puzzles on Distributed Memory Machines, Proc. 25th International Conference on Computational Science and Its Applications (ICCSA 2025), Part I, Lecture Notes in Computer Science, Vol. 15648, pp. 272-284, Springer (2025).
Daisuke Takahashi: Implementation of Multiple Multiplicative Inverses Modulo 2^w Using Intel AVX-512 Instructions, Proc. 25th International Conference on Computational Science and Its Applications (ICCSA 2025), Part III, Lecture Notes in Computer Science, Vol. 15650, pp. 375-384, Springer (2025). (short paper)
Daisuke Takahashi: Parallel Implementation of Number-Theoretic Transform on GPU Clusters, Proc. 24th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2024), Part III, Lecture Notes in Computer Science, Vol. 15253, pp. 204-218, Springer (2025).
Daisuke Takahashi: On the Division in the Computation of Binary BBP-Type Formulas for Mathematical Constants, Proc. 4th International Conference on Numerical Computations: Theory and Algorithms (NUMTA 2023), Part II, Lecture Notes in Computer Science, Vol. 14477, pp. 323-330, Springer (2025). (short paper)
Shota Kawakami and Daisuke Takahashi: Implementation and Evaluation of Octuple-Precision Fast Fourier Transform on GPU, Proc. 2024 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA 2024), pp. 287-294 (2024).
Toshihiro Hanawa, Kengo Nakajima, Yohei Miki, Takashi Shimokawabe, Kazuya Yamazaki, Shinji Sumimoto, Osamu Tatebe, Taisuke Boku, Daisuke Takahashi, Akira Nukada, Norihisa Fujita, Ryohei Kobayashi, Hiroto Tadano, and Akira Naruse: Preliminary Performance Evaluation of Grace-Hopper GH200, Proc. 2024 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops 2024), pp. 184-185 (2024). (poster paper)
Daisuke Takahashi: Multiple Integer Divisions with an Invariant Dividend and Monotonically Increasing or Decreasing Divisors, Proc. 23rd International Conference on Computational Science and Its Applications (ICCSA 2023), Part II, Lecture Notes in Computer Science, Vol. 13957, pp. 393-401, Springer (2023). (short paper)
Takuya Edamatsu and Daisuke Takahashi: Efficient Large Integer Multiplication with Arm SVE Instructions, Proc. International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2023), pp. 9-17 (2023).
Daisuke Takahashi: An Implementation of Parallel Number-Theoretic Transform Using Intel AVX-512 Instructions, Proc. 24th International Workshop on Computer Algebra in Scientific Computing (CASC 2022), Lecture Notes in Computer Science, Vol. 13366, pp. 318-332, Springer (2022).
Takeyuki Harayama, Shuhei Kudo, Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi: A Rapid Euclidean Norm Calculation Algorithm that Reduces Overflow and Underflow, Proc. 21st International Conference on Computational Science and Its Applications (ICCSA 2021), Part I, Lecture Notes in Computer Science, Vol. 12949, pp. 95-110, Springer (2021).
Naruya Kitai, Daisuke Takahashi, Franz Franchetti, Takahiro Katagiri, Satoshi Ohshima, and Toru Nagai: An Auto-tuning with Adaptation of A64 Scalable Vector Extension for SPIRAL, Proc. 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2021), The 16th International Workshop on Automatic Performance Tuning (iWAPT 2021), pp. 789-797 (2021).
Daisuke Takahashi: Fast Multiple Montgomery Multiplications Using Intel AVX-512IFMA Instructions, Proc. 20th International Conference on Computational Science and Its Applications (ICCSA 2020), Part V, Lecture Notes in Computer Science, Vol. 12253, pp. 655-663, Springer (2020). (short paper)
Yukimasa Sugizaki and Daisuke Takahashi: Fast Computation of the Exact Number of Magic Series with an Improved Montgomery Multiplication Algorithm, Proc. 20th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2020), Part II, Lecture Notes in Computer Science, Vol. 12453, pp. 365-382, Springer (2020).
Daisuke Takahashi: Implementation of Parallel 3-D Real FFT with 2-D Decomposition on Intel Xeon Phi Clusters, Proc. 13th International Conference on Parallel Processing and Applied Mathematics (PPAM 2019), Part I, Lecture Notes in Computer Science, Vol. 12043, pp. 151-161, Springer (2020).
Takuya Edamatsu and Daisuke Takahashi: Accelerating Large Integer Multiplication Using Intel AVX-512IFMA, Proc. 19th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2019), Part I, Lecture Notes in Computer Science, Vol. 11944, pp. 60-74, Springer (2020).
Daisuke Takahashi and Franz Franchetti: FFTE on SVE: SPIRAL-Generated Kernels, Proc. International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2020), pp. 114-122 (2020).
Samar Aseeri, Benson K. Muite, and Daisuke Takahashi: Reproducibility in Benchmarking Parallel Fast Fourier Transform based Applications, Companion of the 2019 ACM/SPEC International Conference on Performance Engineering (ICPE'19), pp. 5-8 (2019). (vision paper)
Takuya Edamatsu and Daisuke Takahashi: Acceleration of Large Integer Multiplication with Intel AVX-512 Instructions, Proc. 20th IEEE International Conference on High Performance Computing and Communications (HPCC-2018), pp. 211-218 (2018).
Daisuke Takahashi: An Implementation of Parallel 1-D Real FFT on Intel Xeon Phi Processors, Proc. 17th International Conference on Computational Science and Its Applications (ICCSA 2017), Part I, Lecture Notes in Computer Science, Vol. 10404, pp. 401-410, Springer (2017).
Hiroyuki Takizawa, Daichi Sato, Shoichi Hirasawa, and Daisuke Takahashi: A Customizable Auto-Tuning Scenario with User-defined Code Transformations, Proc. 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2017), The 12th International Workshop on Automatic Performance Tuning (iWAPT 2017), pp. 1372-1378 (2017).
Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi: Automatic Thread-Block Size Adjustment for Memory-Bound BLAS Kernels on GPUs, Proc. 2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16), Special Session: Auto-Tuning for Multicore and GPU (ATMG), pp. 377-384 (2016).
Daisuke Takahashi: Automatic Tuning of Computation-Communication Overlap for Parallel 1-D FFT, Proc. 2016 IEEE 19th International Conference on Computational Science and Engineering (CSE 2016), pp. 253-256 (2016). (short paper)
Daisuke Takahashi: Implementation of Multiple-Precision Floating-Point Arithmetic on Intel Xeon Phi Coprocessors, Proc. 16th International Conference on Computational Science and Its Applications (ICCSA 2016), Part II, Lecture Notes in Computer Science, Vol. 9787, pp. 60-70, Springer (2016).
Hiroshi Maeda and Daisuke Takahashi: Parallel Sparse Matrix-Vector Multiplication Using Accelerators, Proc. 16th International Conference on Computational Science and Its Applications (ICCSA 2016), Part II, Lecture Notes in Computer Science, Vol. 9787, pp. 3-18, Springer (2016).
Hiroshi Maeda and Daisuke Takahashi: Performance Evaluation of Sparse Matrix-Vector Multiplication Using GPU/MIC Cluster, Proc. 2015 Third International Symposium on Computing and Networking (CANDAR'15), 3rd International Workshop on Computer Systems and Architectures (CSA'15), pp. 396-399 (2015). (poster paper)
Daisuke Takahashi: An Implementation of Parallel 1-D FFT Using AVX Instructions on Multi-Core Processors, Proc. 2012 International Workshop on Innovative Architecture for Future Generation Processors and Systems (IWIA 2012), pp. 83-88 (2015).
Daisuke Takahashi: Optimization of All-to-All Communication on Multi-Core Cluster Systems, Proc. 2011 International Workshop on Innovative Architecture for Future Generation Processors and Systems (IWIA 2011), pp. 3-7 (2015).
Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi: Fast Implementation of General Matrix-Vector Multiplication (GEMV) on Kepler GPUs, Proc. 23rd Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP 2015), pp. 642-650 (2015).
Daichi Mukunoki and Daisuke Takahashi: Using Quadruple Precision Arithmetic to Accelerate Krylov Subspace Methods on GPUs, Proc. 10th International Conference on Parallel Processing and Applied Mathematics (PPAM 2013), Part I, Workshop on Numerical Algorithms on Hybrid Architectures, Lecture Notes in Computer Science, Vol. 8384, pp. 632-642, Springer (2014).
Takaaki Hiragushi and Daisuke Takahashi: Efficient Hybrid Breadth-First Search on GPUs, Proc. 13th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2013), Part II, 2013 International Symposium on Advances of Distributed and Parallel Computing (ADPC 2013), Lecture Notes in Computer Science, Vol. 8286, pp. 40-50, Springer (2013).
Daisuke Takahashi: Implementation of Parallel 1-D FFT on GPU Clusters, Proc. 2013 IEEE 16th International Conference on Computational Science and Engineering (CSE 2013), pp. 174-180 (2013).
Yoshikuni Sato, Makoto Miwa, Shogo Takeuchi, and Daisuke Takahashi: Optimizing Objective Function Parameters for Strength in Computer Game-Playing, Proc. 27th AAAI Conference on Artificial Intelligence (AAAI-13), pp. 869-875 (2013).
Daichi Mukunoki and Daisuke Takahashi: Optimization of Sparse Matrix-vector Multiplication for CRS Format on NVIDIA Kepler Architecture GPUs, Proc. 13th International Conference on Computational Science and Its Applications (ICCSA 2013), Part V, Lecture Notes in Computer Science, Vol. 7975, pp. 211-223, Springer (2013).
Hiroki Yoshizawa and Daisuke Takahashi: Automatic Tuning of Sparse Matrix-Vector Multiplication for CRS format on GPUs, Proc. 2012 IEEE 15th International Conference on Computational Science and Engineering (CSE 2012), pp. 130-136 (2012).
Daisuke Takahashi: An Implementation of Parallel 2-D FFT Using Intel AVX Instructions on Multi-Core Processors, Proc. 12th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2012), Part II, Lecture Notes in Computer Science, Vol. 7440, pp. 197-205, Springer (2012). (short paper)
Daisuke Takahashi, Atsuya Uno, and Mitsuo Yokokawa: An Implementation of Parallel 1-D FFT on the K computer, Proc. 2012 IEEE 14th International Conference on High Performance Computing and Communications (HPCC-2012), pp. 344-350 (2012).
T. Boku, K.-I. Ishikawa, Y. Kuramashi, K. Minami, Y. Nakamura, F. Shoji, D. Takahashi, M. Terai, A. Ukawa, and T. Yoshie: Multi-block/multi-core SSOR preconditioner for the QCD quark solver for K computer, Proceedings of Science, The 30th International Symposium on Lattice Field Theory (Lattice 2012), p. 188 (2012).
Yohei Miki, Daisuke Takahashi, and Masao Mori: A Fast Implementation and Performance Analysis of Collisionless N-body Code Based on GPGPU, Proc. International Conference on Computational Science (ICCS 2012), Procedia Computer Science, Vol. 9, pp. 96-105, Elsevier (2012).
Takuma Nomizu, Daisuke Takahashi, Jinpil Lee, Taisuke Boku, and Mitsuhisa Sato: Implementation of XcalableMP Device Acceleration Extention with OpenCL, Proc. 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW 2012), Multicore and GPU Programming Models, Languages and Compilers Workshop (PLC 2012), pp. 2394-2403 (2012).
Daichi Mukunoki and Daisuke Takahashi: Implementation and Evaluation of Triple Precision BLAS Subroutines on GPUs, Proc. 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW 2012), The 13th Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC-12), pp. 1378-1386 (2012).
Daichi Mukunoki and Daisuke Takahashi: Implementation and Evaluation of Quadruple Precision BLAS Functions on GPUs, Proc. 10th International Conference on Applied Parallel and Scientific Computing (PARA 2010), Part I, Lecture Notes in Computer Science, Vol. 7133, pp. 249-259, Springer (2012).
Takatoshi Nakayama and Daisuke Takahashi: Implementation of Multiple-Precision Floating-Point Arithmetic Library for GPU Computing, Proc. 23rd IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS 2011), pp. 343-349 (2011).
Yukihiro Hasegawa, Jun-Ichi Iwata, Miwako Tsuji, Daisuke Takahashi, Atsushi Oshiyama, Kazuo Minami, Taisuke Boku, Fumiyoshi Shoji, Atsuya Uno, Motoyoshi Kurokawa, Hikaru Inoue, Ikuo Miyoshi, and Mitsuo Yokokawa: First-principles calculations of electron states of a silicon nanowire with 100,000 atoms on the K computer, Proc. 2011 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC'11) (2011).
Yuji Kubota and Daisuke Takahashi: Optimization of Sparse Matrix-Vector Multiplication by Auto Selecting Storage Schemes on GPU, Proc. 11th International Conference on Computational Science and Its Applications (ICCSA 2011), Part II, Lecture Notes in Computer Science, Vol. 6783, pp. 547-561, Springer (2011).
Daisuke Takahashi: An Implementation of Parallel 3-D FFT with 2-D Decomposition on a Massively Parallel Cluster of Multi-core Processors, Proc. 8th International Conference on Parallel Processing and Applied Mathematics (PPAM 2009), Part I, Workshop on Memory Issues on Multi- and Manycore Platforms, Lecture Notes in Computer Science, Vol. 6067, pp. 606-614, Springer (2010).
Chikafumi Takahashi, Mitsuhisa Sato, Daisuke Takahashi, Taisuke Boku, Akira Ukawa, Hiroshi Nakamura, Hidetaka Aoki, Hideo Sawamoto, and Naonobu Sukegawa: Design and Power Performance Evaluation of On-Chip Memory Processor with Arithmetic Accelerators, Proc. 2008 International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems (IWIA 2008), pp. 51-57 (2009).
Daisuke Takahashi: A Parallel Algorithm for Multiple-Precision Division by a Single-Precision Integer, Proc. 6th International Conference on Large-Scale Scientific Computations (LSSC 2007), Lecture Notes in Computer Science, Vol. 4818, pp. 729-736, Springer (2008).
Chikafumi Takahashi, Mitsuhisa Sato, Daisuke Takahashi, Taisuke Boku, Hiroshi Nakamura, Masaaki Kondo, and Motonobu Fujita: Empirical Study for Optimization of Power-Performance with On-Chip Memory, Proc. First International Workshop on Advanced Low Power Systems (ALPS 2006), Lecture Notes in Computer Science, Vol. 4759, pp. 466-479, Springer (2008).
Daisuke Takahashi: Implementation and Evaluation of Parallel FFT Using SIMD Instructions on Multi-Core Processors, Proc. 2007 International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems (IWIA 2007), pp. 53-59 (2008).
Daisuke Takahashi: An Implementation of Parallel 1-D FFT Using SSE3 Instructions on Dual-Core Processors, Proc. 8th International Workshop on State of the Art in Scientific Computing (PARA 2006), Lecture Notes in Computer Science, Vol. 4699, pp. 1178-1187, Springer (2007).
Akira Nukada, Daisuke Takahashi, Reiji Suda, and Akira Nishida: High Performance FFT on SGI Altix 3700, Proc. 3rd International Conference on High Performance Computing and Communications (HPCC 2007), Lecture Notes in Computer Science, Vol. 4782, pp. 396-407, Springer (2007).
Takayuki Imada, Mitsuhisa Sato, Yoshihiko Hotta, Hideaki Kimura, Taisuke Boku, Daisuke Takahashi, Shinichi Miura, and Hiroshi Nakashima: Power-performance Evaluation on Ultra-Low Power High-performance Cluster System: MegaProto/E, Proc. IEEE Symposium on Low-Power and High-Speed Chips (COOL Chips X), pp. 117-129 (2007).
Takayuki Okamoto, Shinichi Miura, Taisuke Boku, Mitsuhisa Sato, and Daisuke Takahashi: RI2N/UDP: High bandwidth and fault-tolerant network for PC-cluster based on multi-link Ethernet, Proc. 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), The Workshop on Communication Architecture for Clusters (CAC 2007) (2007).
Hideaki Kimura, Mitsuhisa Sato, Yoshihiko Hotta, Taisuke Boku, and Daisuke Takahashi: Empirical Study on Reducing Energy of Parallel Programs using Slack Reclamation by DVFS, Proc. 2006 IEEE International Conference on Cluster Computing (Cluster 2006), pp. 1-10 (2006).
Taisuke Boku, Mitsuhisa Sato, Akira Ukawa, Daisuke Takahashi, Shinji Sumimoto, Kouichi Kumon, Takashi Moriyama, and Masaaki Shimizu: PACS-CS: A large-scale bandwidth-aware PC cluster for scientific computations, Proc. Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06), pp. 233-240 (2006).
Daisuke Takahashi: A Hybrid MPI/OpenMP Implementation of a Parallel 3-D FFT on SMP Clusters, Proc. 6th International Conference on Parallel Processing and Applied Mathematics (PPAM 2005), Lecture Notes in Computer Science, Vol. 3911, pp. 970-977, Springer (2006).
Yoshiaki Aida, Yoshihiro Nakajima, Mitsuhisa Sato, Tetsuya Sakurai, Daisuke Takahashi, and Taisuke Boku: Performance Improvement by Data Management Layer in a Grid RPC System, Proc. First International Conference on Grid and Pervasive Computing (GPC 2006), Lecture Notes in Computer Science, Vol. 3947, pp. 324-335, Springer (2006).
Taisuke Boku, Mitsuhisa Sato, Daisuke Takahashi, Hiroshi Nakashima, Hiroshi Nakamura, Satoshi Matsuoka, and Yoshihiko Hotta: MegaProto/E: Power-Aware High-Performance Cluster with Commodity Technology, Proc. 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2006), The Second Workshop on High-Performance, Power-Aware Computing (HP-PAC 2006) (2006).
Yoshihiko Hotta, Mitsuhisa Sato, Hideaki Kimura, Satoshi Matsuoka, Taisuke Boku, and Daisuke Takahashi: Profile-based Optimization of Power Performance by using Dynamic Voltage Scaling on a PC cluster, Proc. 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2006), The Second Workshop on High-Performance, Power-Aware Computing (HP-PAC 2006) (2006).
Daisuke Takahashi, Taisuke Boku, and Mitsuhisa Sato: An Implementation of Parallel 3-D FFT Using Short Vector SIMD Instructions on Clusters of PCs, Proc. 7th International Workshop on Applied Parallel Computing (PARA 2004), Lecture Notes in Computer Science, Vol. 3732, pp. 1159-1167, Springer (2006).
Tetsuya Sakurai, Kentaro Hayakawa, Mitsuhisa Sato, and Daisuke Takahashi: A Parallel Method for Large Sparse Generalized Eigenvalue Problems by OmniRPC in a Grid Environment, Proc. 7th International Workshop on Applied Parallel Computing (PARA 2004), Lecture Notes in Computer Science, Vol. 3732, pp. 1151-1158, Springer (2006).
Daisuke Takahashi, Mitsuhisa Sato, and Taisuke Boku: Computation of High-Precision Mathematical Constants in a Combined Cluster and Grid Environment, Proc. 5th International Conference on Large-Scale Scientific Computations (LSSC 2005), Lecture Notes in Computer Science, Vol. 3743, pp. 454-461, Springer (2006).
Yoshinori Ojima, Mitsuhisa Sato, Taisuke Boku, and Daisuke Takahashi: Design of a Software Distributed Shared Memory System using an MPI communication layer, Proc. 8th International Symposium on Parallel Architectures, Algorithms, and Networks (I-SPAN 2005), pp. 220-229 (2005).
Shinichi Miura, Takayuki Okamoto, Taisuke Boku, Mitsuhisa Sato, and Daisuke Takahashi: Low-cost High-bandwidth Tree Network for PC Clusters based on Tagged-VLAN Technology, Proc. 8th International Symposium on Parallel Architectures, Algorithms, and Networks (I-SPAN 2005), pp. 84-93 (2005).
Hiroshi Nakashima, Hiroshi Nakamura, Mitsuhisa Sato, Taisuke Boku, Satoshi Matsuoka, Daisuke Takahashi, and Yoshihiko Hotta: MegaProto: 1TFlops/10kW Rack Is Feasible Even with Only Commodity Technology, Proc. 2005 ACM/IEEE Conference on Supercomputing (SC|05) (2005).
Hiroshi Nakashima, Hiroshi Nakamura, Mitsuhisa Sato, Taisuke Boku, Satoshi Matsuoka, Daisuke Takahashi, and Yoshihiko Hotta: MegaProto: A Low-Power and Compact Cluster for High-Performance Computing, Proc. 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05), Workshop on High Performance, Power-Aware Computing (HPPAC) (2005).
Taisuke Boku, Kenji Onuma, Mitsuhisa Sato, Yoshihiro Nakajima, and Daisuke Takahashi: Grid environment for computational astrophysics driven by GRAPE-6 with HMCS-G and OmniRPC, Proc. 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05), Joint Workshop on High-Performance Grid Computing & High-Level Parallel Programming Models (HIPS-HPGC) (2005).
Yoshinori Ojima, Mitsuhisa Sato, Taisuke Boku, and Daisuke Takahashi: Design of Software Distributed Shared Memory System using MPI communication layer, Proc. 4th International Workshop on OpenMP: Experiences and Implementations (WOMPEI 2005), pp. 18-25 (2005).
Taisuke Boku, Mitsuhisa Sato, Masazumi Matsubara, and Daisuke Takahashi: OpenMPI — OpenMP like tool for easy programming in MPI, Proc. 6th European Workshop on OpenMP (EWOMP 2004), pp. 83-88 (2004).
Yoshihiro Nakajima, Mitsuhisa Sato, Hitoshi Goto, Taisuke Boku, and Daisuke Takahashi: Implementation and Performance Evaluation of CONFLEX-G: Grid-enabled Molecular Conformational Space Search Program with OmniRPC, Proc. 18th International Conference on Supercomputing (ICS'04), pp. 154-163 (2004).
Chikafumi Takahashi, Masaaki Kondo, Taisuke Boku, Daisuke Takahashi, Hiroshi Nakamura, and Mitsuhisa Sato: SCIMA-SMP: on-chip memory processor architecture for SMP, Proc. 3rd Workshop on Memory Performance Issues (WMPI'04), pp. 121-128 (2004).
Taisuke Boku, Hajime Susa, Kenji Onuma, Masayuki Umemura, Mitsuhisa Sato, and Daisuke Takahashi: Formation of Dwarf Galaxies in Reionized Universe with Heterogeneous Multi-Computer System, Proc. International Conference on Computational Science 2004 (ICCS 2004), Part IV, Workshop on Modeling and Simulation of Multi-physics Multi-scale Systems, Lecture Notes in Computer Science, Vol. 3039, pp. 629-636, Springer (2004).
Yuhsuke Ohtaki, Daisuke Takahashi, Taisuke Boku, and Mitsuhisa Sato: Parallel Implementation of Strassen's Matrix Multiplication Algorithm for Heterogeneous Clusters, Proc. 18th International Parallel and Distributed Processing Symposium (IPDPS'04), The 13th Heterogeneous Computing Workshop (HCW 2004) (2004).
Yoshihiko Hotta, Mitsuhisa Sato, Taisuke Boku, Daisuke Takahashi, and Chikafumi Takahashi: Measurement and Characterization of Power Consumption of Microprocessors for Power-aware Cluster, Proc. International Symposium on Low-Power and High-Speed Chips (COOL Chips VII), pp. 293-303 (2004).
Yoshihiro Nakajima, Mitsuhisa Sato, Taisuke Boku, Daisuke Takahashi, and Hitoshi Gotoh: Performance Evaluation of OmniRPC in a Grid Environment, Proc. 2004 International Symposium on Applications and the Internet Workshops (SAINT 2004 Workshops), pp. 658-664 (2004).
Kenji Onuma, Taisuke Boku, Mitsuhisa Sato, Daisuke Takahashi, Hajime Susa, and Masayuki Umemura: Heterogeneous Remote Computing System for Computational Astrophysics with OmniRPC, Proc. 2004 International Symposium on Applications and the Internet Workshops (SAINT 2004 Workshops), pp. 623-629 (2004).
Shinichi Miura, Taisuke Boku, Mitsuhisa Sato, and Daisuke Takahashi: RI2N — Interconnection Network System for Clusters with Wide-Bandwidth and Fault-Tolerancy Based on Multiple Links, Proc. 5th International Symposium on High Performance Computing (ISHPC 2003), Lecture Notes in Computer Science, Vol. 2858, pp. 342-351, Springer (2003).
Daisuke Takahashi: A Radix-16 FFT Algorithm Suitable for Multiply-Add Instruction Based on Goedecker Method, Proc. 2003 IEEE International Conference on Multimedia and Expo (ICME 2003), Vol. 2, pp. 845-848 (2003). (poster paper)
Daisuke Takahashi, Mitsuhisa Sato, and Taisuke Boku: An OpenMP Implementation of Parallel FFT and Its Performance on IA-64 Processors, Proc. International Workshop on OpenMP Applications and Tools (WOMPAT 2003), Lecture Notes in Computer Science, Vol. 2716, pp. 99-108, Springer (2003).
Taisuke Boku, Mitsuhisa Sato, Kenji Onuma, Junichiro Makino, Hajime Susa, Daisuke Takahashi, Masayuki Umemura, and Akira Ukawa: HMCS-G: Grid-enabled Hybrid Computing System for Computational Astrophysics, Proc. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID'03), Workshop on Grids and Advanced Networks (GAN'03), pp. 558-565 (2003).
Mitsuhisa Sato, Taisuke Boku, and Daisuke Takahashi: OmniRPC: a Grid RPC System for Parallel Programming in Cluster and Grid Environment, Proc. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID'03), pp. 206-213 (2003).
Daisuke Takahashi: A Radix-16 FFT Algorithm Suitable for Multiply-Add Instruction Based on Goedecker Method, Proc. 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003), Vol. 2, pp. 665-668 (2003). (poster paper)
Shinsuke Nara, Yuichi Goto, Daisuke Takahashi, and Jingde Cheng: Parallel Forward Deduction System for General-Purpose Entailment Calculus on Clusters of PCs, Proc. IASTED International Conference on Networks, Parallel and Distributed Processing, and Applications (NPDPA 2002), pp. 359-364 (2002).
Yuichi Goto, Daisuke Takahashi, and Jingde Cheng: Improving Performance of Automated Forward Deduction System EnCal on Shared-Memory Parallel Computers, Proc. Third International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT 2002), pp. 63-68 (2002).
Daisuke Takahashi, Taisuke Boku, and Mitsuhisa Sato: A Blocking Algorithm for Parallel 1-D FFT on Clusters of PCs, Proc. 8th International Euro-Par Conference (Euro-Par 2002), Lecture Notes in Computer Science, Vol. 2400, pp. 691-700, Springer (2002).
Daisuke Takahashi: A Blocking Algorithm for Parallel 1-D FFT on Shared-Memory Parallel Computers, Proc. 6th International Conference on Applied Parallel Computing (PARA 2002), Lecture Notes in Computer Science, Vol. 2367, pp. 380-389, Springer (2002).
Daisuke Takahashi, Mitsuhisa Sato, and Taisuke Boku: Performance Evaluation of the Hitachi SR8000 Using OpenMP Benchmarks, Proc. 4th International Symposium on High Performance Computing (ISHPC 2002), Lecture Notes in Computer Science, Vol. 2327, pp. 390-400, Springer (2002).
Yuichi Goto, Daisuke Takahashi, and Jingde Cheng: Parallel Forward Deduction Algorithms of General-Purpose Entailment Calculus on Shared-Memory Parallel Computers, Proc. 2nd International Conference on Software Engineering, Artificial Intelligence, Networking & Parallel/Distributed Computing (SNPD'01), pp. 168-175 (2001).
Daisuke Takahashi: A Blocking Algorithm for FFT on Cache-Based Processors, Proc. 9th International Conference on High Performance Computing and Networking Europe (HPCN Europe 2001), Lecture Notes in Computer Science, Vol. 2110, pp. 551-554, Springer (2001). (poster paper)
Daisuke Takahashi: A Mixed-Radix Parallel Three-Dimensional FFT Algorithm on Clusters of Vector SMPs, Proc. Tenth SIAM Conference on Parallel Processing for Scientific Computing (PP01) (2001).
Seiji Nishimura, Daisuke Takahashi, Takaomi Shigehara, Hiroshi Mizoguchi, and Taketoshi Mishima: A Performance Study on a Single Processing Node of the HITACHI SR8000, Proc. Second International Conference on Numerical Analysis and Its Applications (NAA 2000), Lecture Notes in Computer Science, Vol. 1988, pp. 628-635, Springer (2001).
Daisuke Takahashi: A Parallel 3-D FFT Algorithm on Clusters of Vector SMPs, Proc. 5th International Workshop on Applied Parallel Computing (PARA 2000), Lecture Notes in Computer Science, Vol. 1947, pp. 316-323, Springer (2001).
Daisuke Takahashi: Implementation of Multiple-Precision Parallel Division and Square Root on Distributed-Memory Parallel Computers, Proc. 2000 International Workshop on Parallel Processing (ICPP'00 Workshops), Workshop on High Performance Scientific and Engineering Computing with Applications (HPSECA-00), pp. 229-235 (2000).
Seiji Nishimura, Daisuke Takahashi, Takaomi Shigehara, Hiroshi Mizoguchi, and Taketoshi Mishima: Efficient Implementation of CG & CR Methods for Linear Systems on a Single Processing Node of HITACHI SR8000, Proc. 2000 International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC2000), pp. 298-301 (2000).
Daisuke Takahashi: A New Radix-6 FFT Algorithm Suitable for Multiply-Add Instruction, Proc. 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2000), Vol. 6, pp. 3343-3346 (2000). (poster paper)
Daisuke Takahashi: High-Performance Parallel FFT Algorithms for the HITACHI SR8000, Proc. Fourth International Conference/Exhibition on High Performance Computing in Asia-Pacific Region (HPC-Asia 2000), Vol. 1, pp. 192-199 (2000).
Daisuke Takahashi and Yasumasa Kanada: Fast High-Precision Arithmetic on Distributed Memory Parallel Machines, Proc. Ninth SIAM Conference on Parallel Processing for Scientific Computing (PP99) (1999).

3. Conference Proceedings (without review)

Satoshi Matsuoka, William Kramer, and Daisuke Takahashi: The HPC Decathlon Assessment Measure: A Proposal to Define a New Composite Benchmark for High Performance Computing, Storage, Networking and Analysis, Proc. Workshop on Modeling & Simulation of Exascale Systems and Applications (MODSIM 2013) (2013). (position paper)
Hiroyuki Takizawa, Ryusuke Egawa, Daisuke Takahashi, and Reiji Suda: HPC Refactoring with Hierarchical Abstractions to Help Software Evolution, Sustained Simulation Performance 2012: Proceedings of the joint Workshop on High Performance Computing on Vector Systems, Stuttgart (HLRS), and Workshop on Sustained Simulation Performance, Tohoku University, 2012, pp. 27-33, Springer (2013).
Daichi Mukunoki and Daisuke Takahashi: Performance Comparison of Double, Triple and Quadruple Precision Real and Complex BLAS Subroutines on GPUs, Proc. ATIP/A*CRC Workshop on Accelerator Technologies for High-Performance Computing: Does Asia Lead the Way? (ATIP/A*CRC Workshop '12), pp. 788-790 (2012).
Piotr Luszczek, David H. Bailey, Jack Dongarra, Jeremy Kepner, Robert F. Lucas, Rolf Rabenseifner, and Daisuke Takahashi: The HPC Challenge (HPCC) benchmark suite, Proc. 2006 ACM/IEEE Conference on Supercomputing (SC'06) (2006).
Takuya Yokozawa, Daisuke Takahashi, Taisuke Boku, and Mitsuhisa Sato: Efficient Parallel Implementation of Classical Gram-Schmidt Orthogonalization Using Matrix Multiplication, Proc. 4th International Workshop on Parallel Matrix Algorithms and Applications (PMAA'06), pp. 37-38 (2006).
Hideaki Kimura, Mitsuhisa Sato, Yoshihiko Hotta, Taisuke Boku, and Daisuke Takahashi: Reducing Energy of Parallel Programs using Slack Reclamation by DVFS in a Power-scalable High Performance Cluster, Proc. IEEE Symposium on Low-Power and High-Speed Chips (COOL Chips IX), p. 187 (2006).
Mitsuhisa Sato, Yoshihiro Nakajima, Tetsuya Sakurai, Taisuke Boku, and Daisuke Takahashi: OmniRPC Grid Parallel Programming Environment for a Large Scale Numerical Computation, Proc. 17th IMACS World Congress Scientific Computation, Applied Mathematics and Simulation (2005).
Mitsuhisa Sato, Yoshinori Ojima, Taisuke Boku, and Daisuke Takahashi: Portable Software Distributed Shared Memory SCASH-MPI for Omni OpenMP Compiler, Proc. First International Workshop on OpenMP (IWOMP 2005) (2005).
Yoshihiko Hotta, Mitsuhisa Sato, Taisuke Boku, Hiroshi Nakashima, Hiroshi Nakamura, Satoshi Matsuoka, Daisuke Takahashi, Chikafumi Takahashi, Shinichi Miura, Yoshihiro Nakajima, Masaaki Kondo, and Motonobu Fujita: MegaProto: A Prototype of Ultra Low-Power Mega-Scale System, Proc. An International Symposium on Low-Power and High-Speed Chips (COOL Chips VII), Vol. 1, p. 84 (2004).
Yoshihiko Hotta, Mitsuhisa Sato, Taisuke Boku, Daisuke Takahashi, and Chikafumi Takahashi: Measurement and Characterization of Power Consumption of Microprocessors for Power-aware Computing, Proc. An International Symposium on Low-Power and High-Speed Chips (COOL Chips VI), Vol. 1, p. 77 (2003).

4. Oral Presentation

Daisuke Takahashi: Automatic Tuning for Parallel Number-Theoretic Transforms on GPU Clusters, SIAM Conference on Parallel Processing for Scientific Computing (PP26), Zuse Institute Berlin and Free University of Berlin, Berlin, Germany, March 6, 2026.
Daisuke Takahashi: Implementation of Parallel 3-D Real FFT with 2-D Decomposition on Manycore Clusters, The 14th AIMS Conference, ADNEC Centre Abu Dhabi, Abu Dhabi, UAE, December 20, 2024.
Daisuke Takahashi: Implementation of Parallel Number-Theoretic Transform on GPU Clusters, SIAM Conference on Parallel Processing for Scientific Computing (PP24), Lord Baltimore Hotel, Baltimore, Maryland, USA, March 7, 2024.
Daisuke Takahashi: Multiple Integer Divisions with an Invariant Dividend, 10th International Congress on Industrial and Applied Mathematics (ICIAM 2023), Waseda University, Shinjuku-ku, Tokyo, Japan, August 21, 2023.
Daisuke Takahashi: Implementation of Parallel Number-Theoretic Transform on Manycore Clusters, SIAM Conference on Computational Science and Engineering (CSE23), RAI Congress Centre, Amsterdam, The Netherlands, February 27, 2023.
Daisuke Takahashi: Parallel Implementation of FFT in a Finite Field, SIAM Conference on Parallel Processing for Scientific Computing (PP22), Online, February 26, 2022.
Daisuke Takahashi: Automatic Tuning of Computation-Communication Overlap for Parallel 3-D FFT with 2-D Decomposition, SIAM Conference on Computational Science and Engineering (CSE21), Online, March 4, 2021.
Daisuke Takahashi: Implementation of Parallel 3-D Real FFT with 2-D Decomposition on Intel Xeon Phi Clusters, SIAM Conference on Parallel Processing for Scientific Computing (PP20), Hyatt Regency Seattle, Seattle, Washington, USA, February 14, 2020.
Daisuke Takahashi: Implementation of Parallel 3-D Real FFT with 2-D Decomposition on Intel Xeon Phi Clusters, SIAM Conference on Computational Science and Engineering (CSE19), Spokane Convention Center, Spokane, Washington, USA, March 1, 2019.
Daisuke Takahashi: Implementation of Parallel 1-D Real FFT on Intel Xeon Phi Processors, 2018 Conference on Advanced Topics and Auto Tuning in High-Performance and Scientific Computing (2018 ATAT in HPSC), National Cheng Kung University, Tainan, Taiwan, March 27, 2018.
Ayumu Gomi and Daisuke Takahashi: A Programming Framework for Performance Tuning in Julia, SIAM Conference on Parallel Processing for Scientific Computing (PP18), Waseda University, Shinjuku-ku, Tokyo, Japan, March 7, 2018.
Daisuke Takahashi: Implementation of Parallel FFTs on Cluster of Intel Xeon Phi Processors, SIAM Conference on Parallel Processing for Scientific Computing (PP18), Waseda University, Shinjuku-ku, Tokyo, Japan, March 7, 2018.
Daisuke Takahashi: Automatic Tuning for Parallel FFTs on Cluster of Intel Xeon Phi processors, 2017 Conference on Advanced Topics and Auto Tuning in High-Performance and Scientific Computing (2017 ATAT in HPSC), National Taiwan University, Taipei, Taiwan, March 10, 2017.
Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi: Implementation Techniques for High Performance BLAS Kernels on Modern GPUs, SIAM Conference on Computational Science and Engineering (CSE17), Hilton Atlanta, Atlanta, Georgia, USA, February 28, 2017.
Daisuke Takahashi: Implementation of Parallel FFTs on Knights Landing Cluster, SIAM Conference on Computational Science and Engineering (CSE17), Hilton Atlanta, Atlanta, Georgia, USA, February 28, 2017.
Daisuke Takahashi: Automatic Tuning for Parallel FFTs on Intel Xeon Phi Clusters, SIAM Conference on Parallel Processing for Scientific Computing (PP16), Université Pierre et Marie Curie, Cordeliers Campus, Paris, France, April 14, 2016.
Daisuke Takahashi: Automatic Tuning for Parallel FFTs on Intel Xeon Phi Clusters, 2016 Conference on Advanced Topics and Auto Tuning in High-Performance and Scientific Computing (2016 ATAT in HPSC), National Taiwan University, Taipei, Taiwan, February 19, 2016.
Daisuke Takahashi: Automatic Tuning for Parallel FFTs on GPU Clusters, 2015 SIAM Conference on Computational Science and Engineering (CSE15), Salt Palace Convention Center, Salt Lake City, Utah, USA, March 18, 2015.
Hiroshi Maeda and Daisuke Takahashi: Performance Evaluation of Sparse Matrix-Vector Multiplication Using GPU/MIC Cluster, 2015 SIAM Conference on Computational Science and Engineering (CSE15), Salt Palace Convention Center, Salt Lake City, Utah, USA, March 14, 2015.
Daisuke Takahashi: Automatic Tuning for Parallel FFTs on GPU Clusters, 2015 Conference on Advanced Topics and Auto Tuning in High-Performance and Scientific Computing (2015 ATAT in HPSC), National Taiwan University, Taipei, Taiwan, February 28, 2015.
Daisuke Takahashi: Implementation of Parallel FFTs on GPU Clusters, 2014 Conference on Advanced Topics and Auto Tuning in High Performance and Scientific Computing (2014 ATAT in HPSC), National Taiwan University, Taipei, Taiwan, March 14, 2014.
Daisuke Takahashi: Experience of Implementing Parallel FFTs on GPU Clusters, Special Session: Legacy HPC Application Migration 2013 (LHAM) (held in conjunction with IEEE MCSoC-13), National Institute of Informatics, Chiyoda-ku, Tokyo, Japan, September 27, 2013.
Daisuke Takahashi: Automatic Tuning for Parallel FFTs, 2013 Conference on Advanced Topics and Auto Tuning in High Performance and Scientific Computing (2013@^2HPSC), National Taiwan University, Taipei, Taiwan, March 28, 2013.
Daichi Mukunoki and Daisuke Takahashi: Iterative Method for Sparse Linear Systems using Quadruple Precision Operations on GPUs, 2013 SIAM Conference on Computational Science and Engineering (CSE13), The Westin Boston Waterfront, Boston, Massachusetts, USA, February 28, 2013.
Daisuke Takahashi, Alex Yee, Torsten Hoefler, Camille Coti, Jeongnim Kim, and Franck Cappello: An Implementation of Parallel 3-D FFT with 1.5-D Decomposition, The seventh workshop of the INRIA-Illinois-ANL Joint Laboratory on Petascale Computing, INRIA Rennes, France, June 14, 2012.
Daisuke Takahashi, Alex Yee, Torsten Hoefler, Camille Coti, Jeongnim Kim, and Franck Cappello: A Scalable Parallel Algorithm for 3-D FFT, The sixth workshop of the INRIA-Illinois Joint Laboratory on Petascale Computing, National Center for Supercomputing Applications, Urbana, Illinois, USA, November 22, 2011.
Yuji Kubota and Daisuke Takahashi: Autotuning of Sparse Matrix-Vector Multiplication by Selecting Storage Schemes on GPU, 2011 SIAM Conference on Computational Science and Engineering (CSE11), Grand Sierra Resort and Casino, Reno, Nevada, USA, March 1, 2011.
Daisuke Takahashi, Camille Coti, and Franck Cappello: Optimization of a Parallel 3-D FFT with 2-D Decomposition, The fourth workshop of the INRIA-Illinois Joint Laboratory on Petascale Computing, National Center for Supercomputing Applications, Urbana, Illinois, USA, November 23, 2010.
Daisuke Takahashi: Automatic Tuning for Parallel 3-D FFTs, 2010 SIAM Annual Meeting (AN10), David L. Lawrence Convention Center, Pittsburgh, Pennsylvania, USA, July 16, 2010.
Daisuke Takahashi: Automatic Tuning for Parallel 3-D FFT with 2-D Decomposition, 2010 SIAM Conference on Parallel Processing for Scientific Computing (PP10), Grand Hyatt Seattle, Seattle, Washington, USA, February 25, 2010.
Daisuke Takahashi: A Volumetric 3-D FFT on Clusters of Multi-Core Processors, Third French-Japanese PAAP Workshop, Shiran-Kaikan Hall Annex, Kyoto, Japan, April 21, 2009.
Daisuke Takahashi: A Volumetric 3-D FFT on Clusters of Multi-Core Processors, 2009 SIAM Conference on Computational Science and Engineering (CSE09), Miami Hilton Downtown, Miami, Florida, USA, March 5, 2009.
Daisuke Takahashi: Automatic Tuning for Parallel FFTs, Second French-Japanese PAAP Workshop, ENSEEIHT-IRIT, Toulouse, France, June 24, 2008.
Daisuke Takahashi: Automatic Tuning for Parallel FFTs, 13th SIAM Conference on Parallel Processing for Scientific Computing (PP08), The Renaissance Atlanta Hotel Downtown, Atlanta, Georgia, USA, March 12, 2008.
Daisuke Takahashi: The FFTE Library and the HPC Challenge (HPCC) Benchmark Suite, First French-Japanese PAAP Workshop, Next-Generation Supercomputer R&D Center, RIKEN, Chiyoda-ku, Tokyo, Japan, November 2, 2007.

5. Invited Talk

Daisuke Takahashi: Implementation of Parallel 3-D FFT with 2-D Decomposition on GPU Clusters, International Conference on Modern Mathematical Methods and High-Performance Computing in Science & Technology (M3HPCST-2026), GL Bajaj Group of Institutions, Mathura, India, January 28, 2026.
Daisuke Takahashi: Automatic Tuning for Parallel FFTs on Cluster of Intel Xeon Phi Processors, Parallel Fast Fourier Transforms (PFFT) (held in conjunction with IEEE HiPC 2018), Radisson Blu Bengaluru Outer Ring Road, Bengaluru, India, December 17, 2018.
Daisuke Takahashi: Sparse Matrix-Vector Multiplication on GPUs, International Workshop on Eigenvalue Problems: Algorithms; Software and Applications, in Petascale Computing (EPASA2015), Tsukuba International Congress Center, Tsukuba, Japan, September 14, 2015.
Daisuke Takahashi: Automatic Tuning for Parallel FFTs on Clusters of Multi-Core Processors, Special Session: Auto-Tuning for Multicore and GPU (ATMG) (held in conjunction with IEEE MCSoC-12), The University of Aizu, Aizu, Japan, September 22, 2012.
Daisuke Takahashi: Parallel Implementation of Multiple-Precision Arithmetic and 2.576 Trillion Digits of Pi Calculation on a Massively Parallel Cluster of Multi-Core Processors, Workshop on Ultra Performance and Dependable Acceleration Systems (held in conjunction with PDCAT'09), Gakushi-kaikan, Hiroshima University, Higashi-Hiroshima, Japan, December 11, 2009.

6. Book

Daisuke Takahashi: Fast Fourier Transform Algorithms for Parallel Computers, Springer (2019).

7. Chapter in Book

Daisuke Takahashi: Fast Fourier Transform in Large-Scale Systems, Masaaki Geshi (Ed.): The Art of High Performance Computing for Computational Science, Vol. 1, Springer, pp. 137-168 (2019).
Taisuke Boku, Osamu Tatebe, Daisuke Takahashi, Kazuhiro Yabana, Yuta Hirokawa, Masayuki Umemura, Toshihiro Hanawa, Kengo Nakajima, Hiroshi Nakamura, Tsuyoshi Ichimura, Kohei Fujita, Yutaka Ishikawa, Mitsuhisa Sato, Balazs Gerofi, and Masamichi Takagi: Oakforest-PACS: Advanced KNL Cluster System, Jeffrey S. Vetter (Ed.): Contemporary High Performance Computing: From Petascale toward Exascale, Vol. 3, CRC Press, pp. 401-421 (2019).
Hiroyuki Takizawa, Reiji Suda, Daisuke Takahashi, and Ryusuke Egawa: Xevolver: A User-Defined Code Transformation Approach to Streamlining Legacy Code Migration, Mitsuhisa Sato (Ed.): Advanced Software Technologies for Post-Peta Scale Computing, Springer, pp. 163-181 (2019).
Daisuke Takahashi: Automatic Tuning for Parallel FFTs, Ken Naono, Keita Teranishi, John Cavazos, and Reiji Suda (Eds.): Software Automatic Tuning: From Concepts to State-of-the-Art Results, Springer, pp. 49-67 (2010).
Daisuke Takahashi: Implementation of Multiple-Precision Parallel Division and Square Root on Distributed-Memory Parallel Computers, Yi Pan and Laurence T. Yang (Eds.): Parallel and Distributed Scientific and Engineering Computing: Practice and Experience, Nova Science Publishers, pp. 35-49 (2004).