Publications
1. Journal Papers
- Takuya Edamatsu and Daisuke Takahashi: Fast
Multiple-Precision Integer Division Using Intel AVX-512, IEEE
Transactions on Emerging Topics in Computing, Vol. 11, No. 1,
pp. 224-236 (2023).
- Daisuke Takahashi: On the use of Montgomery multiplication in
the computation of binary BBP-type formulas for mathematical
constants, The Ramanujan Journal, Vol. 59, No. 1, pp. 211-219 (2022).
- Yukimasa Sugizaki and Daisuke Takahashi: A Fast Algorithm for
Computing the Number of Magic Series, Annals of Combinatorics,
Vol. 26, No. 2, pp. 511-532 (2022).
- Kazuhiko Komatsu, Ayumu Gomi, Ryusuke Egawa, Daisuke
Takahashi, Reiji Suda, and Hiroyuki Takizawa: Xevolver: A code
transformation framework for separation of system-awareness from
application codes, Concurrency and Computation: Practice and
Experience, Vol. 32, No. 7, e5577 (2020).
- Daisuke Takahashi: On the computation and verification of π
using BBP-type formulas, The Ramanujan Journal, Vol. 51, No. 1,
pp. 177-186 (2020).
- Takahiro Katagiri and Daisuke Takahashi: Japanese Autotuning
Research: Autotuning Languages and FFT, Proceedings of the IEEE,
Vol. 106, No. 11, pp. 2056-2067 (2018). (invited paper)
- Daisuke Takahashi: Computation of the 100 quadrillionth
hexadecimal digit of π on a cluster of Intel Xeon Phi processors,
Parallel Computing, Vol. 75, pp. 1-10 (2018).
- Yukihiro Hasegawa, Jun-Ichi Iwata, Miwako Tsuji, Daisuke
Takahashi, Atsushi Oshiyama, Kazuo Minami, Taisuke Boku, Hikaru Inoue,
Yoshito Kitazawa, Ikuo Miyoshi, and Mitsuo Yokokawa: Performance
evaluation of ultra-large-scale first-principles electronic structure
calculation code on the K computer, International Journal of High
Performance Computing Applications, Vol. 28, No. 3, pp. 335-355
(2014).
- Yutaka Maruyama, Norio Yoshida, Hiroto Tadano, Daisuke
Takahashi, Mitsuhisa Sato, and Fumio Hirata: Massively parallel
implementation of 3D-RISM calculation with volumetric 3D-FFT, Journal
of Computational Chemistry, Vol. 35, No. 18, pp. 1347-1355 (2014).
- Yohei Miki, Daisuke Takahashi, and Masao Mori: Highly
scalable implementation of an N-body code on a GPU cluster, Computer
Physics Communications, Vol. 184, No. 9, pp. 2159-2168 (2013).
- Daisuke Takahashi: Parallel implementation of
multiple-precision arithmetic and 2,576,980,370,000 decimal digits of
π calculation, Parallel Computing, Vol. 36, No. 8, pp. 439-448
(2010).
- Yoshikuni Sato, Daisuke Takahashi, and Reijer Grimbergen: A
Shogi Program Based on Monte-Carlo Tree Search, ICGA Journal, Vol. 33,
No. 2, pp. 80-92 (2010).
- Jun-Ichi Iwata, Daisuke Takahashi, Atsushi Oshiyama, Taisuke
Boku, Kenji Shiraishi, Susumu Okada, and Kazuhiro Yabana: A
massively-parallel electronic-structure calculations based on
real-space density functional theory, Journal of Computational
Physics, Vol. 229, No. 6, pp. 2339-2363 (2010).
- Tetsuya Sakurai, Yoshihisa Kodaki, Hiroto Tadano, Daisuke
Takahashi, Mitsuhisa Sato, and Umpei Nagashima: A parallel method for
large sparse generalized eigenvalue problems using a GridRPC system,
Future Generation Computer Systems, Vol. 24, No. 6, pp. 613-619
(2008).
- Taisuke Boku, Hajime Susa, Kenji Onuma, Masayuki Umemura,
Mitsuhisa Sato, and Daisuke Takahashi: Formation of Dwarf Galaxies in
Reionized Universe with Heterogeneous Multicomputer System,
International Journal for Multiscale Computational Engineering,
Vol. 4, No. 2, pp. 281-289 (2006).
- Daisuke Takahashi: An algorithm for multiple-precision
floating-point multiplication, Applied Mathematics and Computation,
Vol. 166, No. 2, pp. 291-298 (2005).
- Daisuke Takahashi: A parallel 1-D FFT algorithm for the
Hitachi SR8000, Parallel Computing, Vol. 29, No. 6, pp. 679-690
(2003).
- Daisuke Takahashi, Mitsuhisa Sato, and Taisuke Boku:
Performance Evaluation of the Hitachi SR8000 Using SPEC OMP2001
Benchmarks, International Journal of Parallel Programming, Vol. 31,
No. 3, pp. 185-196 (2003).
- Daisuke Takahashi: Efficient implementation of parallel
three-dimensional FFT on clusters of PCs, Computer Physics
Communications, Vol. 152, No. 2, pp. 144-150 (2003).
- Daisuke Takahashi: An Extended Split-Radix FFT Algorithm,
IEEE Signal Processing Letters, Vol. 8, No. 5, pp. 145-147 (2001).
- Daisuke Takahashi: A fast algorithm for computing large
Fibonacci numbers, Information Processing Letters, Vol. 75, No. 6,
pp. 243-246 (2000).
- Daisuke Takahashi and Yasumasa Kanada: High-Performance
Radix-2, 3 and 5 Parallel 1-D Complex FFT Algorithms for
Distributed-Memory Parallel Computers, The Journal of Supercomputing,
Vol. 15, No. 2, pp. 207-228 (2000).
2. Conference Proceedings (with review)
- Daisuke Takahashi: Parallel Implementation of
Number-Theoretic Transform on GPU Clusters, Proc. 24th International
Conference on Algorithms and Architectures for Parallel Processing
(ICA3PP 2024), Part III, Lecture Notes in Computer Science,
Vol. 15253, pp. 204-218, Springer (2025).
- Daisuke Takahashi: On the Division in the Computation of
Binary BBP-Type Formulas for Mathematical Constants, Proc. 4th
International Conference on Numerical Computations: Theory and
Algorithms (NUMTA 2023), Part II, Lecture Notes in Computer Science,
Vol. 14477, pp. 323-330, Springer (2025). (short paper)
- Shota Kawakami and Daisuke Takahashi: Implementation and
Evaluation of Octuple-Precision Fast Fourier Transform on GPU,
Proc. 2024 IEEE International Symposium on Parallel and Distributed
Processing with Applications (ISPA 2024), pp. 287-294 (2024).
- Toshihiro Hanawa, Kengo Nakajima, Yohei Miki, Takashi
Shimokawabe, Kazuya Yamazaki, Shinji Sumimoto, Osamu Tatebe, Taisuke
Boku, Daisuke Takahashi, Akira Nukada, Norihisa Fujita, Ryohei
Kobayashi, Hiroto Tadano, and Akira Naruse: Preliminary Performance
Evaluation of Grace-Hopper GH200, Proc. 2024 IEEE International
Conference on Cluster Computing Workshops (CLUSTER Workshops 2024),
pp. 184-185 (2024). (poster paper)
- Daisuke Takahashi: Multiple Integer Divisions with an
Invariant Dividend and Monotonically Increasing or Decreasing
Divisors, Proc. 23rd International Conference on Computational Science
and Its Applications (ICCSA 2023), Part II, Lecture Notes in Computer
Science, Vol. 13957, pp. 393-401, Springer (2023). (short paper)
- Takuya Edamatsu and Daisuke Takahashi: Efficient Large
Integer Multiplication with Arm SVE Instructions, Proc. International
Conference on High Performance Computing in Asia-Pacific Region
(HPC Asia 2023), pp. 9-17 (2023).
- Daisuke Takahashi: An Implementation of Parallel
Number-Theoretic Transform Using Intel AVX-512 Instructions,
Proc. 24th International Workshop on Computer Algebra in Scientific
Computing (CASC 2022), Lecture Notes in Computer Science, Vol. 13366,
pp. 318-332, Springer (2022).
- Takeyuki Harayama, Shuhei Kudo, Daichi Mukunoki, Toshiyuki
Imamura, and Daisuke Takahashi: A Rapid Euclidean Norm Calculation
Algorithm that Reduces Overflow and Underflow, Proc. 21st
International Conference on Computational Science and Its Applications
(ICCSA 2021), Part I, Lecture Notes in Computer Science, Vol. 12949,
pp. 95-110, Springer (2021).
- Naruya Kitai, Daisuke Takahashi, Franz Franchetti, Takahiro
Katagiri, Satoshi Ohshima, and Toru Nagai: An Auto-tuning with
Adaptation of A64 Scalable Vector Extension for SPIRAL, Proc. 2021
IEEE International Parallel and Distributed Processing Symposium
Workshops (IPDPSW 2021), The 16th International Workshop on Automatic
Performance Tuning (iWAPT 2021), pp. 789-797 (2021).
- Daisuke Takahashi: Fast Multiple Montgomery Multiplications
Using Intel AVX-512IFMA Instructions, Proc. 20th International
Conference on Computational Science and Its Applications (ICCSA 2020),
Part V, Lecture Notes in Computer Science, Vol. 12253, pp. 655-663,
Springer (2020). (short paper)
- Yukimasa Sugizaki and Daisuke Takahashi: Fast Computation of
the Exact Number of Magic Series with an Improved Montgomery
Multiplication Algorithm, Proc. 20th International Conference on
Algorithms and Architectures for Parallel Processing (ICA3PP 2020),
Part II, Lecture Notes in Computer Science, Vol. 12453, pp. 365-382,
Springer (2020).
- Daisuke Takahashi: Implementation of Parallel 3-D Real FFT
with 2-D Decomposition on Intel Xeon Phi Clusters, Proc. 13th
International Conference on Parallel Processing and Applied
Mathematics (PPAM 2019), Part I, Lecture Notes in Computer Science,
Vol. 12043, pp. 151-161, Springer (2020).
- Takuya Edamatsu and Daisuke Takahashi: Accelerating Large
Integer Multiplication Using Intel AVX-512IFMA, Proc. 19th
International Conference on Algorithms and Architectures for Parallel
Processing (ICA3PP 2019), Part I, Lecture Notes in Computer Science,
Vol. 11944, pp. 60-74, Springer (2020).
- Daisuke Takahashi and Franz Franchetti: FFTE on SVE:
SPIRAL-Generated Kernels, Proc. International Conference on High
Performance Computing in Asia-Pacific Region (HPC Asia 2020),
pp. 114-122 (2020).
- Samar Aseeri, Benson K. Muite, and Daisuke Takahashi:
Reproducibility in Benchmarking Parallel Fast Fourier Transform based
Applications, Companion of the 2019 ACM/SPEC International Conference
on Performance Engineering (ICPE'19), pp. 5-8 (2019). (vision paper)
- Takuya Edamatsu and Daisuke Takahashi: Acceleration of Large
Integer Multiplication with Intel AVX-512 Instructions, Proc. 20th
IEEE International Conference on High Performance Computing and
Communications (HPCC-2018), pp. 211-218 (2018).
- Daisuke Takahashi: An Implementation of Parallel 1-D Real FFT
on Intel Xeon Phi Processors, Proc. 17th International Conference on
Computational Science and Its Applications (ICCSA 2017), Part I,
Lecture Notes in Computer Science, Vol. 10404, pp. 401-410, Springer
(2017).
- Hiroyuki Takizawa, Daichi Sato, Shoichi Hirasawa, and Daisuke
Takahashi: A Customizable Auto-Tuning Scenario with User-defined Code
Transformations, Proc. 2017 IEEE International Parallel and
Distributed Processing Symposium Workshops (IPDPSW 2017), The 12th
International Workshop on Automatic Performance Tuning (iWAPT 2017),
pp. 1372-1378 (2017).
- Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi:
Automatic Thread-Block Size Adjustment for Memory-Bound BLAS Kernels
on GPUs, Proc. 2016 IEEE 10th International Symposium on Embedded
Multicore/Many-core Systems-on-Chip (MCSoC-16), Special Session:
Auto-Tuning for Multicore and GPU (ATMG), pp. 377-384 (2016).
- Daisuke Takahashi: Automatic Tuning of
Computation-Communication Overlap for Parallel 1-D FFT, Proc. 2016
IEEE 19th International Conference on Computational Science and
Engineering (CSE 2016), pp. 253-256 (2016). (short paper)
- Daisuke Takahashi: Implementation of Multiple-Precision
Floating-Point Arithmetic on Intel Xeon Phi Coprocessors, Proc. 16th
International Conference on Computational Science and Its Applications
(ICCSA 2016), Part II, Lecture Notes in Computer Science, Vol. 9787,
pp. 60-70, Springer (2016).
- Hiroshi Maeda and Daisuke Takahashi: Parallel Sparse
Matrix-Vector Multiplication Using Accelerators, Proc. 16th
International Conference on Computational Science and Its Applications
(ICCSA 2016), Part II, Lecture Notes in Computer Science, Vol. 9787,
pp. 3-18, Springer (2016).
- Hiroshi Maeda and Daisuke Takahashi: Performance Evaluation
of Sparse Matrix-Vector Multiplication Using GPU/MIC Cluster,
Proc. 2015 Third International Symposium on Computing and Networking
(CANDAR'15), 3rd International Workshop on Computer Systems and
Architectures (CSA'15), pp. 396-399 (2015). (poster paper)
- Daisuke Takahashi: An Implementation of Parallel 1-D FFT
Using AVX Instructions on Multi-Core Processors, Proc. 2012
International Workshop on Innovative Architecture for Future
Generation Processors and Systems (IWIA 2012), pp. 83-88 (2015).
- Daisuke Takahashi: Optimization of All-to-All Communication
on Multi-Core Cluster Systems, Proc. 2011 International Workshop on
Innovative Architecture for Future Generation Processors and Systems
(IWIA 2011), pp. 3-7 (2015).
- Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi:
Fast Implementation of General Matrix-Vector Multiplication (GEMV) on
Kepler GPUs, Proc. 23rd Euromicro International Conference on
Parallel, Distributed and Network-based Processing (PDP 2015),
pp. 642-650 (2015).
- Daichi Mukunoki and Daisuke Takahashi: Using Quadruple
Precision Arithmetic to Accelerate Krylov Subspace Methods on GPUs,
Proc. 10th International Conference on Parallel Processing and Applied
Mathematics (PPAM 2013), Part I, Workshop on Numerical Algorithms on
Hybrid Architectures, Lecture Notes in Computer Science, Vol. 8384,
pp. 632-642, Springer (2014).
- Takaaki Hiragushi and Daisuke Takahashi: Efficient Hybrid
Breadth-First Search on GPUs, Proc. 13th International Conference on
Algorithms and Architectures for Parallel Processing (ICA3PP 2013),
Part II, 2013 International Symposium on Advances of Distributed and
Parallel Computing (ADPC 2013), Lecture Notes in Computer Science,
Vol. 8286, pp. 40-50, Springer (2013).
- Daisuke Takahashi: Implementation of Parallel 1-D FFT on GPU
Clusters, Proc. 2013 IEEE 16th International Conference on
Computational Science and Engineering (CSE 2013), pp. 174-180 (2013).
- Yoshikuni Sato, Makoto Miwa, Shogo Takeuchi, and Daisuke
Takahashi: Optimizing Objective Function Parameters for Strength in
Computer Game-Playing, Proc. 27th AAAI Conference on Artificial
Intelligence (AAAI-13), pp. 869-875 (2013).
- Daichi Mukunoki and Daisuke Takahashi: Optimization of Sparse
Matrix-vector Multiplication for CRS Format on NVIDIA Kepler
Architecture GPUs, Proc. 13th International Conference on
Computational Science and Its Applications (ICCSA 2013), Part V,
Lecture Notes in Computer Science, Vol. 7975, pp. 211-223, Springer
(2013).
- Hiroki Yoshizawa and Daisuke Takahashi: Automatic Tuning of
Sparse Matrix-Vector Multiplication for CRS format on GPUs, Proc. 2012
IEEE 15th International Conference on Computational Science and
Engineering (CSE 2012), pp. 130-136 (2012).
- Daisuke Takahashi: An Implementation of Parallel 2-D FFT
Using Intel AVX Instructions on Multi-Core Processors,
Proc. 12th International Conference on Algorithms and Architectures
for Parallel Processing (ICA3PP 2012), Part II, Lecture Notes in
Computer Science, Vol. 7440, pp. 197-205, Springer (2012).
(short paper)
- Daisuke Takahashi, Atsuya Uno, and Mitsuo Yokokawa: An
Implementation of Parallel 1-D FFT on the K computer, Proc. 2012 IEEE
14th International Conference on High Performance Computing and
Communications (HPCC-2012), pp. 344-350 (2012).
- T. Boku, K.-I. Ishikawa, Y. Kuramashi, K. Minami,
Y. Nakamura, F. Shoji, D. Takahashi, M. Terai, A. Ukawa, and
T. Yoshie: Multi-block/multi-core SSOR preconditioner for the QCD
quark solver for K computer, Proceedings of Science, The 30th
International Symposium on Lattice Field Theory (Lattice 2012),
p. 188 (2012).
- Yohei Miki, Daisuke Takahashi, and Masao Mori: A Fast
Implementation and Performance Analysis of Collisionless N-body Code
Based on GPGPU, Proc. International Conference on Computational
Science (ICCS 2012), Procedia Computer Science, Vol. 9, pp. 96-105,
Elsevier (2012).
- Takuma Nomizu, Daisuke Takahashi, Jinpil Lee, Taisuke Boku,
and Mitsuhisa Sato: Implementation of XcalableMP Device Acceleration
Extention with OpenCL, Proc. 2012 IEEE 26th International Parallel and
Distributed Processing Symposium Workshops & PhD Forum (IPDPSW
2012), Multicore and GPU Programming Models, Languages and Compilers
Workshop (PLC 2012), pp. 2394-2403 (2012).
- Daichi Mukunoki and Daisuke Takahashi: Implementation and
Evaluation of Triple Precision BLAS Subroutines on GPUs, Proc. 2012
IEEE 26th International Parallel and Distributed Processing Symposium
Workshops & PhD Forum (IPDPSW 2012), The 13th Workshop on Parallel
and Distributed Scientific and Engineering Computing (PDSEC-12),
pp. 1378-1386 (2012).
- Daichi Mukunoki and Daisuke Takahashi: Implementation and
Evaluation of Quadruple Precision BLAS Functions on GPUs,
Proc. 10th International Conference on Applied Parallel and Scientific
Computing (PARA 2010), Part I, Lecture Notes in Computer Science,
Vol. 7133, pp. 249-259, Springer (2012).
- Takatoshi Nakayama and Daisuke Takahashi: Implementation of
Multiple-Precision Floating-Point Arithmetic Library for GPU
Computing, Proc. 23rd IASTED International Conference on Parallel and
Distributed Computing and Systems (PDCS 2011), pp. 343-349 (2011).
- Yukihiro Hasegawa, Jun-Ichi Iwata, Miwako Tsuji, Daisuke
Takahashi, Atsushi Oshiyama, Kazuo Minami, Taisuke Boku, Fumiyoshi
Shoji, Atsuya Uno, Motoyoshi Kurokawa, Hikaru Inoue, Ikuo Miyoshi, and
Mitsuo Yokokawa: First-principles calculations of electron states of a
silicon nanowire with 100,000 atoms on the K computer, Proc. 2011
ACM/IEEE International Conference for High Performance Computing,
Networking, Storage and Analysis (SC'11) (2011).
- Yuji Kubota and Daisuke Takahashi: Optimization of Sparse
Matrix-Vector Multiplication by Auto Selecting Storage Schemes on GPU,
Proc. 11th International Conference on Computational Science and Its
Applications (ICCSA 2011), Part II, Lecture Notes in Computer Science,
Vol. 6783, pp. 547-561, Springer (2011).
- Daisuke Takahashi: An Implementation of Parallel 3-D FFT with
2-D Decomposition on a Massively Parallel Cluster of Multi-core
Processors, Proc. 8th International Conference on Parallel Processing
and Applied Mathematics (PPAM 2009), Part I, Workshop on Memory Issues
on Multi- and Manycore Platforms, Lecture Notes in Computer Science,
Vol. 6067, pp. 606-614, Springer (2010).
- Chikafumi Takahashi, Mitsuhisa Sato, Daisuke Takahashi,
Taisuke Boku, Akira Ukawa, Hiroshi Nakamura, Hidetaka Aoki, Hideo
Sawamoto, and Naonobu Sukegawa: Design and Power Performance
Evaluation of On-Chip Memory Processor with Arithmetic Accelerators,
Proc. 2008 International Workshop on Innovative Architecture for
Future Generation High-Performance Processors and Systems (IWIA 2008),
pp. 51-57 (2009).
- Daisuke Takahashi: A Parallel Algorithm for
Multiple-Precision Division by a Single-Precision Integer, Proc. 6th
International Conference on Large-Scale Scientific Computations
(LSSC 2007), Lecture Notes in Computer Science, Vol. 4818,
pp. 729-736, Springer (2008).
- Chikafumi Takahashi, Mitsuhisa Sato, Daisuke Takahashi,
Taisuke Boku, Hiroshi Nakamura, Masaaki Kondo, and Motonobu Fujita:
Empirical Study for Optimization of Power-Performance with On-Chip
Memory, Proc. First International Workshop on Advanced Low Power
Systems (ALPS 2006), Lecture Notes in Computer Science, Vol. 4759,
pp. 466-479, Springer (2008).
- Daisuke Takahashi: Implementation and Evaluation of Parallel
FFT Using SIMD Instructions on Multi-Core Processors, Proc. 2007
International Workshop on Innovative Architecture for Future
Generation High-Performance Processors and Systems (IWIA 2007),
pp. 53-59 (2008).
- Daisuke Takahashi: An Implementation of Parallel 1-D FFT
Using SSE3 Instructions on Dual-Core Processors, Proc. 8th
International Workshop on State of the Art in Scientific Computing
(PARA 2006), Lecture Notes in Computer Science, Vol. 4699,
pp. 1178-1187, Springer (2007).
- Akira Nukada, Daisuke Takahashi, Reiji Suda, and Akira
Nishida: High Performance FFT on SGI Altix 3700, Proc. 3rd
International Conference on High Performance Computing and
Communications (HPCC 2007), Lecture Notes in Computer Science,
Vol. 4782, pp. 396-407, Springer (2007).
- Takayuki Imada, Mitsuhisa Sato, Yoshihiko Hotta, Hideaki
Kimura, Taisuke Boku, Daisuke Takahashi, Shinichi Miura, and Hiroshi
Nakashima: Power-performance Evaluation on Ultra-Low Power
High-performance Cluster System: MegaProto/E, Proc. IEEE Symposium on
Low-Power and High-Speed Chips (COOL Chips X), pp. 117-129 (2007).
- Takayuki Okamoto, Shinichi Miura, Taisuke Boku, Mitsuhisa
Sato, and Daisuke Takahashi: RI2N/UDP: High bandwidth and
fault-tolerant network for PC-cluster based on multi-link Ethernet,
Proc. 21st IEEE International Parallel and Distributed Processing
Symposium (IPDPS 2007), The Workshop on Communication Architecture for
Clusters (CAC 2007) (2007).
- Hideaki Kimura, Mitsuhisa Sato, Yoshihiko Hotta, Taisuke
Boku, and Daisuke Takahashi: Empirical Study on Reducing Energy of
Parallel Programs using Slack Reclamation by DVFS, Proc. 2006 IEEE
International Conference on Cluster Computing (Cluster 2006),
pp. 1-10 (2006).
- Taisuke Boku, Mitsuhisa Sato, Akira Ukawa, Daisuke Takahashi,
Shinji Sumimoto, Kouichi Kumon, Takashi Moriyama, and Masaaki Shimizu:
PACS-CS: A large-scale bandwidth-aware PC cluster for scientific
computations, Proc. Sixth IEEE International Symposium on Cluster
Computing and the Grid (CCGRID'06), pp. 233-240 (2006).
- Daisuke Takahashi: A Hybrid MPI/OpenMP Implementation of a
Parallel 3-D FFT on SMP Clusters, Proc. 6th International Conference
on Parallel Processing and Applied Mathematics (PPAM 2005), Lecture
Notes in Computer Science, Vol. 3911, pp. 970-977, Springer (2006).
- Yoshiaki Aida, Yoshihiro Nakajima, Mitsuhisa Sato, Tetsuya
Sakurai, Daisuke Takahashi, and Taisuke Boku: Performance Improvement
by Data Management Layer in a Grid RPC System, Proc. First
International Conference on Grid and Pervasive Computing (GPC 2006),
Lecture Notes in Computer Science, Vol. 3947, pp. 324-335, Springer
(2006).
- Taisuke Boku, Mitsuhisa Sato, Daisuke Takahashi, Hiroshi
Nakashima, Hiroshi Nakamura, Satoshi Matsuoka, and Yoshihiko Hotta:
MegaProto/E: Power-Aware High-Performance Cluster with Commodity
Technology, Proc. 20th IEEE International Parallel and Distributed
Processing Symposium (IPDPS 2006), The Second Workshop on
High-Performance, Power-Aware Computing (HP-PAC 2006) (2006).
- Yoshihiko Hotta, Mitsuhisa Sato, Hideaki Kimura, Satoshi
Matsuoka, Taisuke Boku, and Daisuke Takahashi: Profile-based
Optimization of Power Performance by using Dynamic Voltage Scaling on
a PC cluster, Proc. 20th IEEE International Parallel and Distributed
Processing Symposium (IPDPS 2006), The Second Workshop on
High-Performance, Power-Aware Computing (HP-PAC 2006) (2006).
- Daisuke Takahashi, Taisuke Boku, and Mitsuhisa Sato: An
Implementation of Parallel 3-D FFT Using Short Vector SIMD
Instructions on Clusters of PCs, Proc. 7th International Workshop on
Applied Parallel Computing (PARA 2004), Lecture Notes in Computer
Science, Vol. 3732, pp. 1159-1167, Springer (2006).
- Tetsuya Sakurai, Kentaro Hayakawa, Mitsuhisa Sato, and
Daisuke Takahashi: A Parallel Method for Large Sparse Generalized
Eigenvalue Problems by OmniRPC in a Grid Environment, Proc. 7th
International Workshop on Applied Parallel Computing (PARA 2004),
Lecture Notes in Computer Science, Vol. 3732, pp. 1151-1158, Springer
(2006).
- Daisuke Takahashi, Mitsuhisa Sato, and Taisuke Boku:
Computation of High-Precision Mathematical Constants in a Combined
Cluster and Grid Environment, Proc. 5th International Conference on
Large-Scale Scientific Computations (LSSC 2005), Lecture Notes in
Computer Science, Vol. 3743, pp. 454-461, Springer (2006).
- Yoshinori Ojima, Mitsuhisa Sato, Taisuke Boku, and Daisuke
Takahashi: Design of a Software Distributed Shared Memory System using
an MPI communication layer, Proc. 8th International Symposium on
Parallel Architectures, Algorithms, and Networks (I-SPAN 2005),
pp. 220-229 (2005).
- Shinichi Miura, Takayuki Okamoto, Taisuke Boku, Mitsuhisa
Sato, and Daisuke Takahashi: Low-cost High-bandwidth Tree Network for
PC Clusters based on Tagged-VLAN Technology, Proc. 8th International
Symposium on Parallel Architectures, Algorithms, and Networks
(I-SPAN 2005), pp. 84-93 (2005).
- Hiroshi Nakashima, Hiroshi Nakamura, Mitsuhisa Sato, Taisuke
Boku, Satoshi Matsuoka, Daisuke Takahashi, and Yoshihiko Hotta:
MegaProto: 1TFlops/10kW Rack Is Feasible Even with Only Commodity
Technology, Proc. 2005 ACM/IEEE Conference on Supercomputing (SC|05)
(2005).
- Hiroshi Nakashima, Hiroshi Nakamura, Mitsuhisa Sato, Taisuke
Boku, Satoshi Matsuoka, Daisuke Takahashi, and Yoshihiko Hotta:
MegaProto: A Low-Power and Compact Cluster for High-Performance
Computing, Proc. 19th IEEE International Parallel and Distributed
Processing Symposium (IPDPS'05), Workshop on High Performance,
Power-Aware Computing (HPPAC) (2005).
- Taisuke Boku, Kenji Onuma, Mitsuhisa Sato, Yoshihiro
Nakajima, and Daisuke Takahashi: Grid environment for computational
astrophysics driven by GRAPE-6 with HMCS-G and OmniRPC, Proc. 19th
IEEE International Parallel and Distributed Processing Symposium
(IPDPS'05), Joint Workshop on High-Performance Grid Computing &
High-Level Parallel Programming Models (HIPS-HPGC) (2005).
- Yoshinori Ojima, Mitsuhisa Sato, Taisuke Boku, and Daisuke
Takahashi: Design of Software Distributed Shared Memory System using
MPI communication layer, Proc. 4th International Workshop on OpenMP:
Experiences and Implementations (WOMPEI 2005), pp. 18-25 (2005).
- Taisuke Boku, Mitsuhisa Sato, Masazumi Matsubara, and Daisuke
Takahashi: OpenMPI --- OpenMP like tool for easy programming in MPI,
Proc. 6th European Workshop on OpenMP (EWOMP 2004), pp. 83-88 (2004).
- Yoshihiro Nakajima, Mitsuhisa Sato, Hitoshi Goto, Taisuke
Boku, and Daisuke Takahashi: Implementation and Performance Evaluation
of CONFLEX-G: Grid-enabled Molecular Conformational Space Search
Program with OmniRPC, Proc. 18th International Conference on
Supercomputing (ICS'04), pp. 154-163 (2004).
- Chikafumi Takahashi, Masaaki Kondo, Taisuke Boku, Daisuke
Takahashi, Hiroshi Nakamura, and Mitsuhisa Sato: SCIMA-SMP: on-chip
memory processor architecture for SMP, Proc. 3rd Workshop on Memory
Performance Issues (WMPI'04), pp. 121-128 (2004).
- Taisuke Boku, Hajime Susa, Kenji Onuma, Masayuki Umemura,
Mitsuhisa Sato, and Daisuke Takahashi: Formation of Dwarf Galaxies in
Reionized Universe with Heterogeneous Multi-Computer System,
Proc. International Conference on Computational Science 2004
(ICCS 2004), Part IV, Workshop on Modeling and Simulation of
Multi-physics Multi-scale Systems, Lecture Notes in Computer Science,
Vol. 3039, pp. 629-636, Springer (2004).
- Yuhsuke Ohtaki, Daisuke Takahashi, Taisuke Boku, and
Mitsuhisa Sato: Parallel Implementation of Strassen's Matrix
Multiplication Algorithm for Heterogeneous Clusters, Proc. 18th
International Parallel and Distributed Processing Symposium
(IPDPS'04), The 13th Heterogeneous Computing Workshop (HCW 2004)
(2004).
- Yoshihiko Hotta, Mitsuhisa Sato, Taisuke Boku, Daisuke
Takahashi, and Chikafumi Takahashi: Measurement and Characterization
of Power Consumption of Microprocessors for Power-aware Cluster,
Proc. An International Symposium on Low-Power and High-Speed Chips
(COOL Chips VII), pp. 293-303 (2004).
- Yoshihiro Nakajima, Mitsuhisa Sato, Taisuke Boku, Daisuke
Takahashi, and Hitoshi Gotoh: Performance Evaluation of OmniRPC in a
Grid Environment, Proc. 2004 International Symposium on Applications
and the Internet Workshops (SAINT 2004 Workshops), pp. 658-664
(2004).
- Kenji Onuma, Taisuke Boku, Mitsuhisa Sato, Daisuke
Takahashi, Hajime Susa, and Masayuki Umemura: Heterogeneous Remote
Computing System for Computational Astrophysics with OmniRPC,
Proc. 2004 International Symposium on Applications and the Internet
Workshops (SAINT 2004 Workshops), pp. 623-629 (2004).
- Shinichi Miura, Taisuke Boku, Mitsuhisa Sato, and Daisuke
Takahashi: RI2N --- Interconnection Network System for Clusters with
Wide-Bandwidth and Fault-Tolerancy Based on Multiple Links, Proc. 5th
International Symposium on High Performance Computing (ISHPC 2003),
Lecture Notes in Computer Science, Vol. 2858, pp. 342-351, Springer
(2003).
- Daisuke Takahashi: A Radix-16 FFT Algorithm Suitable for
Multiply-Add Instruction Based on Goedecker Method, Proc. 2003 IEEE
International Conference on Multimedia and Expo (ICME 2003), Vol. 2,
pp. 845-848 (2003). (poster paper)
- Daisuke Takahashi, Mitsuhisa Sato, and Taisuke Boku: An
OpenMP Implementation of Parallel FFT and Its Performance on IA-64
Processors, Proc. International Workshop on OpenMP Applications and
Tools (WOMPAT 2003), Lecture Notes in Computer Science, Vol. 2716,
pp. 99-108, Springer (2003).
- Taisuke Boku, Mitsuhisa Sato, Kenji Onuma, Junichiro
Makino, Hajime Susa, Daisuke Takahashi, Masayuki Umemura, and Akira
Ukawa: HMCS-G: Grid-enabled Hybrid Computing System for Computational
Astrophysics, Proc. 3rd IEEE/ACM International Symposium on Cluster
Computing and the Grid (CCGRID'03), Workshop on Grids and Advanced
Networks (GAN'03), pp. 558-565 (2003).
- Mitsuhisa Sato, Taisuke Boku, and Daisuke Takahashi:
OmniRPC: a Grid RPC System for Parallel Programming in Cluster and
Grid Environment, Proc. 3rd IEEE/ACM International Symposium on
Cluster Computing and the Grid (CCGRID'03), pp. 206-213 (2003).
- Daisuke Takahashi: A Radix-16 FFT Algorithm Suitable for
Multiply-Add Instruction Based on Goedecker Method, Proc. 2003 IEEE
International Conference on Acoustics, Speech, and Signal Processing
(ICASSP 2003), Vol. 2, pp. 665-668 (2003). (poster paper)
- Shinsuke Nara, Yuichi Goto, Daisuke Takahashi, and Jingde
Cheng: Parallel Forward Deduction System for General-Purpose
Entailment Calculus on Clusters of PCs, Proc. IASTED International
Conference on Networks, Parallel and Distributed Processing, and
Applications (NPDPA 2002), pp. 359-364 (2002).
- Yuichi Goto, Daisuke Takahashi, and Jingde Cheng: Improving
Performance of Automated Forward Deduction System EnCal on
Shared-Memory Parallel Computers, Proc. Third International
Conference on Parallel and Distributed Computing, Applications and
Technologies (PDCAT 2002), pp. 63-68 (2002).
- Daisuke Takahashi, Taisuke Boku, and Mitsuhisa Sato: A
Blocking Algorithm for Parallel 1-D FFT on Clusters of PCs, Proc. 8th
International Euro-Par Conference (Euro-Par 2002), Lecture Notes in
Computer Science, Vol. 2400, pp. 691-700, Springer (2002).
- Daisuke Takahashi: A Blocking Algorithm for Parallel 1-D
FFT on Shared-Memory Parallel Computers, Proc. 6th International
Conference on Applied Parallel Computing (PARA 2002), Lecture Notes in
Computer Science, Vol. 2367, pp. 380-389, Springer (2002).
- Daisuke Takahashi, Mitsuhisa Sato, and Taisuke Boku:
Performance Evaluation of the Hitachi SR8000 Using OpenMP Benchmarks,
Proc. 4th International Symposium on High Performance Computing
(ISHPC 2002), Lecture Notes in Computer Science, Vol. 2327,
pp. 390-400, Springer (2002).
- Yuichi Goto, Daisuke Takahashi, and Jingde Cheng: Parallel
Forward Deduction Algorithms of General-Purpose Entailment Calculus on
Shared-Memory Parallel Computers, Proc. 2nd International Conference
on Software Engineering, Artificial Intelligence, Networking &
Parallel/Distributed Computing (SNPD'01), pp. 168-175 (2001).
- Daisuke Takahashi: A Blocking Algorithm for FFT on
Cache-Based Processors, Proc. 9th International Conference on High
Performance Computing and Networking Europe (HPCN Europe 2001),
Lecture Notes in Computer Science, Vol. 2110, pp. 551-554, Springer
(2001). (poster paper)
- Daisuke Takahashi: A Mixed-Radix Parallel Three-Dimensional
FFT Algorithm on Clusters of Vector SMPs, Proc. Tenth SIAM Conference
on Parallel Processing for Scientific Computing (PP01) (2001).
- Seiji Nishimura, Daisuke Takahashi, Takaomi Shigehara,
Hiroshi Mizoguchi, and Taketoshi Mishima: A Performance Study on a
Single Processing Node of the HITACHI SR8000, Proc. Second
International Conference on Numerical Analysis and Its Applications
(NAA 2000), Lecture Notes in Computer Science, Vol. 1988,
pp. 628-635, Springer (2001).
- Daisuke Takahashi: A Parallel 3-D FFT Algorithm on
Clusters of Vector SMPs, Proc. 5th International Workshop on
Applied Parallel Computing (PARA 2000), Lecture Notes in Computer
Science, Vol. 1947, pp. 316-323, Springer (2001).
- Daisuke Takahashi: Implementation of Multiple-Precision
Parallel Division and Square Root on Distributed-Memory Parallel
Computers, Proc. 2000 International Workshop on Parallel
Processing (ICPP'00 Workshops), Workshop on High Performance
Scientific and Engineering Computing with Applications
(HPSECA-00), pp. 229-235 (2000).
- Seiji Nishimura, Daisuke Takahashi, Takaomi Shigehara,
Hiroshi Mizoguchi, and Taketoshi Mishima: Efficient Implementation of
CG & CR Methods for Linear Systems on a Single Processing Node of
HITACHI SR8000, Proc. 2000 International Technical Conference on
Circuits/Systems, Computers and Communications (ITC-CSCC2000), pp.
298-301 (2000).
- Daisuke Takahashi: A New Radix-6 FFT Algorithm Suitable for
Multiply-Add Instruction, Proc. 2000 IEEE International Conference on
Acoustics, Speech, and Signal Processing (ICASSP 2000), Vol. 6,
pp. 3343-3346 (2000). (poster paper)
- Daisuke Takahashi: High-Performance Parallel FFT Algorithms
for the HITACHI SR8000, Proc. Fourth International
Conference/Exhibition on High Performance Computing in Asia-Pacific
Region (HPC-Asia 2000), Vol. 1, pp. 192-199 (2000).
- Daisuke Takahashi and Yasumasa Kanada: Fast High-Precision
Arithmetic on Distributed Memory Parallel Machines, Proc. Ninth
SIAM Conference on Parallel Processing for Scientific Computing (PP99)
(1999).
3. Conference Proceedings (without review)
- Satoshi Matsuoka, William Kramer, and Daisuke Takahashi: The
HPC Decathlon Assessment Measure: A Proposal to Define a New Composite
Benchmark for High Performance Computing, Storage, Networking and
Analysis, Proc. Workshop on Modeling & Simulation of Exascale
Systems and Applications (MODSIM 2013) (2013). (position paper)
- Hiroyuki Takizawa, Ryusuke Egawa, Daisuke Takahashi, and
Reiji Suda: HPC Refactoring with Hierarchical Abstractions to Help
Software Evolution, Sustained Simulation Performance 2012: Proceedings
of the joint Workshop on High Performance Computing on Vector Systems,
Stuttgart (HLRS), and Workshop on Sustained Simulation Performance,
Tohoku University, 2012, pp. 27-33, Springer (2013).
- Daichi Mukunoki and Daisuke Takahashi: Performance Comparison
of Double, Triple and Quadruple Precision Real and Complex BLAS
Subroutines on GPUs, Proc. ATIP/A*CRC Workshop on Accelerator
Technologies for High-Performance Computing: Does Asia Lead the Way?
(ATIP/A*CRC Workshop '12), pp. 788-790 (2012).
- Piotr Luszczek, David H. Bailey, Jack Dongarra, Jeremy
Kepner, Robert F. Lucas, Rolf Rabenseifner, and Daisuke Takahashi: The
HPC Challenge (HPCC) benchmark suite, Proc. 2006 ACM/IEEE Conference
on Supercomputing (SC'06) (2006).
- Takuya Yokozawa, Daisuke Takahashi, Taisuke Boku, and
Mitsuhisa Sato: Efficient Parallel Implementation of Classical
Gram-Schmidt Orthogonalization Using Matrix Multiplication, Proc. 4th
International Workshop on Parallel Matrix Algorithms and Applications
(PMAA'06), pp. 37-38 (2006).
- Hideaki Kimura, Mitsuhisa Sato, Yoshihiko Hotta, Taisuke
Boku, and Daisuke Takahashi: Reducing Energy of Parallel Programs using
Slack Reclamation by DVFS in a Power-scalable High Performance
Cluster, Proc. IEEE Symposium on Low-Power and High-Speed Chips (COOL
Chips IX), p. 187 (2006).
- Mitsuhisa Sato, Yoshihiro Nakajima, Tetsuya Sakurai, Taisuke
Boku, and Daisuke Takahashi: OmniRPC Grid Parallel Programming
Environment for a Large Scale Numerical Computation, Proc. 17th IMACS
World Congress Scientific Computation, Applied Mathematics and
Simulation (2005).
- Mitsuhisa Sato, Yoshinori Ojima, Taisuke Boku, and Daisuke
Takahashi: Portable Software Distributed Shared Memory SCASH-MPI for
Omni OpenMP Compiler, Proc. First International Workshop on OpenMP
(IWOMP 2005) (2005).
- Taisuke Boku, Mitsuhisa Sato, Masazumi Matsubara, and
Daisuke Takahashi: OpenMPI --- OpenMP like tool for easy programming
in MPI, Proc. 6th European Workshop on OpenMP (EWOMP 2004), pp. 83-88
(2004).
- Yoshihiko Hotta, Mitsuhisa Sato, Taisuke Boku, Hiroshi
Nakashima, Hiroshi Nakamura, Satoshi Matsuoka, Daisuke Takahashi,
Chikafumi Takahashi, Shinichi Miura, Yoshihiro Nakajima, Masaaki
Kondo, and Motonobu Fujita: MegaProto: A Prototype of Ultra Low-Power
Mega-Scale System, Proc. An International Symposium on Low-Power and
High-Speed Chips (COOL Chips VII), Vol. 1, p. 84 (2004).
- Yoshihiko Hotta, Mitsuhisa Sato, Taisuke Boku, Daisuke
Takahashi, and Chikafumi Takahashi: Measurement and Characterization
of Power Consumption of Microprocessors for Power-aware Computing,
Proc. An International Symposium on Low-Power and High-Speed Chips
(COOL Chips VI), Vol. 1, p. 77 (2003).
4. Oral Presentation
- Daisuke Takahashi: Implementation of Parallel 3-D Real FFT
with 2-D Decomposition on Manycore Clusters, The 14th AIMS Conference,
ADNEC Centre Abu Dhabi, Abu Dhabi, UAE, December 20, 2024.
- Daisuke Takahashi: Implementation of Parallel
Number-Theoretic Transform on GPU Clusters, SIAM Conference on
Parallel Processing for Scientific Computing (PP24), Lord Baltimore
Hotel, Baltimore, Maryland, USA, March 7, 2024.
- Daisuke Takahashi: Multiple Integer Divisions with an
Invariant Dividend, 10th International Congress on Industrial and
Applied Mathematics (ICIAM 2023), Waseda University, Tokyo, Japan,
August 21, 2023.
- Daisuke Takahashi: Implementation of Parallel
Number-Theoretic Transform on Manycore Clusters, SIAM Conference on
Computational Science and Engineering (CSE23), RAI Congress Centre,
Amsterdam, The Netherlands, February 27, 2023.
- Daisuke Takahashi: Parallel Implementation of FFT in a Finite
Field, SIAM Conference on Parallel Processing for Scientific Computing
(PP22), Online, February 26, 2022.
- Daisuke Takahashi: Automatic Tuning of
Computation-Communication Overlap for Parallel 3-D FFT with 2-D
Decomposition, SIAM Conference on Computational Science and
Engineering (CSE21), Online, March 4, 2021.
- Daisuke Takahashi: Implementation of Parallel 3-D Real FFT
with 2-D Decomposition on Intel Xeon Phi Clusters, SIAM Conference on
Parallel Processing for Scientific Computing (PP20), Hyatt Regency
Seattle, Seattle, Washington, USA, February 14, 2020.
- Daisuke Takahashi: Implementation of Parallel 3-D Real FFT
with 2-D Decomposition on Intel Xeon Phi Clusters, SIAM Conference on
Computational Science and Engineering (CSE19), Spokane Convention
Center, Spokane, Washington, USA, March 1, 2019.
- Daisuke Takahashi: Implementation of Parallel 1-D Real FFT
on Intel Xeon Phi Processors, 2018 Conference on Advanced Topics and
Auto Tuning in High-Performance and Scientific Computing
(2018 ATAT in HPSC), National Cheng Kung University, Tainan, Taiwan,
March 27, 2018.
- Ayumu Gomi and Daisuke Takahashi: A Programming Framework for
Performance Tuning in Julia, SIAM Conference on Parallel Processing
for Scientific Computing (PP18), Waseda University, Tokyo, Japan,
March 7, 2018.
- Daisuke Takahashi: Implementation of Parallel FFTs on Cluster
of Intel Xeon Phi Processors, SIAM Conference on Parallel Processing
for Scientific Computing (PP18), Waseda University, Tokyo, Japan,
March 7, 2018.
- Daisuke Takahashi: Automatic Tuning for Parallel FFTs on
Cluster of Intel Xeon Phi Processors, 2017 Conference on Advanced
Topics and Auto Tuning in High-Performance and Scientific Computing
(2017 ATAT in HPSC), National Taiwan University, Taipei, Taiwan,
March 10, 2017.
- Daichi Mukunoki, Toshiyuki Imamura, and Daisuke Takahashi:
Implementation Techniques for High Performance BLAS Kernels on Modern
GPUs, SIAM Conference on Computational Science and Engineering
(CSE17), Hilton Atlanta, Atlanta, Georgia, USA, February 28, 2017.
- Daisuke Takahashi: Implementation of Parallel FFTs on Knights
Landing Cluster, SIAM Conference on Computational Science and
Engineering (CSE17), Hilton Atlanta, Atlanta, Georgia, USA, February
28, 2017.
- Daisuke Takahashi: Automatic Tuning for Parallel FFTs on
Intel Xeon Phi Clusters, SIAM Conference on Parallel Processing for
Scientific Computing (PP16), Universite Pierre et Marie Curie,
Cordeliers Campus, Paris, France, April 14, 2016.
- Daisuke Takahashi: Automatic Tuning for Parallel FFTs on
Intel Xeon Phi Clusters, 2016 Conference on Advanced Topics and Auto
Tuning in High-Performance and Scientific Computing
(2016 ATAT in HPSC), National Taiwan University, Taipei, Taiwan,
February 19, 2016.
- Daisuke Takahashi: Automatic Tuning for Parallel FFTs on GPU
Clusters, 2015 SIAM Conference on Computational Science and
Engineering (CSE15), Salt Palace Convention Center, Salt Lake City,
Utah, USA, March 18, 2015.
- Hiroshi Maeda and Daisuke Takahashi: Performance Evaluation
of Sparse Matrix-Vector Multiplication Using GPU/MIC Cluster, 2015
SIAM Conference on Computational Science and Engineering (CSE15), Salt
Palace Convention Center, Salt Lake City, Utah, USA, March 14, 2015.
- Daisuke Takahashi: Automatic Tuning for Parallel FFTs on GPU
Clusters, 2015 Conference on Advanced Topics and Auto Tuning in
High-Performance and Scientific Computing (2015 ATAT in HPSC),
National Taiwan University, Taipei, Taiwan, February 28, 2015.
- Daisuke Takahashi: Implementation of Parallel FFTs on GPU
Clusters, 2014 Conference on Advanced Topics and Auto Tuning in High
Performance and Scientific Computing (2014 ATAT in HPSC), National
Taiwan University, Taipei, Taiwan, March 14, 2014.
- Daisuke Takahashi: Experience of Implementing Parallel FFTs
on GPU Clusters, Special Session: Legacy HPC Application Migration
2013 (LHAM) (held in conjunction with IEEE MCSoC-13), National
Institute of Informatics, Chiyoda-ku, Tokyo, Japan, September 27,
2013.
- Daisuke Takahashi: Automatic Tuning for Parallel FFTs, 2013
Conference on Advanced Topics and Auto Tuning in High Performance and
Scientific Computing (2013@^2HPSC), National Taiwan University,
Taipei, Taiwan, March 28, 2013.
- Daichi Mukunoki and Daisuke Takahashi: Iterative Method for
Sparse Linear Systems using Quadruple Precision Operations on GPUs,
2013 SIAM Conference on Computational Science and Engineering (CSE13),
The Westin Boston Waterfront, Boston, Massachusetts, USA, February 28,
2013.
- Daisuke Takahashi, Alex Yee, Torsten Hoefler, Camille Coti,
Jeongnim Kim, and Franck Cappello: An Implementation of Parallel 3-D
FFT with 1.5-D Decomposition, The seventh workshop of the
INRIA-Illinois-ANL Joint Laboratory on Petascale Computing, INRIA
Rennes, France, June 14, 2012.
- Daisuke Takahashi, Alex Yee, Torsten Hoefler, Camille Coti,
Jeongnim Kim, and Franck Cappello: A Scalable Parallel Algorithm for
3-D FFT, The sixth workshop of the INRIA-Illinois Joint Laboratory on
Petascale Computing, National Center for Supercomputing Applications,
Urbana, Illinois, USA, November 22, 2011.
- Yuji Kubota and Daisuke Takahashi: Autotuning of Sparse
Matrix-Vector Multiplication by Selecting Storage Schemes on GPU,
2011 SIAM Conference on Computational Science and Engineering (CSE11),
Grand Sierra Resort and Casino, Reno, Nevada, USA, March 1, 2011.
- Daisuke Takahashi, Camille Coti, and Franck Cappello:
Optimization of a Parallel 3-D FFT with 2-D Decomposition, The fourth
workshop of the INRIA-Illinois Joint Laboratory on Petascale
Computing, National Center for Supercomputing Applications, Urbana,
Illinois, USA, November 23, 2010.
- Daisuke Takahashi: Automatic Tuning for Parallel 3-D FFTs,
2010 SIAM Annual Meeting (AN10), David L. Lawrence Convention Center,
Pittsburgh, Pennsylvania, USA, July 16, 2010.
- Daisuke Takahashi: Automatic Tuning for Parallel 3-D FFT with
2-D Decomposition, 2010 SIAM Conference on Parallel Processing for
Scientific Computing (PP10), Grand Hyatt Seattle, Seattle, Washington,
USA, February 25, 2010.
- Daisuke Takahashi: A Volumetric 3-D FFT on Clusters of
Multi-Core Processors, Third French-Japanese PAAP Workshop,
Shiran-Kaikan Hall Annex, Kyoto, Japan, April 21, 2009.
- Daisuke Takahashi: A Volumetric 3-D FFT on Clusters of
Multi-Core Processors, 2009 SIAM Conference on Computational Science
and Engineering (CSE09), Miami Hilton Downtown, Miami, Florida, USA,
March 5, 2009.
- Daisuke Takahashi: Automatic Tuning for Parallel FFTs, Second
French-Japanese PAAP Workshop, ENSEEIHT-IRIT, Toulouse, France, June
24, 2008.
- Daisuke Takahashi: Automatic Tuning for Parallel FFTs, 13th
SIAM Conference on Parallel Processing for Scientific Computing
(PP08), The Renaissance Atlanta Hotel Downtown, Atlanta, Georgia, USA,
March 12, 2008.
- Daisuke Takahashi: The FFTE Library and the HPC Challenge
(HPCC) Benchmark Suite, First French-Japanese PAAP Workshop,
Next-Generation Supercomputer R&D Center, RIKEN, Chiyoda-ku,
Tokyo, Japan, November 2, 2007.
5. Invited Talk
- Daisuke Takahashi: Automatic Tuning for Parallel FFTs on
Cluster of Intel Xeon Phi Processors, Parallel Fast Fourier Transforms
(PFFT) (held in conjunction with IEEE HiPC 2018), Radisson Blu
Bengaluru Outer Ring Road, Bengaluru, India, December 17, 2018.
- Daisuke Takahashi: Sparse Matrix-Vector Multiplication on
GPUs, International Workshop on Eigenvalue Problems: Algorithms;
Software and Applications, in Petascale Computing (EPASA2015),
Tsukuba International Congress Center, Tsukuba, Japan, September 14,
2015.
- Daisuke Takahashi: Automatic Tuning for Parallel FFTs on
Clusters of Multi-Core Processors, Special Session: Auto-Tuning for
Multicore and GPU (ATMG) (held in conjunction with IEEE MCSoC-12), The
University of Aizu, Aizu, Japan, September 22, 2012.
- Daisuke Takahashi: Parallel Implementation of
Multiple-Precision Arithmetic and 2.576 Trillion Digits of Pi
Calculation on a Massively Parallel Cluster of Multi-Core Processors,
Workshop on Ultra Performance and Dependable Acceleration Systems
(held in conjunction with PDCAT'09), Gakushi-kaikan, Hiroshima
University, Higashi-Hiroshima, Japan, December 11, 2009.
6. Book
- Daisuke Takahashi: Fast Fourier Transform Algorithms for
Parallel Computers, Springer (2019).
7. Chapter in Book
- Daisuke Takahashi: Fast Fourier Transform in Large-Scale
Systems, Masaaki Geshi (Ed.): The Art of High Performance Computing for
Computational Science, Vol. 1, Springer, pp. 137-168 (2019).
- Taisuke Boku, Osamu Tatebe, Daisuke Takahashi, Kazuhiro
Yabana, Yuta Hirokawa, Masayuki Umemura, Toshihiro Hanawa, Kengo
Nakajima, Hiroshi Nakamura, Tsuyoshi Ichimura, Kohei Fujita, Yutaka
Ishikawa, Mitsuhisa Sato, Balazs Gerofi, and Masamichi Takagi:
Oakforest-PACS: Advanced KNL Cluster System, Jeffrey S. Vetter (Ed.):
Contemporary High Performance Computing: From Petascale toward
Exascale, Vol. 3, CRC Press, pp. 401-421 (2019).
- Hiroyuki Takizawa, Reiji Suda, Daisuke Takahashi, and Ryusuke
Egawa: Xevolver: A User-Defined Code Transformation Approach to
Streamlining Legacy Code Migration, Mitsuhisa Sato (Ed.): Advanced
Software Technologies for Post-Peta Scale Computing, Springer,
pp. 163-181 (2019).
- Daisuke Takahashi: Automatic Tuning for Parallel FFTs,
Ken Naono, Keita Teranishi, John Cavazos, and Reiji Suda (Eds.):
Software Automatic Tuning: From Concepts to State-of-the-Art Results,
Springer, pp. 49-67 (2010).
- Daisuke Takahashi: Implementation of Multiple-Precision
Parallel Division and Square Root on Distributed-Memory Parallel
Computers, Yi Pan and Laurence T. Yang (Eds.): Parallel and
Distributed Scientific and Engineering Computing: Practice and
Experience, Nova Science Publishers, pp. 35-49 (2004).