My Resume.
Interests:
Scalable Distributed Graph Computations
- Very fast, distributed-memory partitioning of power-law graphs: (“GraSP: Distributed Streaming Graph Partitioning,” 2015). (GitHub pending).
Scalable Tensor Computations
- In-place optimization of Dense Tensor-Matrix Multiply (InTensLi): (“An input-adaptive and in-place approach to dense tensor-times-matrix multiply,” 2015).
- Upcoming survey: (“Tensors in Data Analysis: Methods, Applications, and Software,” 2016).
- Sketching techniques applied to tensors: (“A Practical Randomized CP Tensor Decomposition,” 2017).
Other Publications:
- Balance Principles at HotPar: (“Balance Principles for Algorithm-architecture Co-design,” 2011).
- FFT paper: (“On the Communication Complexity of 3D FFTs and Its Implications for Exascale,” 2012).
- GPUOctave (Poster): (“GPUOctave - Enabling GPU Computing for GNU Octave (Poster),” 2009).
Talks Given:
- Balance Principles, May 2011, Intel, Santa Clara, CA.
- Balance Principles, HotPar 2011, Berkeley, CA, May 2011.
- Hierarchical Locales for Chapel, Supercomputing 2012 Chapel BoF, Salt Lake City, UT, November 2012.
- In-place optimization of Dense Tensor-Matrix Multiply (InTensLi), SIAM PP 2016, Paris, France. (invited)
- A Practical Randomized CP Tensor Decomposition, September 1, 2016, Sandia National Laboratories, Livermore, CA.
- A Practical Randomized CP Tensor Decomposition, ScalPerf 2016, Bertinoro, Italy. (invited)
- A Practical Randomized CP Tensor Decomposition, SIAM CSE 2017, Atlanta, GA. (invited)
References
- Battaglino, C., Ballard, G., & Kolda, T. (2017). A Practical Randomized CP Tensor Decomposition (in submission) [arXiv].
- Battaglino, C., & Mohindra, S. (2009). GPUOctave - Enabling GPU Computing for GNU Octave (Poster). In NVIDIA GPU Technology Conference.
- Battaglino, C., Pienta, R., & Vuduc, R. (2015). GraSP: Distributed Streaming Graph Partitioning. In Proceedings of the 1st ACM SIGKDD Workshop on High Performance Graph Mining (HPGM 2015), August 8th, 2015, Sydney, Australia.
- Li, J., Battaglino, C., Perros, I., Sun, J., & Vuduc, R. (2015). An input-adaptive and in-place approach to dense tensor-times-matrix multiply. In Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, SC 2015, November 15-20, 2015, Austin, Texas, USA. New York, NY, USA: ACM Press.
- Czechowski, K., Battaglino, C., McClanahan, C., Chandramowlishwaran, A., & Vuduc, R. (2011). Balance Principles for Algorithm-architecture Co-design. In Proceedings of the 3rd USENIX Conference on Hot Topics in Parallelism (pp. 9–9). Berkeley, CA, USA: USENIX Association. Retrieved from http://dl.acm.org/citation.cfm?id=2001252.2001261
- Czechowski, K., Battaglino, C., McClanahan, C., Iyer, K., Yeung, P.-K., & Vuduc, R. (2012). On the Communication Complexity of 3D FFTs and Its Implications for Exascale. In Proceedings of the 26th ACM International Conference on Supercomputing (pp. 205–214). New York, NY, USA: ACM. Retrieved from http://doi.acm.org/10.1145/2304576.2304604