Interests:

Scalable Distributed Graph Computations

Graphs

Very fast, distributed-memory partitioning of power-law graphs (“GraSP: Distributed Streaming Graph Partitioning,” 2015). GitHub release pending.

Tensors

Upcoming survey: “Tensors in Data Analysis: Methods, Applications, and Software,” 2016.

Talks:

Balance Principles, Intel, Santa Clara, CA, May 2011.
Balance Principles, HotPar 2011, Berkeley, CA, May 2011.
Hierarchical Locales for Chapel, Supercomputing 2012 Chapel BoF, Salt Lake City, UT, November 2012.
In-Place Optimization of Dense Tensor-Matrix Multiply (InTensLi), SIAM PP 2016, Paris, France (invited).
A Practical Randomized CP Tensor Decomposition, Sandia National Laboratories, Livermore, CA, September 1, 2016.
A Practical Randomized CP Tensor Decomposition, ScalPerf 2016, Bertinoro, Italy (invited).
A Practical Randomized CP Tensor Decomposition, SIAM CSE 2017, Atlanta, GA (invited).

Publications:

Battaglino, C., Ballard, G., & Kolda, T. (2017). A Practical Randomized CP Tensor Decomposition (in submission) [arXiv].
Battaglino, C., & Mohindra, S. (2009). GPUOctave - Enabling GPU Computing for GNU Octave (Poster). In NVIDIA GPU Technology Conference.
Battaglino, C., Pienta, R., & Vuduc, R. (2015). GraSP: Distributed Streaming Graph Partitioning. In Proceedings of the 1st ACM SIGKDD Workshop on High Performance Graph Mining (HPGM 2015), August 8, 2015, Sydney, Australia.
Li, J., Battaglino, C., Perros, I., Sun, J., & Vuduc, R. (2015). An input-adaptive and in-place approach to dense tensor-times-matrix multiply. In Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, SC 2015, November 15-20, 2015, Austin, Texas, USA. New York, NY, USA: ACM Press.
Czechowski, K., Battaglino, C., McClanahan, C., Chandramowlishwaran, A., & Vuduc, R. (2011). Balance Principles for Algorithm-architecture Co-design. In Proceedings of the 3rd USENIX Conference on Hot Topics in Parallelism (pp. 9–9). Berkeley, CA, USA: USENIX Association. Retrieved from http://dl.acm.org/citation.cfm?id=2001252.2001261
Czechowski, K., Battaglino, C., McClanahan, C., Iyer, K., Yeung, P.-K., & Vuduc, R. (2012). On the Communication Complexity of 3D FFTs and Its Implications for Exascale. In Proceedings of the 26th ACM International Conference on Supercomputing (pp. 205–214). New York, NY, USA: ACM. Retrieved from http://doi.acm.org/10.1145/2304576.2304604