About
To research and develop next-generation data center management methodologies that make data centers more efficient in performance and energy consumption, minimize human intervention, and keep them predictable and responsive to evolving computing needs.
Research Assistant
Thank you for visiting my website. I am currently a Research Assistant and Ph.D. candidate in the Department of Computer Science at Texas Tech University, with six years of Ph.D. research and development experience across many areas of data center systems. I serve as the subject matter expert on power and performance in our research team. My work includes:
- Developing and executing voltage and frequency curve characterization plans for real HPC and ML applications and benchmarks
- Analyzing and characterizing variations in power and performance across dynamic voltage and frequency scaling (DVFS) settings
- Determining the worst- and best-case performance and thermal spots across voltage and frequency settings
- Identifying core architecture bottlenecks to test and mitigate worst-case scenarios
- Developing prototyping experiments for features that affect GPU/CPU power and performance
- Characterizing critical key performance indicators (KPIs) to inform product definition and next-generation designs
- Debugging and troubleshooting system-level issues on test and customer platforms
I have an excellent grasp of computer organization and architecture for state-of-the-art platforms, including AMD EPYC, AMD Instinct MI100 and MI210, and NVIDIA Volta and Ampere; strong programming skills in Python, C/C++, and Go; proficiency in the Linux command-line environment and shell scripting; and extensive experience with power management techniques, including power capping and DVFS. I am a self-starter with strong analytical and problem-solving skills, keen attention to detail, and excellent presentation and communication skills, and I am able to independently drive tasks to completion.
Research Topic
My dissertation topic is “Deterministic control and automation of high-end computing systems”. I have research and publication experience in characterizing and modeling the performance, power, energy consumption, and thermal behavior of HPC workloads on state-of-the-art CPU and GPU architectures. The overall approach is to develop model-driven controls, using ML and analytical approaches, that manage these behaviors optimally. My research deliverables are developed in collaboration with several organizations, including the National Energy Research Scientific Computing Center (NERSC) of Lawrence Berkeley National Laboratory (LBNL), the Ultra-Scale Research Center (USRC) of Los Alamos National Laboratory (LANL), Dell Technologies, the Distributed Management Task Force (DMTF)’s Redfish Forum, and TTU’s High Performance Computing Center (HPCC).
Technical Skills
- Languages/Frameworks: Python, Go, C/C++, Bash, MPI, OpenMP, CUDA, HIP, ROCm
- Applications/Benchmarks: LAMMPS, NAMD, GROMACS, LSTM, SPEC ACCEL®, STREAM, DGEMM, FIRESTARTER
- Performance and Power Tools: AMD uProf, ROCm Profiler (rocprof), ROCm Data Center Tool (RDC), ROCm SMI, NVIDIA SMI, Data Center GPU Manager Interface (DCGMI), NVIDIA® Nsight™, nvprof, perf, LIKWID, Intel RAPL, PAPI, Redfish, IPMI
- Performance Tuning: Roofline model analysis on GPUs and CPUs
- System Architectures: AMD EPYC 7763, AMD Instinct MI100 and MI210, Intel Xeon, and NVIDIA Ampere, Volta, and Pascal GPUs
- Metrics Analysis: Correlation and dependence analysis of metrics (Pearson, Spearman, mutual information)
- Model Development: Modeling performance and power consumption across the CPU/GPU DVFS design space to predict power and performance for new applications and computing architectures
- Energy-Performance Trade-offs: Selecting optimal performance, power, and energy profiles using the energy-delay product (EDP)
- HPC Cluster Monitoring: Monitoring of TTU’s HPCC clusters using in-band and out-of-band protocols, Telegraf, Nagios
- AI/ML: Random Forest, XGBoost, SVM, DNN frameworks (TensorFlow, PyTorch, Keras), cuDNN
- HPC Workload Manager: Setting up Slurm clusters and running HPC workloads
- Databases: InfluxDB, TimescaleDB, MySQL
- Scientific Writing: developing technical write-ups (architectural designs, whitepapers, technical papers)
- Collaborations/Presentations: conducting presentations and managing collaborations with technical partners.
Achievements
Publications
- Ghazanfar Ali, Sridutt Bhalachandra, Nicholas J. Wright, Mert Side, and Yong Chen. "Optimal GPU Frequency Selection using Multi-Objective Approaches for HPC Systems." In 2022 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-7. IEEE, 2022.
- Ghazanfar Ali, Lowell Wofford, Christopher Turner, and Yong Chen. "Automating CPU Dynamic Thermal Control for High Performance Computing." In 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 514-523. IEEE, 2022.
- Jie Li, Ghazanfar Ali, Ngan Nguyen, Jon Hass, Alan Sill, Tommy Dang, and Yong Chen. "Monster: an out-of-the-box monitoring tool for high performance computing systems." In 2020 IEEE International Conference on Cluster Computing (CLUSTER), pp. 119-129. IEEE, 2020.
- Ghazanfar Ali, Jon Hass, Alan Sill, Elham Hojati, Tommy Dang, and Yong Chen. "Redfish-Nagios: A Scalable Out-of-Band Data Center Monitoring Framework Based on Redfish Telemetry Model." In Fifth International Workshop on Systems and Network Telemetry and Analytics, pp. 3-11. 2022.
The full list of publications and patents is available at: https://scholar.google.com/citations?user=qDH-G2UAAAAJ&hl=en
Services
- During my Ph.D., I served my department in the following ways. I was part of a departmental campaign to recruit prospective undergraduate students in 2020 and 2021, during the COVID-19 pandemic. This involved calling prospective students to introduce our program and department and to answer their questions. The campaign increased our enrollment by 30%.
- I also served as a panelist for the Computer Science Junior Career Advising Panel in 2021, which was intended to help students make more informed decisions about their career paths.
Contact
Address:
Department of Computer Science, Texas Tech University, Box 43104, Lubbock, TX 79409-3104
E-mail:
ghazanfar.ali@ttu.edu
Call:
+1 806 724 5332