|Name of the Post
||Project Engineer (4-8 years' experience)
|No. of Requirement
First Class B.E./B.Tech. in Comp/IT/Electronics/Electronics & Telecommunication/Communication/Electrical/Electrical & Electronics
First Class MCA
M.E./M.Tech. in Comp/IT/Electronics/Electronics & Telecommunication/Communication/Electrical/Electrical & Electronics
First Class M.Sc. in Computer/IT
|Post Qualification relevant Experience.
For B.E./B.Tech./MCA - 4 years of post-qualification relevant experience
For M.E./M.Tech. - 1 year of post-qualification relevant experience
For M.Sc. - 5 years of post-qualification relevant experience
||37 years as on last date of application
- Hands-on experience in operating large-scale compute infrastructure
- Working knowledge of cluster configuration management tools
- Experience with HPC cluster job schedulers such as SLURM, LSF
- Understanding of container technologies like Docker, Singularity, Shifter etc.
- Proficiency in Bash scripting; working experience in Python programming would be desirable
- Proficient in Linux Operating System
- Strong understanding of Linux administration
- Working knowledge of workflows that use MPI
- Experience with InfiniBand based networking
- Understanding of fast, distributed PFS based storage systems like Lustre and Spectrum Scale for HPC workloads.
- Understanding of HDFS, Spark and Kubernetes
- Understanding of HPC cluster and system networking
- Understanding and working knowledge of NFS, DHCP, DNS, SSH/SCP, boot over network, Ganglia and Nagios
- Understanding of GPU accelerators
- Understanding and working knowledge of system and network security
- Troubleshooting and problem-solving skills
- Hands-on experience with the Linux operating system
- Experience with parallel programming models (MPI, OpenMP, pthreads) and with GPGPU computing (OpenACC and CUDA programming)
- Experience with AI frameworks: TensorFlow, PyTorch
- Experience with compilation, installation, configuration, tuning and optimization of HPC and AI applications on Linux-based clusters
- Knowledge of programming languages: C/C++, Python
- Knowledge of containers will be an added advantage
- Understanding of code review, compilers, debugging tools including Intel Parallel Studio, GCC, GDB, TotalView
- Excellent communication skills (verbal and written)
- Monitoring, management and optimization of the facility including hardware and software
- Enabling and management of application workflows using Docker containers
- Development of plugins for integration with RT and monitoring tools
- Automation of system administration tasks
- Providing user support for technical issues, data management, etc.
- System administration of dense GPU HPC-AI system, storage, network and associated infrastructure
- Operational/scheduled maintenance of servers and systems
- Troubleshooting of hardware-related issues
- Troubleshooting of installed software, patch updates, customer application installation
- Regular node health checks, including performance analysis and temperature monitoring
- InfiniBand and Ethernet troubleshooting, including cables, controllers, drivers, IP address clashes, reassignment etc.
- Storage maintenance and backup policies
- Documentation of the GPU-HPC environment as well as documenting system administration policies and procedures (Weekly Report Generation).
- Asset management
- Vendor co-ordination
- Manage, deploy and support HPC and AI applications/frameworks on GPU-based distributed computing clusters
- Work with users to customize applications and configure software development, integration and production environments to specification
- Tune applications to optimize performance and reliability of services across the High-Performance Computing (HPC) ecosystem
- Diagnose application problems quickly and effectively
- Automate administration procedures for routine and complex tasks
- Provide backup HPC system administration support
- Troubleshoot application execution on SLURM- and K8s-managed clusters
- Develop and maintain programs and scripts to aid in the operation and automation of administrative tasks and workflows using Bash and Python
|CTC per Annum
||*As per the industry standards based on qualification, experience, expertise, role etc.