Name of the Post |
Project Engineer (4-8 years' experience) |
Specialization/ Domain |
Application Support |
No. of Requirement |
8 |
Location |
Pune |
Qualification |
First Class B.E. / B.Tech. in Computer / IT / Electronics / Electronics & Telecommunication / Communication / Electrical / Electrical & Electronics
OR
First Class MCA
OR
M.E. / M.Tech. in Computer / IT / Electronics / Electronics & Telecommunication / Communication / Electrical / Electrical & Electronics
OR
First Class M.Sc. in Computer / IT
|
Post-Qualification Relevant Experience |
For B.E. / B.Tech. / MCA - 4 years of post-qualification relevant experience
For M.E. / M.Tech. - 1 year of post-qualification relevant experience
For M.Sc. - 5 years of post-qualification relevant experience
|
Age |
37 years as on last date of application |
Skill Sets |
- Hands-on experience in operating large-scale compute infrastructure
- Working knowledge of cluster configuration management tools
- Experience with HPC cluster job schedulers such as SLURM and LSF
- Understanding of container technologies like Docker, Singularity, Shifter, etc.
- Proficiency in Bash scripting; working experience in Python programming is desirable
- Proficient in Linux Operating System
- Strong understanding of Linux administration
- Working knowledge of workflows that use MPI
- Experience with InfiniBand-based networking
- Understanding of fast, distributed parallel file system (PFS) based storage such as Lustre and Spectrum Scale for HPC workloads
- Understanding of HDFS, Spark and Kubernetes
- Understanding of HPC cluster and system networking
- Understanding and working knowledge of NFS, DHCP, DNS, SSH/SCP, boot over network, Ganglia, Nagios
- Understanding of GPU accelerators
- Understanding and working knowledge of system and network security
- Troubleshooting and problem-solving skills
- Hands-on experience with the Linux operating system
- Experience with parallel programming models (MPI, OpenMP, pthreads) and with GPGPU computing (OpenACC and CUDA programming)
- Experience with AI frameworks: TensorFlow, PyTorch. Experience with compilation, installation, configuration, tuning and optimization of HPC and AI applications on Linux-based clusters
- Knowledge of programming languages: C/C++, Python. Knowledge of containers will be an added advantage
- Understanding of code review, compilers and debugging tools, including Intel Parallel Studio, GCC, GDB, TotalView
- Excellent communication skills (verbal and written)
|
Job Profile |
- Monitoring, management and optimization of the facility including hardware and software
- Enabling and management of application workflows using Docker containers
- Development of plugins for integration with RT and monitoring tools
- Automation of system administration tasks
- Provide user support for technical issues, data management, etc.
- System administration of dense GPU HPC-AI system, storage, network and associated infrastructure
- Operational/scheduled maintenance of servers and systems
- Troubleshooting of hardware-related issues
- Troubleshooting of installed software, patch updates, customer application installation
- Regular node health checks, including performance analysis and temperature monitoring
- InfiniBand and Ethernet troubleshooting, including cables, controllers, drivers, IP address clashes, reassignment, etc.
- Storage maintenance and backup policies
- Documentation of the GPU-HPC environment as well as documenting system administration policies and procedures (weekly report generation)
- Asset management
- Vendor coordination
- Manage, deploy and support HPC and AI application/frameworks on GPU based distributed computing clusters.
- Work with users to customize applications and configure software development, integration and production environments to specification
- Tune applications to optimize performance and reliability of services across the High-Performance Computing (HPC) ecosystem
- Diagnose application problems quickly and effectively
- Automate administration procedures for routine and complex tasks
- Provide backup HPC system administration support
- Troubleshooting application execution through SLURM- and K8s-managed clusters
- Develop and maintain programs and scripts to aid in the operation and automation of administrative tasks and workflows using Bash and Python
|
CTC per Annum |
*As per the industry standards based on qualification, experience, expertise, role etc. |
|