Welcome to my GitHub profile! I am an IT Operations Engineer with three years of experience, specializing in automation, data center planning, container management, and SRE operations. My goal is to streamline operations and improve system reliability by applying cloud computing, containerization, and AI.
- Operating Systems: Linux (Red Hat, CentOS, Ubuntu)
- Certifications:
  - RHCE (Red Hat Certified Engineer)
  - CKA (Certified Kubernetes Administrator)
- Programming Languages:
  - Python
  - Bash
- DevOps Tools:
  - Kubernetes
  - Docker
  - Ansible
  - Prometheus, Grafana (Monitoring & Logging)
  - Jenkins (CI/CD Automation)
- Cloud Platforms: AWS, GCP, Azure (Containerization & Cloud Ops)
- AI Technologies:
  - Machine Learning (ML) Models & Applications
  - AI Agents in Operations
- Change Management
  - Defined and executed change management processes to control system, application, and network changes, improving change execution efficiency and system stability.
- Data Center Planning & Management
  - Contributed to the planning and construction of data centers, performing capacity planning and resource optimization to ensure high availability and performance of data center infrastructure.
- Container Operations (Kubernetes/Docker)
  - Managed containerized environments using Docker and Kubernetes, optimizing cluster performance, supporting continuous delivery, and automating deployment processes.
- SRE Operations & Automation Monitoring
  - Designed and implemented automated monitoring, alerting, and failover mechanisms to ensure system stability and improve response times.
- Professional Certifications: RHCE and CKA certified, with a strong foundation in Linux operations and container technologies.
- Programming & Automation Skills: Proficient in Python and Bash for developing automation scripts and tools that increase operational efficiency.
- AI & Machine Learning Integration: Strong understanding of AI/ML technologies (e.g., GPT, BERT) for log analysis, anomaly detection, and performance optimization.
- AI Agents in Operations: Experience designing and implementing AI agent-based solutions for automated operations, improving system fault recovery and operational efficiency.
- Independent Problem Solving: Adept at diagnosing and resolving complex issues quickly, ensuring system uptime and reliability.
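
As an illustration of the anomaly detection mentioned above, here is a minimal sketch rather than code from any production system. It flags metric samples that deviate from a trailing window by more than a z-score threshold. The function name `find_anomalies`, the `window` and `threshold` parameters, and the sample latency values are all hypothetical, chosen only for this example.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples that deviate strongly from the preceding window.

    Hypothetical helper: compares each sample against the mean and sample
    standard deviation of the `window` values that came before it.
    """
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as a deviation.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up request latencies in milliseconds, with one obvious spike.
latencies_ms = [102, 98, 101, 99, 100, 103, 97, 450, 101, 99]
print(find_anomalies(latencies_ms))  # prints [7]
```

A trailing window keeps the check cheap and stateless, at the cost of a spike inflating the baseline for the next few samples. A real monitoring stack would more likely express this as an alerting rule than as a script.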


