Senior Site Reliability Engineer (SRE)

Engineering · Sleman, Yogyakarta
Department: Engineering
Employment Type: Full-Time
Minimum Experience: Experienced

Come join us! We are a fast-growing, well-funded U.S.-based startup in Seattle, Washington, with teams in Vancouver, Canada, and San Francisco, and a growing office in Yogyakarta, Indonesia. We were founded by industry veterans from EA, LucasArts, and Epic Games who built the online backends for well-known titles and platforms such as EA Origin, FIFA, StarWars.com, Unreal Engine, Fortnite, Paragon, and Unreal Tournament. Our company develops technology for major, well-known online game studios and publishers across the USA, Europe, and the rest of the world.


Our Mission


We are game industry veterans with years of experience building and operating live systems. As a rapidly growing startup with teams around the world, we craft technology solutions to serve game studios who are making the next generation of games. 


Compensation & Benefits


We offer competitive salaries and other benefits: a relocation package, private insurance (family included, and it's cashless!), full-coverage BPJS Kes & TK, Flexy benefit, periodic medical check-ups (MCU), sports activities (badminton, yoga, futsal, and you can propose a new activity!), social activities (company outings, Accelebrate, knowledge-sharing sessions), food to keep your engine up and running (snacks and coffee brewing), opportunities for overseas travel, and more.


Our Culture


We believe that the best companies are ones where employees are empowered to make decisions, obsess about what’s best for users, and are not afraid to make mistakes and learn from them. Our engineering culture is based on humility, openness to feedback, and collaboration, which we feel produces the best-performing engineering teams and is most beneficial to everyone's growth.


AccelByte is building a 24x7 operations team for AAA multiplayer video games. For this position, we need a driven Site Reliability Engineer who can actively participate in the day-to-day combat: maintaining high reliability of our services and driving prioritization of fixes for what may be broken today. You should also be able to envision, design, and implement processes and technologies that improve our ability to identify, isolate, correlate, and mitigate service-impacting problems in the system. Restoring service and making customers happy is not enough. You must also know enough coding to automate routine tasks such as gathering, correlating, organizing, and presenting service metrics, and be able to perform detailed, in-depth root cause analysis.


Job Description:


  • Be proactive about production uptime, stability & resiliency
  • Provide observability into applications and infrastructure through metrics, logging, and monitoring to ensure platform uptime
  • Work closely with the LiveOps/L3 and DevOps teams, and provide leadership to continually improve and upgrade the platforms’ monitoring, alerting, and audit systems
  • Ensure ongoing availability, performance, security, and scalability of infrastructure components, and provide leadership to improve the platforms’ SLA
  • Apply modern Infrastructure as Code (IaC) principles, and identify opportunities for efficiency through automation and process improvement
  • Assist in Root Cause Analysis and identify solutions to production events
  • Identify improvement opportunities in applications
  • Analyze solutions, design processes, and implement best practices for live production support


Requirements:


  • 7+ years of experience with Linux, 3+ years with DevOps, and 2+ years with Kubernetes
  • Degree in Computer Science or equivalent experience
  • Comfortable programming in languages such as Golang, Python, and Bash
  • Prior experience helping design, manage, and run monitoring systems for large-scale applications in the cloud
  • Extensive experience with cloud monitoring, logging, and APM solutions and strategies
  • Hands-on experience with ELK, Prometheus, Grafana, and AWS CloudWatch
  • Solid grasp of security best practices
  • Keen problem-solving skills and the ability to work under pressure (during a production event)
  • Able to communicate in writing and verbally with clarity and precision
  • Love learning new things, and value feedback and new challenges
  • Self-motivated team player who can also work independently


Nice to have:


  • Good understanding of SecDevOps, cloud, microservices, and containers (e.g., Azure, Kubernetes, Ansible, Terraform, or equivalent)
  • Familiarity with web service patterns and architectures such as REST and SOAP
  • Knowledge of common infrastructure tools such as RabbitMQ, Redis, RDBMS, NoSQL, Kafka, and Elasticsearch
  • Experience with other cloud monitoring tools such as Netdata, Zabbix, Datadog, Dynatrace, and New Relic
  • Experience working with auto-scaling workloads both in containers and on VMs
  • Experience with cloud technologies and infrastructure (AWS preferred; GCP or Azure also relevant)
  • Experience with Confluence, Jira, and Bitbucket
  • Experience working in the game industry
  • Experience with IT standards, methodologies, cryptographic key management regulations, and audits would be an asset


AccelByte Inc is an Equal Employment Opportunity employer. All qualified applicants will receive consideration for employment without regard to race, religion, gender, national origin, sexual orientation, marital status, age, or disability. Our culture is innovative and inclusive, and we value our people most highly.


Please visit our career page for a complete listing of our open positions:

https://accelbyte.io/careers/#jobs
