DevOps / Site Reliability Engineer (w/m/x)
Brussels, Brussels Hoofdstedelijk Gewest, Belgium
Cloud Software
Railnova is the leading provider of asset connectivity, data analysis, and a workflow engine for railway organisations. Our mission is to enable railway organisations to make substantial improvements in their performance via digital technologies, in order to sustain and promote green transport.
Railnova is hiring an experienced DevOps / Site Reliability Engineer to strengthen our Infrastructure team in the long term.
This position is all about operating, maintaining, monitoring, and troubleshooting our services and applications, including our streaming systems, IoT devices, and internal tools, with a focus on security, reliability, documentation, and compliance. You'll join our software team of developers and work hand-in-hand with our infrastructure expert in charge of our IoT stack, which comprises the following:
a device management part we call "Railster" (firmware building, OTA update server, PKI server, MQTT broker, data flow, and SIM card monitoring)
a real-time streaming part we call "Railgenius" (Kafka, Python and ML jobs, terabyte-scale append-only data sink on AWS, Django/React app)
a workflow engine we call "Railfleet" (Django/React app, REST API, AWS EKS "glue" services to interface with customers' legacy systems)
Your primary focus will be to build and maintain reliability and robustness in our systems. You'll unify and maintain reliable deployment pipelines that keep developers productive when building apps. You'll investigate performance and security issues, refactor systems, and ensure our cloud services are highly available, monitored, cost-explicit, automated, documented, and secure.
Working at Railnova means working in a software product company that builds a solution for the long term. We emphasise user needs, resilient architecture, long-term vision, and quality of service. You can expect an eager, multi-disciplinary team, ranging from hardware design to UX, who will support you, rise to challenges with you, and grow with you.
Here are some real examples of the work we've done lately that might help you get a better idea of what this job entails:
partition the database to speed up SQL queries
migrate various parts of our single-server stack to the cloud (EKS, EC2)
switch from Webpack to esbuild to cut 40 seconds from deploy times for the dev team in the continuous integration system
deploy Prometheus + Thanos in EKS
analyse and fix database index misuse and stale queries
activate our disaster recovery plan during a data-centre fire that led to the catastrophic loss of some of our MQTT servers (we recovered it all)
make a preliminary assessment regarding ISO 27001 certification
work with the dev team to update Python/Django libraries affected by CVEs
handle various vendor support tickets with AWS and third-party service providers (Snowflake, Aiven, CloudAMQP, ...)
You have solid fundamentals in software development, systems, troubleshooting, and teamwork.
Programming skills in Python and SQL, along with solid knowledge of networking, security, and infrastructure, are required for this position.
You are experienced in cloud deployment (AWS, GCP, or Azure), site reliability, DevOps, disaster recovery, and systems engineering. You know IaC, Kubernetes (K8s), Terraform, Helm, RabbitMQ, Kafka, and Postgres. The terminal is second nature to you :-).
You are autonomous: you feel comfortable making decisions and explaining them to the team.
You are an excellent written communicator. You write documentation for yourself and for others who dive in after you.
You are a team player who values trust and puts the team objective above your own. When you receive feedback, you adjust quickly and build on others’ ideas.
What we offer
You'll be given space and time for deep focus on your work, join a technical and caring team, and have room to perfect your software engineering skills. On top of that, you'll get:
A balanced hybrid work environment mixing remote and office time (usually 2 to 3 days remote); our offices are near Brussels South train station. Railnova has a remote culture, with a few full-time employees who have been remote since day one,
32 days of paid holidays,
Space to grow through a deep focus on your work, one conference per year of your choice, extra courses, and self-learning,
A young, multi-disciplinary and dynamic team in a medium-sized scale-up (~35 employees), with a rock-solid, subscription-based business model in IoT and Data Analytics,
An extensive collection of perks including a smartphone, laptop, and screens of your choice, extra healthcare insurance, transport card, company car, bicycle plan, Raspberry Pis, and meal vouchers (Belgium),
An open culture where we nurture creativity while keeping our clients and the rest of the team in mind at all times.
The application process is handled as follows:
Submit your written application via the “Apply for this job” button
The hiring manager and the team evaluate your application based on written communication skills, critical thinking, and experience. You will have an answer from us within 15 days.
Interview with the hiring manager
Case study exercise
Final interviews and team presentation
You can ask questions at any time during the application process simply by responding to the confirmation email you’ll receive after submission.