Railnova is the leading provider of asset connectivity, data analysis, and workflow engines for railway organisations. Our mission is to enable railway organisations to make substantial improvements in their performance through digital technologies, in order to sustain and promote green transport.
Railnova is hiring an experienced DevOps / Cloud Engineer to strengthen our AWS cloud team.
This position is all about streaming systems, security, tooling, deep-dive debugging, web performance, internal support, monitoring, and documentation. You'll join our software team of 10 programmers and work hand-in-hand with our infrastructure expert and our lead architect on our IoT stack, which comprises the following:
a device management part we call "Railster" (firmware building, OTA update server, PKI server, MQTT server, data flow, and SIM card monitoring)
a real-time streaming part we call "Railgenius" (Kafka, Python and ML jobs, a terabyte-scale append-only data sink on AWS, Django/React app)
a workflow engine we call "Railfleet" (Django/React app, REST API, AWS EKS "glue" services to interface with customers' legacy systems)
Your primary focus will be to build reliability and robustness into our systems. You'll establish reliable deployment pipelines that keep programmers productive when developing apps. You'll investigate performance and security issues, write code, refactor systems, and analyse data to ensure our cloud services are highly available, monitored, cost-explicit, automated, documented, and secure. You'll analyse risks, anticipate failure modes, and design shippable roadmaps to prevent them.
Working at Railnova means working in a software product company that builds solutions for the long term. We emphasise user needs, resilient architecture, long-term vision, and quality of service. You can expect an eager, multi-disciplinary team, ranging from hardware design to UX, that will support you, rise to challenges with you, and grow with you.
Here are some real examples of the work we've done lately that might help you get a better idea of what this job entails:
migrate various parts of our single-server stack to the cloud (EKS, EC2)
switch from Webpack to ESBuild, cutting 40 seconds from the dev team's deploy times in the continuous integration system
deploy Prometheus + Thanos on EKS
analyse and fix database index misuse and stale queries
activate our disaster recovery plan during a data-centre fire that caused the catastrophic loss of some of our MQTT servers (we recovered everything)
make a preliminary assessment for ISO 27001 certification
update the Helm charts for our Grafana deployment
update, together with the dev team, Python/Django libraries affected by CVEs
handle vendor support tickets with AWS-hosted third-party service providers (Snowflake, Aiven, CloudAMQP, ...)
You have solid fundamentals in software development, systems, troubleshooting, and teamwork.
Programming skills in Python and SQL, as well as solid knowledge of networking, security, and infrastructure automation, are required for this position.
You are experienced in cloud deployment (AWS, GCP, or Azure), site reliability, DevOps, disaster recovery, and systems engineering. You are familiar with IaC, Kubernetes (K8s), Terraform, Helm charts, RabbitMQ, Kafka, and Postgres. The terminal is second nature to you :-).
You are autonomous and comfortable making decisions and explaining them to the team.
You are an excellent written communicator. You write documentation for yourself and for those who dive in after you.
You are a team player who values trust and puts the team's objectives above your own. When you receive feedback, you adjust quickly and build on others’ ideas.
You'll be given the space and time to focus deeply on your work, join a technical and caring team, and have the opportunity to perfect your software engineering skills. On top of that, you'll get:
a balanced work environment with the choice of working fully remote (within Europe), partially remote, or full time in our offices near Brussels South train station (when sanitary conditions allow). Railnova has a remote culture, with a few full-time employees working remotely since day one.
32 days of paid holiday
space to grow through deep focus on your work, one conference of your choice per year, extra courses, and self-learning
a young, multi-disciplinary and dynamic team in a medium-sized scale-up (~35 employees), with a rock-solid, subscription-based business model in IoT and Data Analytics.
an extensive collection of perks including a smartphone, laptop, and screens of your choice, extra healthcare insurance, transport card, company car, bicycle plan, Raspberry Pis, and meal vouchers (Belgium)
an open culture where we nurture creativity while keeping our clients and the rest of the team in mind at all times.
The application process is handled as follows:
submit your written application via the “Apply for this job” button
the hiring manager and the HR team evaluate your application based on written communication skills, critical thinking, and experience; you will receive an answer from us within 15 days
interview with the hiring manager
case study exercise
final interviews and team presentation
You can ask questions at any time during the application process simply by responding to the confirmation email you’ll receive after submission.