Senior Site Reliability Engineer
tinybird
What are we looking for?
We are looking for someone to help us scale while keeping our software and infrastructure reliable and elastic. Someone who knows how to make hardware and software play together.
We run our stack on Linux. We try to keep things simple. Technologies we use:
- Nginx: SSL termination and load balancing.
- Varnish: load balancing and, sometimes, caching.
- Redis: metadata store.
- Python: most of our backend uses Python except some small bits that rely on C++ for hot paths.
- ClickHouse: our main data store.
- ZooKeeper: coordination of ClickHouse replicas.
- Grafana, collectd and statsd: monitoring and alerting.
We have been relying on Ansible to automate the provisioning and deployment of all those technologies in various configurations, both in multi-tenant and dedicated setups. Our number of machines is still manageable, but the number keeps growing as we keep adding customers.
This is not about managing infrastructure but about making sure that our software uses the hardware resources wisely and flexibly. This means you will not only have to worry about automating machines, but also about helping the product team design and develop the architecture of the system as a whole. That will require you to work with our backend code and to understand how ClickHouse works.
Some challenges and things we want to improve:
- Observability: from specific resource usage to a bird's eye view of the whole platform. This requires good knowledge of storage, networking, and computing.
- High-availability and elasticity: as we keep adding customers, we need to architect our system to be more efficient and flexible.
- Disaster recovery: improving our tooling to manage and discover problems, but also improving our on-call procedures.
As a specific challenge: when our customers grow, we need to upgrade their accounts. Today we do it manually. Not manual in the traditional sense, because we have tools that automate much of the process, but we still have to handle it one customer at a time: deciding what machines we need to spin up, how much storage we will provision, etc. Ideally, our architecture should allow our customers to upgrade themselves, assigning more resources to them in the most dynamic, safe and transparent way possible.
What will we value?
- Experience designing, building and running distributed cloud architectures and large-scale web-based applications. That is, in so many words, what you will be responsible for at Tinybird.
- Programming skills and willingness to dive into our codebase, ClickHouse’s or others’ in order to figure out how things work. At Tinybird we work mostly with Python and C++.
- Accountable and enthusiastic about taking on the responsibility of designing and managing the platform, with an urge to take on things that may be broken. Unafraid to break stuff because you own it and can fix it if need be.
- Bias for action, iteration and delivery. Conscious that often decisions can be reversed quickly and that speed is of the essence in business and technology.
- Thinking in terms of systems, attuned to edge cases, failure modes, behaviors and specific implementations.
- Comfortable collaborating and communicating asynchronously.
- Keen documenter of everything you learn and build, to figure out things once and to make it easy for everybody else.
- Experience with Nginx, Varnish, Redis, Ansible would be great for you to get up and running quickly, but we don’t bring you here to tell you what the right technologies are: rather we expect you to recommend the right one for each challenge.
- Experience with ClickHouse and/or rolling out database systems at scale would be a huge plus.
Some bits about the way we work
- We are a fully remote company, and not just because of COVID-19: we have worked like that for many years. All of our previous companies were remote-friendly.
- We will provide you with up to €2400 to get the right setup at home if you need it.
- We are just starting up so your work will impact everything we do. We also believe in full transparency and you will always know what is going on.
Here are our company principles.
A bit more about the hiring process
- Selected candidates will be invited to schedule a screening call with our tech team.
- Next, after sending you some materials, you will be invited to schedule a second interview.
- Following successful interviews, you will be invited to schedule a final meeting with the rest of the founding team.
- Successful candidates will subsequently be made an offer via phone or video call.
Compensation
- Competitive package, including €63K to €92.5K salary.
- 22 days of holiday a year (plus your birthday and public holidays), but who is counting.
- Freedom to work from wherever suits you best. This time, we are looking for people based in time zones from UTC-2 to UTC+3.
How to apply
Apply by telling us a bit about yourself, and ask us whatever you need to know about the problem we are trying to solve, the company, your role, etc.