Sr. Systems Engineer

San Francisco Peninsula, CA

Posted: 09/30/2019 | Employment Type: Contract | Industry: IT | Job Number: j-861

Job Description

We’re looking for a seasoned systems person. We are building an AI platform on the assumption that compute environments are becoming denser and more heterogeneous. If you are excited by the challenge of building a platform that trains and runs hundreds of AI services in a multi-cluster environment, this role is for you.

Our stack is built on cloud-native components such as Kubernetes, Envoy, gRPC, and Jaeger. We take microservices to heart, and everything we build is designed to scale from day one. We run on the most powerful compute environments available on the market; currently, that is a cluster of NVIDIA DGX systems. We are an AI-first company, and we believe that “data writes software”. To that end, we instrument everything we can in our platform and analyze that data so it helps the systems engineer in you build a better system.

Our team is a mix of skill sets: folks with systems backgrounds, folks with deep learning backgrounds, and folks who know a little of everything enterprise. We value being a top-notch organization with a strong engineering-driven culture, and we hold our code, systems, and people to the same high standards. We value learning and growth (and not being bored), and we hire diverse, well-rounded, communicative people we can envision being friends with and trusting.

More About the Job

  • Provide on-demand addressing and routing for large data messages across a multi-node cluster in a dense compute environment. We assume that the bandwidth between nodes within a cluster will keep increasing.

  • Extend this capability to a cluster of clusters connected over the public internet backbone.

  • Identify and use external services whenever they make sense; we want to hack existing tools for our purposes rather than build everything in-house!

You Have

  • Led a team that implemented a system you are proud of for its sheer compute scale.
  • Experience with routing in high-bandwidth environments (e.g., NVLink).
  • Experimented with networking shenanigans that could be very powerful if done correctly (e.g., code injection through proxies).
  • Enough DevOps experience to work independently when prototyping.
  • Background and experience in systems engineering, for example a Computer Science degree (or comparable) and several years’ working experience in systems.

Bonus Points

  • You have used NVLink and/or NVSwitch.
  • You have worked on Machine Learning projects before.
  • You are familiar with the architecture of cloud providers.
