We are looking for the best
At 42dot, our AD ML Platform Engineers build the core data platform and the ML training / evaluation platform for cutting-edge autonomous driving algorithms. We develop the distributed systems behind a scalable data platform for large-scale datasets (millions of scenes), as well as high-performance data serving SDKs for ML model training / evaluation. The platforms we deliver significantly improve the efficiency of the ML model development lifecycle, including training, evaluation, deployment, and monitoring in the cloud environment.
Responsibilities
- Develop a highly scalable, reliable data platform to manage, visualize, search, and serve large-scale datasets for ML model training, fine-tuning, and validation.
- Develop an advanced autonomous driving data SDK, covering scene data search, dataset preparation, dataset loading, etc.
- Build up the data lakehouse for autonomous driving scene datasets, including sensor data, calibration data, and annotation data.
- Dig into performance bottlenecks all along the data processing pipelines, from data processing latency and data search latency to Test Procedure (TP) coverage.
- Bootstrap and maintain infrastructure for data platform components: data processing pipeline, database, data lakehouse, and data serving.
- Collaborate with cross-functional teams, including ML algorithm, ML application, and Cloud Infra, to align the ML platforms with the overall autonomous driving system architecture.
Qualifications
- Bachelor's degree or higher in Computer Science, Engineering, Robotics, or a similar technical field.
- Minimum of 5 years of experience in Data Engineering or ML Platform roles
- Proficiency in Python and solid experience in Python SDK development
- Solid working experience with databases (e.g., MongoDB, PostgreSQL)
- Hands-on experience orchestrating data pipeline jobs with Databricks Workflows or Apache Airflow, as well as integrating data pipelines with machine learning models
- Extensive experience with data technologies and architectures such as Data Warehouse (e.g., Hive) or Lakehouse (e.g., Delta Lake)
- Experience with Apache Spark or other big data computing engines
Preferred Qualifications
- Experience with autonomous vehicle sensor data (e.g., LiDAR, camera, radar)
- Experience with the ML model training lifecycle (e.g., data preparation, model training / validation / deployment)
- Understanding of modern AI frameworks (e.g., PyTorch, TensorFlow)
- Understanding of data governance principles and data privacy regulations, and experience implementing security measures to protect data
※ Please review the following information before applying.
How to work in 42dot, About 42dot Way