Machine Learning Data Engineer

Departments: Engineering

Location: Remote

About Us

Our mission is to revolutionize the CAD industry by developing the world's most advanced hardware design infrastructure and tools. Mechanical CAD is in the dark ages, and hardware demands have outpaced today’s hardware design infrastructure. The industry is due for a refresh, and we're laying the foundation for a modern hardware design toolkit so that you can create design tools that were never before possible. We were founded and incubated by Embedded Ventures in 2021, and we're a fully remote team.

About You

We’re looking for curious, innovative, and ambitious self-starters to join our lean and growing team to help us bring our mission to life. We think you’ll thrive on our team if you’re:

  • Passionate about making an impact on the ground floor of something big!
  • Curious at your core, with an eagerness to learn and do things differently
  • Customer focused, always thinking about ways to improve the user experience
  • Able to operate autonomously while also being an effective team player 
  • Agile and able to thrive in a fast-paced startup environment

About The Role

We are seeking a highly skilled Machine Learning Data Engineer to join our growing team to support our foundational data engineering processes. In this role, you’ll build, implement, and manage our ML data ecosystem to support Zoo’s ML initiatives as we scale. The right candidate will have a strong background in Python programming, data integration, data warehousing, data munging/cleaning, and ETL for ML datasets. The ideal candidate will also have an understanding of Computer-Aided Design (CAD) with knowledge of the relevant data required for hardware design.

What You’ll Do

  • Develop and maintain automated processes for data collection from various sources
  • Design and implement data warehousing solutions to store and manage large datasets
  • Clean and pre-process data to ensure its quality and usability for ML models
  • Create and manage ETL pipelines to transform raw data into structured datasets for ML applications
  • Use vector databases to enhance retrieval-augmented generation (RAG) processes for advanced data retrieval
  • Apply CAD knowledge to integrate and manipulate design data within our systems
  • Collaborate with cross-functional teams to understand data requirements and deliver robust data solutions
  • Continuously optimize and improve Zoo’s data workflows and processes

What You’ll Need

  • B.S. in Computer Science or a related field, or equivalent professional experience
  • Demonstrated proficiency in Data Engineering for ML datasets
  • Expertise in data extraction, data warehousing, and data munging best practices 
  • Proficiency with Python
  • Proficiency with SQL
  • Experience with Git
  • Familiarity with vector databases and their application in RAG
  • Knowledge of CAD and of how CAD data integrates with data systems

Nice to Have

  • Experience with PyTorch and training machine learning models
  • Strong background in data analytics, statistics, and data visualization techniques
  • Experience working on Generative AI applications
  • Proficiency in Rust


What We Offer 

  • Competitive compensation & equity packages 
  • Medical, Dental, and Vision coverage for you and your dependents 
  • 401(k) match (for US-based employees)
  • Flexible vacation policy
  • Home office stipend & Wi-Fi reimbursement to set you up for success working remotely
  • Pet insurance reimbursement for your animal friends

Zoo is proud to be an equal opportunity employer. We’re committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or veteran status.