AI Data Engineer

Remote
Seasonal
Experienced


Project Overview:

Camshaft is the project code name for an AI enablement platform designed by Kurt Theobald, CEO of Classy Llama. Classy Llama is a digital consultancy founded in 2007 with a mission to love others through business and a vision to deliver inspiration, insights, and digital transformation services that provide rock-solid solutions for industrial B2B companies in industries such as automotive, manufacturing, and customizable products.

 

The platform's purpose is to make AI accessible to business and operational users while streamlining AI tuning and organization-wide distribution for more technical users.

 

Much like computing in the late 1970s, LLMs today are unrefined power being tinkered with by technologists. Just as Steve Jobs molded computing into a human shape through simple, elegant design backed by robust technology, we seek to do the same with AI.

Position Overview:

The AI Data Engineer focuses on building, maintaining, and optimizing scalable data pipelines and data architectures that will power the Camshaft AI platform. A strong emphasis is placed on reliable code, data governance, and delivering high-quality, clean, and organized data to downstream systems and users.
 

We’re looking for engineers who want to work on a high-velocity team building world-changing technology in an intensely humble and direct creative culture. We don’t waste time on window dressing; we say what we see and expect everyone else on the team to do the same. Why? Because we want to make everything better as quickly as possible, and honest criticism is an essential ingredient.

Will you get the opportunity to work with cutting-edge technology?  Yes.

Will you get to focus on collaboratively creating great stuff with like-minded and equally motivated people?  Yes.

Will it be very challenging?  Probably painfully so.

We’re in a race.  And we intend to win it.  But we can’t just be fast.  Plenty of crappy products will get made really fast.  We need to be empathetic in design, excellent in build, and fast in delivery.

The position is a short-term contract that could convert into a long-term contract once the platform MVP ships, gains initial market traction, and leads to a new round of capital investment.

 

Responsibilities:

Data Infrastructure Development

  • Design, develop, and maintain scalable and efficient data pipelines using Python, PySpark, and other modern tools.
  • Implement data transformation, aggregation, deduplication, and cleaning processes to ensure data quality and consistency (see the brief sketch after this list).
  • Work with relational databases (SQL preferred) and integrate multiple database types to enable efficient data workflows.
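
To illustrate the kind of pipeline work these bullets describe, here is a minimal PySpark sketch that cleans, deduplicates, and aggregates a batch of records. The dataset, S3 paths, and column names are hypothetical examples, not part of the Camshaft project.

  # Hypothetical example: clean and deduplicate raw order records, then
  # aggregate them into a daily reporting table.
  from pyspark.sql import SparkSession, functions as F
  from pyspark.sql.window import Window

  spark = SparkSession.builder.appName("example_clean_pipeline").getOrCreate()

  raw = spark.read.json("s3://example-bucket/raw/orders/")  # hypothetical source

  cleaned = (
      raw
      # Normalize types and drop records missing the natural key.
      .withColumn("order_ts", F.to_timestamp("order_ts"))
      .withColumn("amount", F.col("amount").cast("double"))
      .filter(F.col("order_id").isNotNull())
      # Deduplicate on order_id, keeping the most recent record.
      .withColumn("rn", F.row_number().over(
          Window.partitionBy("order_id").orderBy(F.col("order_ts").desc())))
      .filter(F.col("rn") == 1)
      .drop("rn")
  )

  # Aggregate into a daily totals table for downstream consumers.
  daily_totals = (
      cleaned
      .groupBy(F.to_date("order_ts").alias("order_date"))
      .agg(F.sum("amount").alias("total_amount"),
           F.countDistinct("order_id").alias("order_count"))
  )

  daily_totals.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_totals/")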

Data Governance & Security

  • Ensure compliance with data governance policies, including data provenance and access controls.
  • Monitor data pipelines for performance, reliability, and security, and address issues as they arise.

Data Monitoring & Troubleshooting

  • Build systems to monitor data changes, including rate fluctuations, and trigger appropriate error-handling mechanisms (a simple example follows this list).
  • Troubleshoot data-related issues and work with team members to resolve them efficiently.
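
As one simple example of the rate-fluctuation monitoring mentioned above, a batch's record count can be compared against a trailing baseline and routed to error handling when it deviates too far. The function name and threshold below are purely illustrative assumptions, not the platform's actual mechanism.

  # Illustrative volume check: flag a batch whose record count deviates too far
  # from the trailing mean, so it can be routed to error handling / alerting.
  from statistics import mean

  def check_volume(latest_count: int, recent_counts: list[int],
                   max_deviation: float = 0.5) -> None:
      """Raise if the latest batch deviates from the trailing mean by more than
      max_deviation (a fraction; 0.5 means 50%)."""
      if not recent_counts:
          return  # no baseline yet, nothing to compare against
      baseline = mean(recent_counts)
      if baseline == 0:
          return
      deviation = abs(latest_count - baseline) / baseline
      if deviation > max_deviation:
          raise ValueError(
              f"Record volume changed by {deviation:.0%} versus the trailing "
              f"baseline ({latest_count} vs. ~{baseline:.0f})"
          )

  # Example: a sudden drop in daily record counts trips the check.
  check_volume(latest_count=12_000, recent_counts=[98_000, 101_000, 99_500])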

Collaboration

  • Work closely with data scientists, analysts, and other stakeholders to understand data needs and implement solutions.
  • Participate in discussions on data model optimization and evolving data systems to meet business requirements.
 

Required Skills, Abilities, and Characteristics:

Technical Expertise

  • Strong programming skills in Python, including parallel processing and software engineering principles.
  • Experience working with large codebases, package management, and Git for source control.
  • Familiarity with PySpark and distributed data processing.
  • Experience with at least one cloud-based platform such as AWS or Azure.
  • Proficiency in working with relational databases (SQL preferred).

Data Governance & Monitoring

  • Knowledge of data consistency, types, aggregation, and deduplication principles.
  • Experience implementing data monitoring and error-handling systems.

Collaboration & Communication

  • Proven ability to work collaboratively across teams to deliver effective data solutions.
  • Strong problem-solving and troubleshooting skills.

Nice-to-haves:

  • Experience integrating AI into data pipelines.
  • Familiarity with Databricks or similar platforms.
  • Knowledge of working with diverse data types such as time series, text, PDFs, and video.
  • Experience working with REST APIs for data integration.
  • Knowledge of DB administration for data and ML models.

Organizational Alignment:

The Data Engineer reports to the VP of Technology (Jonathan Hodges) and collaborates closely with other Data Engineers, Data Scientists, Data Analysts, Solutions Architects, UX Designers, the Camshaft Product Owner (Kurt Theobald), and other stakeholders.  

 

During the initial build phase, this will be a small, elite team built to maximize velocity and quality at the foundational level as we work toward a strong beta MVP that is actively used in the market.

 

We are targeting an MVP delivery date of June 18, 2025.


 