Description:
We are looking for a graph data engineer. In this role, you will work alongside experts across product and engineering to build real-time fraud detection systems.
Example Responsibilities:
- Develop real-time graph data pipelines that capture and process relevant data, integrating vector databases for efficient similarity searches to uncover patterns indicative of fraudulent activities.
- Deploy the integrated solution to production, ensuring a robust and scalable infrastructure. Conduct benchmarking and performance optimization for both vector and graph databases to meet the demands of real-time fraud detection.
- Design and implement a cohesive system in which vector and graph infrastructure complement each other, combining their strengths to enhance fraud detection (see the sketch after this list).
- Collaborate with cross-functional teams to identify use cases where vector and graph databases can work in tandem, devising data models that leverage the strengths of both technologies for effective fraud detection.
- Collaborate with the operations team to define and implement efficient maintenance and update strategies for both vector and graph databases, ensuring seamless operation in a dynamic production environment.
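To make the hybrid pattern concrete, here is a minimal sketch of the kind of lookup these responsibilities describe, assuming Milvus (via pymilvus) as the vector store and Neo4j as the graph store; the collection name, node labels, relationship types, and credentials are all hypothetical, not a prescribed design:

```python
# Illustrative only: collection name, node labels, relationship types, and
# credentials are hypothetical placeholders, not a prescribed schema.
from pymilvus import MilvusClient
from neo4j import GraphDatabase

milvus = MilvusClient(uri="http://localhost:19530")          # assumed Milvus endpoint
graph = GraphDatabase.driver("bolt://localhost:7687",
                             auth=("neo4j", "password"))     # placeholder credentials

def suspicious_neighborhood(tx_embedding: list[float], top_k: int = 10) -> list[dict]:
    # Step 1: vector side -- find historical transactions whose embeddings are
    # closest to the incoming one (hypothetical "transactions" collection).
    hits = milvus.search(
        collection_name="transactions",
        data=[tx_embedding],
        limit=top_k,
        output_fields=["tx_id"],
    )
    similar_ids = [hit["entity"]["tx_id"] for hit in hits[0]]

    # Step 2: graph side -- expand from those transactions to others that share
    # an account, a common fraud-ring signal (labels are illustrative).
    cypher = """
        MATCH (t:Transaction)-[:FROM_ACCOUNT]->(a:Account)
              <-[:FROM_ACCOUNT]-(other:Transaction)
        WHERE t.tx_id IN $ids
        RETURN t.tx_id AS seed, a.account_id AS shared_account,
               other.tx_id AS neighbor
    """
    with graph.session() as session:
        return [record.data() for record in session.run(cypher, ids=similar_ids)]
```

The vector search narrows millions of transactions to a handful of behaviorally similar ones, and the graph traversal then surfaces structural links among them that similarity alone would miss.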
Skills and Experience:
- Bachelor's or Master's degree in Computer Science, Data Science, Machine Learning, or related field.
- Extensive experience in designing, implementing, and optimizing real-time graph data pipelines for fraud detection, with a proven track record of deploying solutions to production environments.
- Strong proficiency in graph theory, graph algorithms, and graph databases (e.g., Neo4j, Amazon Neptune, TigerGraph, …), coupled with extensive knowledge of vector databases (OpenSearch, Milvus, …).
- Proficient in Python, Java, or Scala for developing and maintaining data engineering pipelines, with expertise in Apache Spark, Flink, and containerization (Docker, Kubernetes); experienced with cloud platforms (AWS, Azure, Google Cloud) and with a range of databases and data warehouses for efficient data processing and storage (a minimal streaming sketch follows this list).
- Familiar with infrastructure-as-code and CI/CD tools such as Terraform, CloudFormation, and Jenkins for automating infrastructure provisioning, deployment, and maintenance in production environments.
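As a small illustration of the pipeline tooling above, here is a minimal PySpark Structured Streaming job that reads transactions from Kafka; the broker address, topic name, and message schema are assumptions for the sketch, and the console sink stands in for the vector/graph upserts a production job would perform:

```python
# Illustrative only: broker address, topic name, and message schema are
# assumptions for the sketch (requires the spark-sql-kafka connector package).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructType

spark = SparkSession.builder.appName("fraud-tx-stream").getOrCreate()

schema = (StructType()
          .add("tx_id", StringType())
          .add("account_id", StringType())
          .add("amount", DoubleType()))

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")   # placeholder broker
       .option("subscribe", "transactions")                   # hypothetical topic
       .load())

# Kafka delivers bytes; decode the value column and parse the JSON payload.
parsed = (raw
          .select(from_json(col("value").cast("string"), schema).alias("tx"))
          .select("tx.*"))

# Console sink stands in for the real sinks: a production job would embed each
# micro-batch and upsert it into the vector and graph stores instead.
query = (parsed.writeStream
         .format("console")
         .outputMode("append")
         .start())
query.awaitTermination()
```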