Job Location: Pune (Work from Office)
Employment Type: Full-Time
Experience: 4+ Years
Job Overview:
We are looking for a skilled Data Engineer to design, build, and optimize scalable data pipelines and systems. The ideal candidate will have hands-on experience with data lake development, workflow orchestration tools, and big data technologies. A strong foundation in Python and SQL is essential, along with expertise in tuning distributed data processing frameworks for performance.
Key Responsibilities:
- Design and implement robust data pipelines to support analytics and reporting.
- Develop and maintain data lakes for efficient data storage and processing.
- Optimize Spark/PySpark and Hive applications for performance and scalability.
- Implement and manage workflow orchestration using Apache Airflow.
- Collaborate with cross-functional teams to ensure data quality, reliability, and accessibility.
Required Skills:
- Programming: Advanced proficiency in Python.
- SQL Expertise: Strong ability to write and optimize complex SQL queries.
- Big Data Tools: Hands-on experience with Spark and Hive.
- Data Lakes: Proven experience in data lake development.
- Orchestration: Experience with Apache Airflow or similar tools.
Preferred Skills (Good to Have):
- Familiarity with Trino or Amazon Athena.
- Experience with Snowflake.
- Understanding of data quality frameworks and best practices.
- Knowledge of object storage solutions like Amazon S3.
Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
- Demonstrated ability to design and optimize scalable data solutions.
- Strong problem-solving and analytical skills.