Job Description
Job Title: AWS Databricks Architect with Program Management
Location: REMOTE + Travel
Duration / Term: Full-time
Job Description:
Desired Experience:
- Cloud Platforms (broad knowledge of basic utilities and in-depth knowledge of more than one cloud platform)
- Implement scalable and sustainable data engineering solutions using tools such as Databricks, Azure, Apache Spark, and Python. The data pipelines must be created, maintained, and optimized as workloads move from development to production for specific use cases.
- Architecture Design Experience for Cloud and Non-cloud platforms
- Expertise with various ETL technologies and familiarity with ETL tools
- Scrum management experience with medium- and large-scale projects
- Ability to whiteboard enterprise-level architectures
- Must have an extensive range of knowledge of cloud/on-premises tools and architectures
- Experience implementing large-scale hybrid cloud platforms and applications
- Knowledge of one or more scripting languages
- Experience with CI/CD
- Ability to set and lead the technical vision while balancing business drivers
- Strong understanding of AWS EMR (Elastic MapReduce).
- Execute change tasks, including but not limited to partitioning tables using an appropriate partitioning strategy
- Create and optimize star schema models in a Dedicated SQL Pool
- Experience on data migration projects from AWS EMR to Databricks: security provisioning, schema creation, DB role creation, DDL execution, data restoration, etc.
- Review workloads and provide recommendations for performance optimization and operational efficiency
- Proactively adjust capacity based on current utilization and projected upcoming usage
- Document and advise developers and end-users about best practices and housekeeping
- Work on incidents and onboarding issues related to pivoting to the cloud from on-premises datastores.
- Identify performance bottlenecks and monitor Azure Synapse Analytics using Dynamic Management Views (DMVs)
- Build well-performing tables with appropriate table distribution and indexing choices
- Design, develop, and optimize ETL/ELT data pipelines using Databricks on AWS (see the pipeline sketch after this list).
- Collaborate with data scientists, analysts, and stakeholders to understand business requirements and provide technical solutions.
- Implement big data processing solutions using Spark on Databricks.
- Manage and maintain Databricks clusters and optimize resource utilization to improve performance.
- Develop and maintain CI/CD pipelines for data ingestion and transformation processes.
- Integrate Databricks with other AWS services such as S3, Redshift, Glue, Athena, and Lambda.
- Implement and manage data lakes using AWS S3 and Delta Lake architecture.
- Ensure data quality, governance, and security by implementing best practices.
- Monitor, troubleshoot, and optimize Databricks jobs and clusters for performance and cost efficiency.
- Automate workflows and support real-time data streaming processes using Kafka, Kinesis, or AWS Glue.
- Work with DevOps teams to manage infrastructure as code using tools like Terraform or AWS CloudFormation.
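As a rough illustration of the ETL/ELT pipeline and Delta Lake responsibilities listed above, the sketch below shows a minimal PySpark batch job on Databricks that reads raw JSON from an S3 landing zone, cleanses it, and writes a partitioned Delta table. All bucket names, paths, the analytics schema, and columns such as order_id are hypothetical placeholders, not details of this role's actual environment.

```python
# Minimal Databricks-on-AWS batch ETL sketch (illustrative only).
# Bucket names, schema, and column names below are assumptions.
from pyspark.sql import SparkSession, functions as F

# On Databricks, `spark` is provided by the runtime; getOrCreate() reuses it.
spark = SparkSession.builder.appName("orders-etl").getOrCreate()

RAW_PATH = "s3://example-raw-bucket/orders/"          # hypothetical landing zone
CURATED_PATH = "s3://example-curated-bucket/orders/"  # hypothetical Delta location

# 1. Ingest raw JSON files from the S3 landing zone.
raw_df = spark.read.json(RAW_PATH)

# 2. Cleanse the data and derive a date column to partition on.
curated_df = (
    raw_df
    .dropDuplicates(["order_id"])
    .filter(F.col("order_id").isNotNull())
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("order_date", F.to_date("order_ts"))
)

# 3. Write a Delta table partitioned by order_date so that
#    date-bounded queries can prune partitions.
(
    curated_df.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .save(CURATED_PATH)
)

# 4. Expose the location as a catalog table (assumes the `analytics`
#    schema already exists in the metastore).
spark.sql(
    f"CREATE TABLE IF NOT EXISTS analytics.orders USING DELTA LOCATION '{CURATED_PATH}'"
)
```

In a production pipeline the same job would typically be parameterized, scheduled as a Databricks job, and wired into the CI/CD process mentioned below; this sketch only shows the core read-transform-write pattern.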
Required Qualifications:
- Strong experience with Databricks on AWS (2+ years).
- Proficiency in Apache Spark and distributed data processing.
- Hands-on experience with AWS services such as S3, Redshift, EC2, Lambda, Glue, and EMR.
- Expertise in Python, SQL, and Scala for data processing.
- Experience with Delta Lake and building data lakes (see the upsert sketch after this list).
- Familiarity with CI/CD pipelines using tools such as Jenkins, Git, or CodePipeline.
- Experience with infrastructure as code (IaC) tools like Terraform or AWS CloudFormation.
- Knowledge of data governance, data security, and compliance frameworks.
- Strong analytical and problem-solving skills, with the ability to optimize complex data workflows.
- Excellent communication skills and the ability to work in a fast-paced, collaborative environment.
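To make the Delta Lake qualification above concrete, here is a minimal sketch of an incremental upsert into an existing Delta table using the delta-spark Python API; the paths and the order_id merge key are hypothetical placeholders.

```python
# Minimal Delta Lake upsert sketch (illustrative only).
# Paths and the order_id merge key are assumptions, not project specifics.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orders-upsert").getOrCreate()

TARGET_PATH = "s3://example-curated-bucket/orders/"       # hypothetical existing Delta table
UPDATES_PATH = "s3://example-raw-bucket/orders_changes/"  # hypothetical change-file drop zone

# Load the latest batch of changed records.
updates_df = spark.read.json(UPDATES_PATH)

# Open the existing Delta table and merge on the business key:
# update rows that already exist, insert rows that are new.
target = DeltaTable.forPath(spark, TARGET_PATH)
(
    target.alias("t")
    .merge(updates_df.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```

On Databricks the Delta library ships with the runtime; outside Databricks the delta-spark package would need to be installed and configured on the SparkSession.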
Key Skills:
AWS EMR, Databricks, Azure, Apache Spark, Python, S3, Redshift, Glue, Athena, Lambda
Job Tags
Full time, Remote job