Big Data Developer (New York) at Capgemini
Capgemini
New York, NY
Information Technology
Posted 0 days ago
Job Description
We're looking for a seasoned Senior Data Engineer with strong Hadoop experience to design, build, and scale the data pipelines and platforms that power analytics, AI/ML, and business operations. You'll own end-to-end data engineering, from ingestion and transformation to performance optimization, across large-scale distributed systems and modern cloud data platforms.

Key Responsibilities
Design & Build Data Pipelines: Architect, develop, and maintain robust ETL/ELT pipelines for batch and streaming data using the Hadoop ecosystem, Spark, and Airflow.
Big Data Architecture: Define and implement scalable big data architectures, ensuring reliability, fault tolerance, and cost efficiency.
Data Modeling: Develop and optimize data models for the Data Warehouse (DW) and Operational Data Store (ODS); ensure conformed dimensions and star/snowflake schemas where appropriate.
SQL Expertise: Write, optimize, and review complex SQL/HiveQL queries over large datasets; enforce query standards and patterns.
Performance Tuning: Optimize Spark jobs, SQL queries, storage formats (e.g., Parquet/ORC), partitioning, and indexing to improve latency and throughput.
Data Quality & Governance: Implement data validation, lineage, cataloging, and security controls across environments.
Workflow Orchestration: Build and manage DAGs in Airflow, ensuring observability, retries, alerting, and SLAs.
Cross-functional Collaboration: Partner with Data Science, Analytics, and Product teams to deliver reliable datasets and features.
Best Practices: Champion coding standards, CI/CD, infrastructure-as-code (IaC), and documentation across the data platform.

Required Qualifications
7+ years of hands-on data engineering experience building production-grade pipelines.
Strong experience with Hadoop (HDFS, YARN), Hive (HiveQL), Spark (Scala/Java/PySpark), and Airflow.
Expert-level SQL skills, including writing and tuning complex queries on large datasets.
Solid understanding of big data architecture patterns (e.g., lakehouse, data lake + warehouse, CDC).
Deep knowledge of ETL/ELT and DW/ODS concepts (slowly changing dimensions, partitioning, columnar storage, incremental loads).
Proven track record of performance tuning for large-scale systems (Spark jobs, shuffle optimization, broadcast joins, skew handling).
Strong programming background in Java and/or Scala (Python is a plus).

Preferred Skills
Experience with AI-driven data processing (feature engineering pipelines, ML-ready datasets, model data dependencies).
Hands-on experience with cloud data platforms (AWS, GCP, or Azure) and services such as EMR/Dataproc/HDInsight, S3/GCS/ADLS, Glue/Dataflow, and BigQuery/Snowflake/Redshift/Synapse.
Exposure to NoSQL databases (Cassandra, HBase, DynamoDB, MongoDB).
Advanced data governance and security (row/column-level security, tokenization, encryption at rest and in transit, IAM/RBAC, data lineage/catalog).
Familiarity with Kafka (topics, partitions, consumer groups, schema registry, stream processing).
Experience with CI/CD for data (Git, Jenkins/GitHub Actions, Terraform) and containerization (Docker, Kubernetes).
Knowledge of metadata management and data observability (Great Expectations, Monte Carlo, OpenLineage).

Life at Capgemini
Capgemini supports all aspects of your well-being throughout the changing stages of your life and career. For eligible employees, we offer:
Flexible work
Healthcare including dental, vision, mental health, and well-being programs
Financial well-being programs such as a 401(k) and an Employee Share Ownership Plan
Paid time off and paid holidays
Paid parental leave
Family-building benefits like adoption assistance, surrogacy, and cryopreservation
Social well-being benefits like subsidized back-up child/elder care and tutoring
Mentoring, coaching, and learning programs
Employee Resource Groups
Disaster Relief

Disclaimer
Capgemini is an Equal Opportunity Employer encouraging diversity in the workplace.
All qualified applicants will receive consideration for employment without regard to race, national origin, gender identity/expression, age, religion, disability, sexual orientation, genetics, veteran status, marital status, or any other characteristic protected by law.

This is a general description of the duties, responsibilities, and qualifications required for this position. Physical, mental, sensory, or environmental demands may be referenced in an attempt to communicate the manner in which this position is traditionally performed. Whenever necessary to provide individuals with disabilities an equal employment opportunity, Capgemini will consider reasonable accommodations that might involve varying job requirements and/or changing the way this job is performed, provided that such accommodations do not pose an undue hardship.

Capgemini is committed to providing reasonable accommodations during our recruitment process. If you need assistance or accommodation, please reach out to your recruiting contact.

For more information on your rights as an applicant, see http://www.capgemini.com/resources/equal-employment-opportunity-is-the-law