Mandatory Skills : Python, Cloud experience, Kafka, OpenShift ECS
Responsibilities :
- Integral team member of our Data Engineering team responsible for the design and development of Big Data solutions
- Partner with domain experts, product managers, analysts, and data scientists to develop Big Data pipelines in Hadoop or Snowflake
- Responsible for delivering a data-as-a-service framework
- Responsible for moving all legacy workloads to the cloud platform
- Work with data scientists to build Client pipelines using heterogeneous sources and provide engineering services for data science applications
- Ensure automation through CI / CD across platforms, both in the cloud and on-premises
- Research and assess open source technologies and components, and recommend and integrate them into the design and implementation
- Be the technical expert and mentor other team members on Big Data and Cloud Tech stacks
- Define needs around maintainability, testability, performance, security, quality, and usability for the data platform
- Drive implementation, consistent patterns, reusable components, and coding standards for data engineering processes
- Convert SAS-based pipelines into languages such as PySpark and Scala to execute on Hadoop and non-Hadoop ecosystems
- Tune Big data applications on Hadoop and non-Hadoop platforms for optimal performance
- Evaluate new IT developments and evolving business requirements and recommend appropriate systems alternatives and / or enhancements to current systems by analyzing business processes, systems, and industry standards.
- Supervise day-to-day staff management issues, including resource management, work allocation, mentoring / coaching and other duties and functions as assigned
Qualifications :
- 8+ years of experience in Hadoop / big data technologies
- 3+ years of experience in Spark
- 2+ years of experience in Snowflake
- 2+ years of experience working on Google Cloud or AWS developing data solutions; certifications preferred
- Hands-on experience with Python / PySpark / Scala and basic libraries for machine learning is required
- Experience with containerization and related technologies (e.g. Docker, Kubernetes)
- Experience with all aspects of DevOps (source control, continuous integration, deployments, etc.)
- 1 year of Hadoop administration experience preferred
- 1+ year of SAS experience preferred
- Comprehensive knowledge of the principles of software engineering and data analytics
- Advanced knowledge of the Hadoop ecosystem and Big Data technologies
- Hands-on experience with the Hadoop ecosystem (HDFS, MapReduce, Hive, Pig, Impala, Spark, Kafka, Kudu, Solr)
- Knowledge of agile (Scrum) development methodology is a plus
- Strong development / automation skills
- Proficient in Java or Python programming; prior Apache Beam / Spark experience is a plus
- System-level understanding : data structures, algorithms, distributed storage & compute
- Can-do attitude toward solving complex business problems, good interpersonal and teamwork skills
Education :
Bachelor's degree / University degree or equivalent experience