Overview
Understand and prioritize business problems and identify ways to leverage data to recommend solutions. Organize and synthesize data into actionable, insight-focused business decisions. Provide insight into trends and into financial and business operations through data analysis and the development of business intelligence visuals.
Work with advanced business intelligence tools to perform complex calculations, table calculations, geographic mapping, and data blending, and to optimize data extracts.
Apply all phases of the Software Development Life Cycle (Analysis, Design, Development, Testing and Maintenance) using Waterfall and Agile methodologies.
Proficient in working with Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Sqoop, Spark, Flume, HBase, and Oozie, running on AWS EC2 or Azure VM cloud infrastructure.
Expertise in using Hive to create tables and distribute data by implementing partitioning and bucketing. Capable of developing, tuning, and optimizing HiveQL queries.
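As an illustration of the partitioning and bucketing described above, a minimal HiveQL sketch (the table and column names are hypothetical):

```sql
-- Hypothetical sales table, partitioned by load date and bucketed by customer id
CREATE TABLE sales (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(10,2)
)
PARTITIONED BY (load_date STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC;

-- Load data into a specific partition from a staging table
INSERT INTO TABLE sales PARTITION (load_date = '2024-01-15')
SELECT order_id, customer_id, amount FROM staging_sales;
```

Partitioning prunes whole directories at query time (e.g., a filter on `load_date`), while bucketing distributes rows within each partition to speed up joins and sampling on `customer_id`.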
Proficient in importing and exporting data with Sqoop between HDFS and relational database systems.
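The Sqoop transfers described above look roughly like the following command sketch (the connection string, table names, and directories are placeholders, not part of the posting):

```shell
# Import a relational table into HDFS (all values below are placeholders)
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales_db \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4

# Export processed results back to the relational database
sqoop export \
  --connect jdbc:mysql://dbhost:3306/sales_db \
  --username etl_user -P \
  --table orders_summary \
  --export-dir /data/curated/orders_summary
```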
Expert in Spark SQL and Spark DataFrames using Scala for distributed data processing. Develop DataFrame and RDD (Resilient Distributed Dataset) transformations to process data loads uniformly.
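A minimal Scala sketch of the Spark SQL / DataFrame work described above (the dataset and column names are illustrative assumptions):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object MetricsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("metrics-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input: (customer_id, amount) pairs
    val orders = Seq((1L, 10.0), (1L, 5.0), (2L, 7.5))
      .toDF("customer_id", "amount")

    // A distributed transformation: aggregate spend per customer
    val spend = orders
      .groupBy($"customer_id")
      .agg(sum($"amount").as("total_spend"))

    spend.show()
    spark.stop()
  }
}
```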
Expertise in various scripting languages, including Linux/Unix shell scripting and Python. Develop, schedule, and monitor Oozie workflows for parallel execution of jobs.
Experience in cloud environments including AWS EMR, EC2, S3, and Athena, and GCP BigQuery. Transfer data from different platforms into AWS.
Diverse experience with databases such as SQL Server, MySQL, IBM DB2, and Netezza. Manage source code in GitHub and track and deliver requirements in Jira. Expertise in IDEs and build tools such as Eclipse, IntelliJ, Jenkins, and Maven.
Optimize Spark applications to improve performance and reduce run time on the Hadoop cluster. Proficient in executing Hive queries with the Hive CLI, the Hue web GUI, and Impala to read, write, and query data.
Build distributed, scalable, and reliable data pipelines that ingest and process data at scale and in real time. Create metrics and apply business logic using Spark, Scala, R, Python, and/or Java.
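The "create metrics and apply business logic" responsibility can be sketched in plain Python, independent of any Spark cluster (the record layout and the metric definition here are illustrative assumptions):

```python
from collections import defaultdict

def revenue_by_region(records):
    """Aggregate a stream of order records into per-region revenue totals.

    Each record is a dict with 'region' and 'amount' keys. Orders with
    non-positive amounts (refunds, test rows) are filtered out, mirroring
    the kind of business logic a pipeline stage would apply.
    """
    totals = defaultdict(float)
    for rec in records:
        if rec["amount"] > 0:
            totals[rec["region"]] += rec["amount"]
    return dict(totals)

orders = [
    {"region": "east", "amount": 100.0},
    {"region": "west", "amount": 40.0},
    {"region": "east", "amount": -10.0},  # refund, excluded by the filter
    {"region": "east", "amount": 25.0},
]
print(revenue_by_region(orders))  # → {'east': 125.0, 'west': 40.0}
```

In a production pipeline the same logic would run as a distributed aggregation (e.g., a Spark `groupBy` plus `sum`), but the business rule itself stays this simple.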
Model, design, develop, code, test, debug, document and deploy applications to production through standard processes; build business models using data science skills. Harmonize, transform, and move data from raw formats to consumable and curated views. Apply strong data governance principles, standards, and frameworks to promote data consistency and quality while protecting the integrity of corporate data.
Location and Title
Job Title : Data Scientist
Location : Virginia — Hybrid
Position Qualifications
Education Required : Bachelor’s and/or Master’s degree in Computer Science, Analytics, Statistics, or a similar field.
Required or Acceptable Job-Related Experience : 8–10 years of related experience
Technical / Other Skills Required
Eligibility for EEO statements : This description reflects the general nature and level of work. It is not intended to be a comprehensive list of skills or responsibilities. EEO statements and legal requirements remain applicable.