Lead Data Scientist – Remote OK

The Nielsen Company

2021-10-13T11:35:25Z

Cockeysville, Maryland

United States

Scientific Research

Lead Data Scientist - Remote OK - 101791

Data Science - USA Offsite, Offsite

The Lead Data Scientist's primary responsibility on the Audio Data Science team is to develop creative solutions that enhance the data and analysis infrastructure and pipelines underpinning survey quality for all Nielsen Audio survey products. To uphold high quality standards, the Lead Data Scientist will serve as a subject matter expert on a team of analysts, establishing, maintaining and continuously improving the data tools and processes that support the Audio Data Science team.
Tasks will include developing system enhancements, writing procedural and technical documentation, working with cross-functional teams to implement solutions in production systems, supporting survey methodology enhancement projects, and supporting client-facing data requests.

What will I do?

Maintain and continuously improve the Audio Data Science team's data infrastructure, analysis, production and QA processes

Assist in the transition of the data science tech infrastructure away from legacy systems and methods

Work with cross-functional teams to implement and validate enhanced audience measurement methodologies

Build and refine data queries against large relational databases, data warehouses and data lakes for various analyses and requests

Use tools such as Python, Tableau, AWS and Databricks to independently develop, test and implement high-quality, modular custom code that performs complex data analysis, produces visualizations and answers client queries

Maintain and update comprehensive documentation on departmental procedures, checklists and metrics

Implement prevention and detection controls to ensure data integrity, and identify and address quality escapes

Work closely with internal customers and IT personnel to improve current processes and engineer new methods, frameworks and data pipelines

Work as an integral member of the Audio Data Science team in a time-critical production environment

Key tasks include, but are not limited to, data integration, data harmonization, automation, examining large volumes of data, and identifying and implementing methodological, process and technology improvements

Develop and maintain the underlying infrastructure supporting forecasting and statistical models, machine learning solutions, and big data pipelines (drawing on internal and external sources) used in a production environment

Is this for me?

Undergraduate or graduate degree in mathematics, statistics, engineering, computer science, economics, business or fields that employ rigorous data analysis

Must be proficient in Python (and Spark/Scala) and able to develop shareable software with appropriate technical documentation

Experience using GitLab, Git or similar tools to manage code development

Experience using Apache Spark, Databricks and Airflow

Expertise in Tableau or other data visualization software and techniques

Experience with containerization and orchestration tools such as Docker and/or Kubernetes

Expertise in querying large datasets with SQL and in working with Oracle, Netezza, data warehouse and data lake structures

Experience using CI/CD pipelines

Experience using cloud computing platforms such as AWS or Azure

Strong ability to proactively gather information and to work both independently and within a multidisciplinary team

Proficiency in the MS Office suite (Excel, Access, PowerPoint and Word) and/or Google's office apps (Sheets, Docs, Slides, Gmail)

Preferred

Knowledge of machine learning and data modeling techniques such as Time Series, Decision Trees, Random Forests, SVM, Neural Networks, Incremental Response Modeling, and Credit Scoring

Knowledge of survey sampling methodologies

Knowledge of statistical tests and procedures such as ANOVA, Chi-squared, Correlation, Regression, etc.