Data Engineering for Data Scientists

Empowering data scientists to deliver better, more reliable results in record time

Nov 16, 2022

Please join PyData Pittsburgh for the presentation Data Engineering for Data Scientists by Pete Fein!

In this fast-paced talk, you’ll learn how adopting data engineering best practices and tools can improve your data science projects and empower you to deliver better, more reliable results in record time. We’ll discuss data architecture and design principles and explore open source tools you can use today, including:

Run Jupyter notebooks in production using Papermill and nbdev
Improve data quality with Great Expectations and monitor models with Evidently.ai
Write unit tests for your pandas and Spark DataFrames with pandera
Write reusable SQL with dbt, an exciting new tool for data transformation that’s transforming data teams
Orchestrate workflows with Apache Airflow, a better approach than fragile and frustrating cron jobs or Lambdas
Version control your data alongside your code with DVC
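
To give a flavor of the data-validation idea behind tools like pandera and Great Expectations, here is a minimal sketch in plain Python. It does not use either library's API; the `SCHEMA` rules, the column names, and the `validate_rows` helper are all hypothetical, chosen only for illustration.

```python
# Hypothetical schema: column name -> (expected type, check function).
# This mimics the idea of declarative data checks; it is NOT pandera's API.
SCHEMA = {
    "user_id": (int, lambda v: v > 0),
    "score": (float, lambda v: 0.0 <= v <= 1.0),
}


def validate_rows(rows, schema=SCHEMA):
    """Return a list of (row_index, column, problem) for every failed check."""
    errors = []
    for i, row in enumerate(rows):
        for column, (expected_type, check) in schema.items():
            if column not in row:
                errors.append((i, column, "missing"))
            elif not isinstance(row[column], expected_type):
                errors.append((i, column, "wrong type"))
            elif not check(row[column]):
                errors.append((i, column, "failed check"))
    return errors


# Made-up sample records, used only to exercise the checks.
rows = [
    {"user_id": 1, "score": 0.9},
    {"user_id": -5, "score": 0.4},    # user_id must be positive
    {"user_id": 2, "score": "high"},  # score must be a float
]
print(validate_rows(rows))
```

A check like this could run as a unit test in CI or as a gate at the start of a pipeline step, so bad records are caught before they reach a model or a report.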
