
  


Programming

Python for Data Engineers

Overview

This course focuses on Python tools for data science initiatives. A primary goal is to leverage pandas, the powerful Python data science package. You will learn about data pipelines and build Extract, Transform, Load (ETL) processes using Python and pandas. Data engineers and data scientists often deploy ETL processes to ingest data for use in an application and to prepare that data for analysts. Participants will build ETL pipelines that ingest JSON and CSV data using APIs, Python, and pandas. The focus is on using pandas to load data into DataFrames, filter and clean it, and store the final dataset either locally or in the cloud. Finally, participants will gain experience building a full data pipeline application in Python.

Who Should Take This Course

Audience

This course is suitable for anyone who has a firm understanding of basic programming structures in Python and who wants to extend that knowledge to data engineering tasks such as data ingest, data wrangling, and ETL processes.

Prerequisites

Students should have basic proficiency coding in Python with an understanding of Python data types, Boolean logic, control flow, looping constructs, as well as the basics of Python collections such as lists and dictionaries.

Why You Should Take This Course

Important learning outcomes include:

● Understand the different components of modern data pipelines.
● Apply Python to data engineering tasks such as data ingest, data filtering, and data cleaning.
● Understand the use of Extract, Transform, and Load (ETL) processes on data.
● Use the Python Data Science library, pandas, to ingest CSV data into a data pipeline.
● Leverage pandas for the standard data engineering tasks of filtering and cleaning data in an ETL process.
● Work with data formats commonly used by data engineers and developers in data science, including JSON and CSV.
● Use HTTP and the Python requests module to access data made available by APIs.
● Gain basic data analytics skills by working with large datasets in the pandas library.

Course Outline

Getting Started with Jupyter Notebooks

● Running Python Notebooks from Google Colaboratory

The Data Pipeline and ETL (Extract, Transform, Load)

● What is a Data Pipeline?
● What is an ETL process?

Data Formats

● CSV and TSV File Formats
● Structure of JSON Data
● What is NDJSON?
● Using the csv and json Modules
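
As a rough illustration of the formats listed above, the sketch below converts CSV text to NDJSON (newline-delimited JSON, one JSON object per line) using only the standard csv and json modules. It is not taken from the course materials; the sample data and the function names csv_to_ndjson and ndjson_to_records are assumptions made for this example.

```python
import csv
import io
import json

# Hypothetical sample data; a real pipeline would read a file or an API response.
CSV_TEXT = "id,name,city\n1,Ada,London\n2,Grace,New York\n"


def csv_to_ndjson(csv_text: str) -> str:
    """Convert CSV text to NDJSON: one JSON object per line."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # DictReader yields one dict per row, keyed by the header line.
    # All values are strings; no type conversion is attempted here.
    return "\n".join(json.dumps(row) for row in reader)


def ndjson_to_records(ndjson_text: str) -> list:
    """Parse NDJSON text back into a list of dictionaries."""
    return [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]


ndjson_text = csv_to_ndjson(CSV_TEXT)
records = ndjson_to_records(ndjson_text)
print(ndjson_text)
```

Because each line is a complete JSON document, an NDJSON file can be read and written one record at a time, which is one reason the format is convenient in data pipelines.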

Data Ingest via Application Programming Interfaces (APIs)

● Hypertext Transfer Protocol (HTTP)
● Application Programming Interfaces (APIs)
● Python requests Library
● Ingesting Data Sources via API Calls

Pandas Basics

● Why Pandas?
● Series
● DataFrames
● Populating DataFrames
● Importing CSV and Excel Data
● DataFrame Columns and Cells
● Manipulating Data in pandas DataFrames

Pandas and Data Wrangling

● Data Conversion
● Functions on DataFrames
● Sorting
● Statistics
● Data Cleaning
● Data Filtering
● Groupby
● Aggregate Functions
● Data Analysis

Full ETL Application

● Extracting the data that you need from Data Ingest
● Assembling NDJSON files
● Uploading files to Cloud Storage OR writing files locally

FAQs
Is there a discount available for current students?

UMBC students and alumni, as well as students who have previously taken a public training course with UMBC Training Centers, are eligible for a 10% discount, capped at $250. Please provide a copy of your UMBC student ID, an unofficial transcript, or the name of the UMBC Training Centers course you have completed. Online courses are excluded from this offer.

What is the cancellation and refund policy?

Students will receive a refund of paid registration fees only if UMBC Training Centers receives notice of cancellation at least 10 business days before the class start date (for classes) or the exam date (for exams).
