This class introduces participants to the Apache Spark platform, the Spark Shell and Spark SQL for big data processing applications. In addition to the Spark platform, participants will learn fundamental tools in the pandas library and gain experience with data visualization using seaborn.
The success of many organizations depends on their ability to derive business insights from massive amounts of raw data coming from various sources. Apache Spark offers many engineering improvements over the traditional MapReduce programming model as implemented in Hadoop by providing multi-pass in-memory processing of data, which boosts the overall performance of your ETL and machine-learning […]
This course covers the theoretical and practical aspects of applying Python to Data Science, Business Analytics, and Data Logistics. Emphasis is on a survey of core concepts, terminology, and theory. The course is supplemented by a variety of hands-on labs that help participants reinforce their theoretical knowledge of the material.
In this intermediate-level course, individuals learn how to solve a real-world use case with Machine Learning (ML) and produce actionable results using Amazon SageMaker. This course walks through the stages of a typical data science process for Machine Learning, from analyzing and visualizing a dataset to preparing the data and engineering features. Individuals will also […]
This course introduces students to the Cloud Computing value proposition, Cloud Computing solution models, and core Amazon Web Services (AWS) services and foundational technologies. Course attendees are provided with insights that will enable them to intelligently translate their organization’s business requirements into Cloud and AWS-based IT solutions. Topics covered include: Articulate the Cloud Computing Business […]
Big Data on AWS introduces you to cloud-based big data solutions such as Amazon Elastic MapReduce (EMR), Amazon Redshift, Amazon Kinesis and the rest of the AWS big data platform. In this course, we show you how to use Amazon EMR to process data using the broad ecosystem of Hadoop tools like Hive and Hue. […]
This course provides an in-depth overview of the choices you have in processing Big Data. It introduces Big Data, the types of data you might have, approaches to working on and processing the data, and the capabilities, strengths, and weaknesses of those approaches. Topics covered include: NewSQL Databases, NoSQL Overview, Hadoop and MapReduce, Apache Pig […]
The objective of this course is to fully explore the uses of Microsoft Excel as a data analysis tool. Most business professionals are familiar with the core functionality of Excel. This course explores some of the additional capabilities and advanced features of Excel for analyzing, manipulating and visualizing data.
Data Engineer has become an important role in the Data Science space. For Data Analysts to do productive work, they need consistent datasets to analyze. A Data Engineer provides this consistency for analysts by accessing data in a variety of formats, using a variety of tools. This class will introduce programmers to tools […]
This course covers the theoretical and practical aspects of applying the principles and methods of Data Science and Data Engineering. Students are introduced to the relevant concepts, terminology, theory, and tools used in the field. This training course is complemented by a variety of hands-on exercises to help the attendees reinforce their theoretical knowledge of […]
Data Visualization is the graphical representation of large datasets using charts such as bar charts, line graphs, and scatterplots. Learn how to present datasets elegantly so that your audience can quickly digest and understand the data, derive insights from it, and see trends in it. This course provides students with hands-on experience creating effective and appealing […]
Matplotlib is a data visualization library for Python. As part of the SciPy ecosystem of scientific Python packages, it is widely used to create data graphics. However, Matplotlib is older than pandas, the most common Python library for data frame manipulation. As a result, the Matplotlib library requires some extra steps when plotting data from pandas data frames […]