Apache Spark
The success of many organizations depends on their ability to derive business insights from massive amounts of raw data coming from various sources. Apache Spark offers many engineering improvements over the traditional MapReduce programming model as implemented in Hadoop by providing multi-pass, in-memory processing of data, which boosts the overall performance of your ETL and machine-learning […]
Hadoop With Spark
Hadoop is a mature Big Data environment, and Hive is the de facto standard for its SQL interface. Today, computations in Hadoop are usually done with Spark. Spark offers an optimized compute engine that supports batch processing, real-time streaming, and machine learning. This course covers Hadoop 3, Hive 3, and Spark 3.
Data Visualization with Tableau
Data Visualization is the graphical representation of large datasets using graphs and charts such as bar charts, line graphs, and scatterplots. Learn how to present datasets elegantly so that your audience can quickly digest, understand, and derive insights or see trends from the data. This course teaches students how to work with Tableau to create […]
SQL for Data Analytics
This course provides you with an overview of Structured Query Language (SQL) so that you can quickly begin working with and analyzing data with other data science tools. Before you can analyze data, you need to have the correct data. Many organizations store their data in structured databases and SQL is the language of choice to […]
Big Data Overview
This course provides an in-depth overview of the choices you have in processing Big Data. It introduces Big Data, the types of data you might have, approaches to working on and processing the data, and the capabilities, strengths, and weaknesses of those approaches. Topics covered include: NewSQL Databases, NoSQL Overview, Hadoop and MapReduce, Apache Pig […]
Introduction to Machine Learning
This course introduces participants to both supervised and unsupervised learning algorithms, with a discussion of which datasets lend themselves to solutions with the various ML techniques. Hands-on labs are designed to help the learner understand the concepts and are all done using Jupyter Notebooks. Where necessary, background material in Linear Algebra, Probability, and Python will […]
Introduction to Data Visualization
We are constantly faced with a vast amount of complex information – often more than we can handle. Well-designed visual interpretations of data improve comprehension, communication, and decision making. This workshop introduces data methods and techniques that increase the understanding of complex data. The focus is on conveying ideas effectively with visually appealing charts, graphs and […]
Data Warehousing on Amazon Web Services (AWS)
Data Warehousing on AWS introduces you to concepts, strategies, and best practices for designing a cloud-based data warehousing solution using Amazon Redshift, the petabyte-scale data warehouse in AWS. This course demonstrates how to collect, store, and prepare data for the data warehouse by using other AWS services such as Amazon DynamoDB, Amazon EMR, Amazon Kinesis […]
Big Data on Amazon Web Services (AWS)
Big Data on AWS introduces you to cloud-based big data solutions such as Amazon Elastic MapReduce (EMR), Amazon Redshift, Amazon Kinesis and the rest of the AWS big data platform. In this course, we show you how to use Amazon EMR to process data using the broad ecosystem of Hadoop tools like Hive and Hue. […]