UMBC Training Centers’ Big Data Analytics Training Program was designed to help individuals and organizations gain the technical and managerial skills required to conduct data analytic studies; evaluate, plan, manage and complete data analytics projects; develop custom data analytic software; and administer and maintain analytic systems at scale.
Introduction to Data Analytics and Big Data
This course gives students a broad familiarity with the core concepts of data analytics and data science and shows how they are applied to a wide range of business, scientific, and engineering problems. It also explores the unique challenges of performing data analytics at very large scale, i.e., “Big Data”.
Data Analysts / Data Scientists
Introduction to Machine Learning
In recent years, industry, not just academia, has found that building powerful data models delivers value beyond what traditional business intelligence provides. This course focuses on state-of-the-art machine learning techniques, combined with a practical approach that teaches you to process your data and build models using Python’s scikit-learn. You will learn to load and analyze your data with pandas (a data analysis library), build visualizations with Matplotlib’s pyplot, and create predictive models using scikit-learn.
Deep Learning with TensorFlow
TensorFlow is an open-source machine learning library from Google designed for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while edges represent the tensors (multidimensional data arrays) passed between nodes. Although this framework lends itself particularly well to deep learning with neural networks, the same graph model can express a wide range of other numerical computations, which makes it extremely flexible.
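To make the node-and-edge idea concrete, here is a toy sketch in plain Python. It is not TensorFlow's API. The `Node` class, `constant`, and `evaluate` are hypothetical names chosen for illustration, and the edges carry plain scalars rather than multidimensional tensors to keep the example small. Each node holds an operation, and its list of input nodes plays the role of the incoming edges along which values flow.

```python
from typing import Callable, Dict, List, Optional


class Node:
    """A graph node: an operation plus the nodes whose outputs feed into it (its incoming edges)."""

    def __init__(self, name: str, op: Callable[..., float], inputs: Optional[List["Node"]] = None):
        self.name = name
        self.op = op
        self.inputs = inputs or []


def constant(name: str, value: float) -> Node:
    # A source node with no inputs that always emits the same value.
    return Node(name, lambda: value)


def evaluate(node: Node, cache: Optional[Dict[str, float]] = None) -> float:
    """Evaluate a node by first evaluating its inputs, reusing results already computed."""
    if cache is None:
        cache = {}
    if node.name not in cache:
        args = [evaluate(inp, cache) for inp in node.inputs]
        cache[node.name] = node.op(*args)
    return cache[node.name]


# Build y = (a + b) * c as a graph: two operation nodes fed by three constants.
a = constant("a", 2.0)
b = constant("b", 3.0)
c = constant("c", 4.0)
s = Node("sum", lambda u, v: u + v, [a, b])
y = Node("product", lambda u, v: u * v, [s, c])

print(evaluate(y))  # 20.0
```

Defining the graph separately from evaluating it is the design point being illustrated. The same structure could be evaluated repeatedly, inspected, or handed to a different executor.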
Introduction to Data Visualization
We are constantly faced with a vast amount of complex information, often more than we can handle. Well-designed visual representations of data improve comprehension, communication, and decision making. This workshop introduces methods and techniques that make complex data easier to understand, with a focus on conveying ideas effectively through visually appealing charts, graphs, and maps. Participants will learn to create clear, meaningful graphics from complex statistics and publicly available data.
Introduction to SQL
This course is designed to give users an understanding of the SQL language. It covers SQL commands for data definition (DDL), data manipulation (DML), queries, and transaction control.
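As a rough illustration of those four command categories, the sketch below runs standard SQL against SQLite through Python's built-in `sqlite3` module. SQLite is only a convenient stand-in, since the course does not name a specific database. The `students` table and its rows are hypothetical sample data.

```python
import sqlite3

# In-memory database so the example needs no files or server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define a table.
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL, score REAL)")

# DML: insert rows, then commit the transaction.
cur.executemany(
    "INSERT INTO students (name, score) VALUES (?, ?)",
    [("Ada", 91.5), ("Grace", 88.0), ("Alan", 79.0)],
)
conn.commit()

# Transaction control: an update that is rolled back leaves the data unchanged.
cur.execute("UPDATE students SET score = 0 WHERE name = 'Ada'")
conn.rollback()

# Query: filter and sort.
cur.execute("SELECT name, score FROM students WHERE score > 80 ORDER BY score DESC")
rows = cur.fetchall()
print(rows)  # [('Ada', 91.5), ('Grace', 88.0)]
conn.close()
```

With its default settings, Python's `sqlite3` opens a transaction implicitly before the `UPDATE`. That is why `rollback()` restores Ada's original score.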
Introduction to Statistics
This course is an introduction to statistical methods common to engineering, science, and social science applications.
Data Analysis with SAS
The objective of this course is to provide students with a strong foundation in fundamental concepts of statistics that are both theoretical and applied. The course will teach enough statistical theory so that students can become educated consumers of analytical methodology, with an emphasis on application of these techniques to reach sound conclusions from real-world data.
Data Analysis with Excel
The objective of this course is to provide students with a strong foundation in fundamental concepts of statistics that are both theoretical and applied. The course will teach enough statistical theory so that students can become educated consumers of analytical methodology, with an emphasis on application of these techniques to reach sound conclusions from real-world data. The material will begin with basic concepts and methods, such as probability, descriptive statistics, exploratory analysis, and inferential testing. The course then progresses to more complex material, such as regression modeling. Analytical challenges unique to large and/or heterogeneous datasets will also be explored. All analytical techniques will be illustrated with examples using Microsoft Excel. Students will analyze a variety of real-world sample data sets during the course.
Applied Data Science and Big Data Analytics
Business success in the information age is predicated on the ability of organizations to convert raw data coming from various sources into high-grade business information. To stay competitive, organizations have started adopting new approaches to data processing and analysis. For example, data scientists are turning to Apache Spark for processing massive amounts of data using Spark’s distributed compute capability along with its built-in machine learning library, or switching from proprietary and costly solutions to the free R programming language.
Data Analysis for Cyber & IT Professionals
This course is offered in a number of variants, each of which focuses on data within a specific industry or domain (e.g., finance, health care, marketing, and IT/Cyber). The IT/Cyber variant focuses on analyzing data from an enterprise IT infrastructure in order to monitor the health and security of operational systems and networks. It also covers detecting threats to or breaches of those systems and networks, and supporting penetration testing, forensic analysis, and incident response.
Hortonworks HDP Analyst: Apache HBase Essentials
This course is designed for big data analysts who want to use HBase, the NoSQL database that runs on top of HDFS, to provide real-time read/write access to sparse datasets. Topics include HBase architecture, services, installation, and schema design.
Hortonworks HDP Developer: Java
This advanced four-day course gives Java programmers a deep dive into Hadoop 2.0 application development. Students will learn how to design and develop efficient and effective MapReduce applications for Hadoop 2.0 using the Hortonworks Data Platform. They will also learn how to harness the power of Hadoop 2.0 to manipulate, analyze, and perform computations on their Big Data.
Hortonworks HDP Developer: Apache Pig and Hive
This course is designed for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Pig and Hive. Topics include: Hadoop, YARN, HDFS, MapReduce, data ingestion, workflow definition, using Pig and Hive to perform data analytics on Big Data and an introduction to Spark Core and Spark SQL.
Hortonworks HDP Developer: Enterprise Spark I
This course is designed as an entry point for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Spark. Topics include: an overview of the Hortonworks Data Platform (HDP), including HDFS and YARN; using Spark Core APIs for interactive data exploration; Spark SQL and DataFrame operations; Spark Streaming and DStream operations; data visualization, reporting, and collaboration; performance monitoring and tuning; building and deploying Spark applications; and an introduction to the Spark Machine Learning Library.
Hortonworks HDP Analyst: Data Science
This course provides instruction on the processes and practice of data science, including machine learning and natural language processing. Topics include tools and programming languages (Python, IPython, Mahout, Pig, NumPy, pandas, SciPy, scikit-learn), the Natural Language Toolkit (NLTK), and Spark MLlib.
Hortonworks HDP Operations: Administration Foundations
This course is intended for systems administrators who will be responsible for the design, installation, configuration, and management of the Hortonworks Data Platform (HDP). The course provides in-depth knowledge and experience in using Apache Ambari as the operational management platform for HDP. This course presumes no prior knowledge or experience with Hadoop.
Hortonworks HDP Operations: Hadoop Administration 2
This course is designed for experienced administrators who manage Hortonworks Data Platform (HDP) 2.3 clusters with Ambari. It covers upgrades, configuration, application management, and other common tasks.
Hortonworks HDP Operations: Security
This course is designed for experienced administrators who will be implementing secure Hadoop clusters using authentication, authorization, auditing and data protection strategies and tools.
Data Science for Solution Architects
Business success of organizations in the information age largely depends on their ability to cost-effectively convert massive amounts of raw data coming from various sources into high-grade business information. In many organizations, Solution Architects are called upon to provide the much-needed “data-to-information” conversion solutions. This class aims to help Solution Architects and other IT practitioners understand the value proposition, methodology, and techniques of the emerging Data Science discipline, which is positioned to tackle many of the challenges posed by the modern data-driven business. The class also introduces students to a number of existing production-ready technologies and capabilities that enable enterprises to build cost-efficient Big Data processing solutions.
Big Data on AWS
Big Data on AWS introduces you to cloud-based big data solutions such as Amazon Elastic MapReduce (EMR), Amazon Redshift, Amazon Kinesis and the rest of the AWS big data platform. In this course, we show you how to use Amazon EMR to process data using the broad ecosystem of Hadoop tools like Hive and Hue. We also teach you how to create big data environments, work with Amazon DynamoDB, Amazon Redshift, and Amazon Kinesis, and leverage best practices to design big data environments for security and cost-effectiveness.
Data Warehousing on AWS
Data Warehousing on AWS introduces you to concepts, strategies, and best practices for designing a cloud-based data warehousing solution using Amazon Redshift, the petabyte-scale data warehouse in AWS. This course demonstrates how to collect, store, and prepare data for the data warehouse by using other AWS services such as Amazon DynamoDB, Amazon EMR, Amazon Kinesis Firehose, and Amazon S3. Additionally, this course demonstrates how to use business intelligence tools to perform analysis on your data.