
Big Data Analytics

Data Science for Solution Architects

  • Overview

    The business success of organizations in the information age largely depends on their ability to cost-effectively convert massive amounts of raw data from various sources into high-grade business information. In many organizations, Solution Architects are called upon to provide the much-needed “data-to-information” conversion solutions. This class aims to help Solution Architects and other IT practitioners understand the value proposition, methodology and techniques of the emerging Data Science discipline, which is positioned to tackle many of the challenges posed by the modern data-driven business. The class also introduces students to a number of existing production-ready technologies and capabilities that enable enterprises to build cost-efficient Big Data processing solutions.

    Topics covered include:

    • Applied Data Science and Business Analytics
    • Algorithms, Techniques and Common Analytical Methods
    • NoSQL and Big Data Systems Overview
    • MapReduce
    • Big Data Business Intelligence and Analytics
    • Visualizing and Reporting Processed Results
  • Who Should Take This Course

    AUDIENCE

    Enterprise Architects, Solution Architects, Information Technology Architects, Business Analysts, Senior Developers, and Team Leads.

    PREREQUISITES

    Participants should have a general knowledge of statistics and programming.

  • Why You Should Take This Course

    This intensive training course covers the theoretical and technical aspects of Data Science and Business Analytics. It presents fundamental and advanced concepts and methods for deriving business insights from raw data using cost-effective data processing solutions. Hands-on labs help attendees reinforce their understanding of the material.

  • Course Outline

    Chapter 1. Applied Data Science

    • What is Data Science?
    • Data Science Ecosystem
    • Data Mining vs. Data Science
    • Business Analytics vs. Data Science
    • Who is a Data Scientist?
    • Data Science Skill Sets Venn Diagram
    • Data Scientists at Work
    • Examples of Data Science Projects
    • An Example of a Data Product
    • Applied Data Science at Google
    • Data Science Gotchas

    Chapter 2. Data Analytics Life-Cycle Phases

    • Big Data Analytics Pipeline
    • Data Discovery Phase
    • Data Harvesting Phase
    • Data Priming Phase
    • Model Planning Phase
    • Model Building Phase
    • Communicating the Results
    • Production Roll-out

    Chapter 3. Getting Started With R

    • Introduction
    • Positioning of R in the Data Science Arena
    • R Integrated Development Environments
    • Running R
    • Ending the Current R Session
    • Getting Help
    • Getting System Information
    • General Notes on R Commands and Statements
    • R Data Structures
    • R Objects and Workspace
    • Assignment Operators
    • Assignment Example
    • Arithmetic Operators
    • Logical Operators
    • System Date and Time
    • Operations
    • User-defined Functions
    • User-defined Function Example
    • R Code Example
    • Type Conversion (Coercion)
    • Control Statements
    • Conditional Execution
    • Repetitive Execution
    • Built-in Functions
    • Reading Data from Files into Vectors
    • Example of Reading Data from a File
    • Writing Data to a File
    • Example of Writing Data to a File
    • Logical Vectors
    • Character Vectors
    • Matrix Data Structure
    • Creating Matrices
    • Working with Data Frames
    • Matrices vs Data Frames
    • A Data Frame Sample
    • Accessing Data Cells
    • Getting Info About a Data Frame
    • Selecting Columns in Data Frames
    • Selecting Rows in Data Frames
    • Getting a Subset of a Data Frame
    • Sorting (ordering) Data in Data Frames by Attribute(s)
    • Applying Functions to Matrices and Data Frames
    • Using the apply() Function
    • Example of Using apply()
    • Executing External R Commands
    • Listing Objects in Workspace
    • Removing Objects in Workspace
    • Saving Your Workspace
    • Loading Your Workspace
    • Getting and Setting the Working Directory
    • Getting the List of Files in a Directory
    • Diverting Output to a File
    • Batch (Unattended) Processing
    • Importing Data into R
    • Exporting Data from R
    • Standard R Packages
    • Extending R
    • CRAN Page

    Chapter 4. Data Science Algorithms and Analytical Methods

    • Supervised vs Unsupervised Machine Learning
    • Supervised Machine Learning Algorithms
    • Unsupervised Machine Learning Algorithms
    • Choosing the Right Algorithm
    • Life-cycles of Machine Learning Development
    • Classifying with k-Nearest Neighbors (SL)
    • k-Nearest Neighbors Algorithm
    • The Error Rate
    • Decision Trees (SL)
    • Decision Tree Terminology
    • Decision Trees in Pictures
    • Decision Tree Classification in the Context of Information Theory
    • Information Entropy Defined
    • The Shannon Entropy Formula
    • The Simplified Decision Tree Algorithm
    • Using Decision Trees
    • Naive Bayes Classifier (SL)
    • Naive Bayesian Probabilistic Model in a Nutshell
    • Bayes Formula
    • Classification of Documents with Naive Bayes
    • Unsupervised Learning Type: Clustering
    • K-Means Clustering (UL)
    • K-Means Clustering in a Nutshell
    • Regression Analysis
    • Simple Linear Regression Model
    • Linear vs Non-Linear Regression
    • Linear Regression Illustration
    • Major Underlying Assumptions for Regression Analysis
    • Least-Squares Method (LSM)
    • Locally Weighted Linear Regression
    • Regression Models in Excel
    • Multiple Regression Analysis
    • Regression vs Classification
    • Time-Series Analysis
    • Decomposing Time-Series
    • Monte-Carlo Simulation (Method)
    • Who Uses Monte-Carlo Simulation?
    • Monte-Carlo Simulation in a Nutshell
    • Monte-Carlo Simulation Example

    Chapter 5. Visualizing and Reporting Processed Results

    • Data Visualization
    • Data Visualization in R
    • The ggplot2 Data Visualization Package
    • Creating Bar Plots in R
    • Creating Horizontal Bar Plots
    • Using barplot() with Matrices
    • Using barplot() with Matrices Example
    • Customizing Plots
    • Histograms in R
    • Building Histograms with hist()
    • Example of using hist()
    • Pie Charts in R
    • Examples of using pie()
    • Generic X-Y Plotting
    • Examples of the plot() function
    • Dot Plots in R
    • Saving Your Work
    • Supported Export Options
    • Plots in RStudio
    • Saving a Plot as an Image
    • The BIRT Project
    • Visualization with D3 JavaScript Library
    • Examples of D3 Visualization
    • JavaFX
    • Data Visualization with JavaFX

    Chapter 6. Defining Big Data

    • Transforming Data into Business Information
    • Quality of Data
    • Gartner’s Definition of Big Data
    • More Definitions of Big Data
    • Processing Big Data
    • Challenges Posed by Big Data
    • The Cloud and Big Data
    • The Business Value of Big Data
    • Big Data: Hype or Reality?

    Chapter 7. What is NoSQL?

    • Limitations of Relational Databases
    • Limitations of Relational Databases (Cont’d)
    • Defining NoSQL
    • What are NoSQL (Not Only SQL) Databases?
    • The Past and Present of the NoSQL World
    • NoSQL Database Properties
    • NoSQL Benefits
    • NoSQL Database Storage Types
    • The CAP Theorem
    • Mechanisms to Guarantee a Single CAP Property
    • Limitations of NoSQL Databases
    • Big Data Sharding
    • Sharding Example

    Chapter 8. MapReduce Overview

    • MapReduce Defined
    • Google’s MapReduce
    • MapReduce Explained
    • MapReduce Word Count Job
    • MapReduce Shared-Nothing Architecture
    • Similarity with SQL Aggregation Operations
    • Example of Map & Reduce Operations using JavaScript
    • Problems Suitable for Solving with MapReduce
    • Typical MapReduce Jobs
    • Fault-tolerance of MapReduce
    • Distributed Computing Economics
    • MapReduce Systems

    Chapter 9. Big Data Business Intelligence and Analytics

    • Traditional Business Intelligence and Analytics
    • OLAP Tasks
    • Data Mining Tasks
    • Big Data / NoSQL Solutions
    • NoSQL Data Querying and Processing
    • Amazon Elastic MapReduce
    • Big Data with Google App Engine (GAE)

    Chapter 10. MongoDB Overview

    • MongoDB Features
    • MongoDB Operational Intelligence
    • MongoDB Use Cases
    • MongoDB Data Model
    • MongoDB Query Language (QL)
    • A MongoDB QL Example

    Chapter 11. Hadoop Overview

    • Apache Hadoop
    • Typical Hadoop Applications
    • Hadoop Clusters
    • Hadoop Design Principles
    • Hadoop’s Core Components
    • Hadoop Simple Definition
    • High-Level Hadoop Architecture
    • Hadoop-based Systems for Data Analysis
    • Hadoop Caveats

    Chapter 12. Hadoop Distributed File System (HDFS) Overview

    • Hadoop Distributed File System
    • Data Blocks
    • Data Block Replication Example
    • HDFS NameNode Directory Diagram
    • Accessing HDFS
    • Examples of HDFS Commands
    • Client Interactions with HDFS for the Read Operation
    • Read Operation Sequence Diagram
    • Client Interactions with HDFS for the Write Operation
    • Communication inside HDFS

    Chapter 13. Apache Pig Scripting Platform

    • What is Pig?
    • Pig Latin
    • Pig Execution Modes
    • Local Execution Mode
    • MapReduce Execution Mode
    • Running Pig
    • Running Pig in Batch Mode
    • What is Grunt?
    • Pig Latin Statements
    • Pig Programs
    • Pig Latin Script Example
    • SQL Equivalent
    • Differences between Pig and SQL
    • Statement Processing in Pig
    • Comments in Pig
    • Supported Simple Data Types
    • Supported Complex Data Types
    • Arrays
    • Defining a Relation’s Schema
    • The bytearray Generic Type
    • Using Field Delimiters
    • Referencing Fields in Relations

    Chapter 14. Apache Pig HDFS Interface

    • The HDFS Interface
    • FSShell Commands (Short List)
    • Grunt’s Old File System Commands

    Chapter 15. Apache Pig Relational and Eval Operators

    • Pig Relational Operators
    • Example of Using the JOIN Operator
    • Example of Using the ORDER BY Operator
    • Caveats of Using Relational Operators
    • Pig Eval Functions
    • Caveats of Using Eval Functions (Operators)
    • Example of Using Single-column Eval Operations
    • Example of Using Eval Operators for Global Operations

    Chapter 16. Hive

    • What is Hive?
    • Hive’s Value Proposition
    • Who uses Hive?
    • Hive’s Main Systems
    • Hive Features
    • Hive Architecture
    • HiveQL
    • Where are the Hive Tables Located?
    • Hive Command-line Interface (CLI)

    Chapter 17. Hive Command-Line Interface

    • Hive Command-line Interface (CLI)
    • The Hive Interactive Shell
    • Running Host OS Commands from the Hive Shell
    • Interfacing with HDFS from the Hive Shell
    • Hive in Unattended Mode
    • The Hive CLI Integration with the OS Shell
    • Executing HiveQL Scripts
    • Comments in Hive Scripts
    • Variables and Properties in Hive CLI
    • Setting Properties in CLI
    • Example of Setting Properties in CLI
    • Hive Namespaces
    • Using the SET Command
    • Setting Properties in the Shell
    • Setting Properties for the New Shell Session

    Chapter 18. Hive Data Definition Language

    • Hive Data Definition Language
    • Creating Databases in Hive
    • Using Databases
    • Creating Tables in Hive
    • Supported Data Type Categories
    • Common Primitive Types
    • Example of the CREATE TABLE Statement
    • The STRUCT Type
    • Table Partitioning
    • Table Partitioning on Multiple Columns
    • Viewing Table Partitions
    • Row Format
    • Data Serializers / Deserializers
    • File Format Storage
    • More on File Formats
    • The EXTERNAL DDL Parameter
    • Example of Using EXTERNAL
    • Creating an Empty Table
    • Dropping a Table
    • Table / Partition(s) Truncation
    • Alter Table/Partition/Column
    • Views
    • Create View Statement
    • Why Use Views?
    • Restricting Amount of Viewable Data
    • Examples of Restricting Amount of Viewable Data
    • Creating and Dropping Indexes
    • Describing Data

    Chapter 19. Hive SELECT Statement

    • HiveQL
    • The SELECT Statement Syntax
    • The WHERE Clause
    • Examples of the WHERE Clause
    • Partition-based Queries
    • Example of an Efficient SELECT Statement
    • The DISTINCT Clause
    • Supported Numeric Operators
    • Built-in Mathematical Functions
    • Built-in Aggregate Functions
    • Built-in Statistical Functions
    • Other Useful Built-in Functions
    • The GROUP BY Clause
    • The HAVING Clause
    • The LIMIT Clause
    • The ORDER BY Clause
    • The JOIN Clause
    • The CASE … Clause
    • Example of CASE … Clause

    Chapter 20. Apache Sqoop

    • What is Sqoop?
    • Sqoop Import / Export
    • Sqoop Help
    • Examples of Using Sqoop Commands
    • Data Import Example
    • Fine-tuning Data Import
    • Controlling the Number of Import Processes
    • Data Splitting
    • Helping Sqoop Out
    • Example of Executing Sqoop Load in Parallel
    • A Word of Caution: Avoid Complex Free-Form Queries
    • Using Direct Export from Databases
    • Example of Using Direct Export from MySQL
    • More on Direct Mode Import
    • Changing Data Types
    • Example of Overriding Default Types
    • File Formats
    • The Apache Avro Serialization System
    • Binary vs Text
    • More on the SequenceFile Binary Format
    • Generating the Java Table Record Source Code
    • Data Export from HDFS
    • Export Tool Common Arguments
    • Data Export Control Arguments
    • Data Export Example
    • Using a Staging Table
    • INSERT and UPDATE Statements
    • INSERT Operations
    • UPDATE Operations
    • Example of the Update Operation
    • Failed Exports

    Chapter 21. Cloudera Impala

    • What is Cloudera Impala?
    • Benefits of Using Impala
    • Key Features
    • How Impala Handles SQL Queries
    • Impala Programming Interfaces
    • Impala SQL Language Reference
    • Differences Between Impala and HiveQL
    • Impala Shell
    • Impala Shell Main Options
    • Impala Shell Commands
    • Impala Common Shell Commands
    • Cloudera Web Admin UI
    • Impala Browser-based Query Editor
  • FAQs
    Is there a discount available for current students?

    UMBC students and alumni, as well as students who have previously taken a public training course with UMBC Training Centers, are eligible for a 10% discount, capped at $250. Please provide a copy of your UMBC student ID, an unofficial transcript, or the name of the UMBC Training Centers course you have completed. Online courses are excluded from this offer.

    What is the cancellation and refund policy?

    Students will receive a refund of paid registration fees only if UMBC Training Centers receives a notice of cancellation at least 10 business days before the class start date (for classes) or the exam date (for exams).
