

Big Data Analytics

Hortonworks HDP Analyst: Data Science

  • Overview

    This course provides instruction on the processes and practice of data science, including machine learning and natural language processing. Tools and programming languages covered include Python, IPython, Mahout, Pig, NumPy, pandas, SciPy, scikit-learn, the Natural Language Toolkit (NLTK), and Spark MLlib.

    Upon completion of this course, students will be able to:

    • Recognize use cases for data science on Hadoop
    • Describe the Hadoop and YARN architecture
    • Describe the differences between supervised and unsupervised learning
    • Use Mahout to run a machine learning algorithm on Hadoop
    • Describe the data science life cycle
    • Use Pig to transform and prepare data on Hadoop
    • Write a Python script
    • Describe options for running Python code on a Hadoop cluster
    • Write a Pig User-Defined Function in Python
    • Use Pig streaming on Hadoop with a Python script
    • Use machine learning algorithms
    • Describe use cases for Natural Language Processing (NLP)
    • Use the Natural Language Toolkit (NLTK)
    • Describe the components of a Spark application
    • Write a Spark application in Python
    • Run machine learning algorithms using Spark MLlib
    • Take data science into production
  • Who Should Take This Course

    AUDIENCE

    This class is for architects, software developers, analysts and data scientists who need to apply data science and machine learning on Hadoop.

    PREREQUISITES

    Students must have experience with at least one programming or scripting language, knowledge of statistics and/or mathematics, and a basic understanding of big data and Hadoop principles.

  • Schedule
  • Course Outline
    • Lab: Setting Up a Development Environment
    • Demo: Block Storage
    • Lab: Using HDFS Commands
    • Demo: MapReduce
    • Lab: Using Apache Mahout for Machine Learning
    • Demo: Apache Pig
    • Lab: Getting Started with Apache Pig
    • Lab: Exploring Data with Pig
    • Lab: Using the IPython Notebook
    • Demo: The NumPy Package
    • Demo: The pandas Library
    • Lab: Data Analysis with Python
    • Lab: Interpolating Data Points
    • Lab: Defining a Pig UDF in Python
    • Lab: Streaming Python with Pig
    • Demo: Classification with Scikit-Learn
    • Lab: Computing K-Nearest Neighbor
    • Lab: Generating a K-Means Clustering
    • Lab: POS Tagging Using a Decision Tree
    • Lab: Using NLTK for Natural Language Processing
    • Lab: Classifying Text using Naive Bayes
    • Lab: Using Spark Transformations and Actions
    • Lab: Using Spark MLlib
    • Lab: Creating a Spam Classifier with MLlib
  • FAQs
    Is there a discount available for current students?

    UMBC students and alumni, as well as students who have previously taken a public training course with UMBC Training Centers, are eligible for a 10% discount, capped at $250. Please provide a copy of your UMBC student ID, an unofficial transcript, or the name of the UMBC Training Centers course you have completed. Online courses are excluded from this offer.
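
    As a rough illustration of how a 10% discount with a $250 cap works out, here is a minimal Python sketch. The function name and the example prices are hypothetical and do not reflect actual course fees; only the rate and the cap come from the policy above.

```python
# Rate and cap as stated in the discount policy.
DISCOUNT_RATE_PERCENT = 10
DISCOUNT_CAP = 250  # dollars


def discounted_price(list_price):
    """Return the price after the capped discount (hypothetical helper)."""
    # The discount is 10% of the list price, but never more than the cap.
    discount = min(list_price * DISCOUNT_RATE_PERCENT / 100, DISCOUNT_CAP)
    return list_price - discount


# Hypothetical list prices, for illustration only.
print(discounted_price(1000))  # 10% is below the cap, so the full 10% applies
print(discounted_price(3000))  # 10% would exceed the cap, so the cap applies
```

    In this sketch the cap takes effect for any list price above $2,500, since 10% of $2,500 is exactly $250.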

    What is the cancellation and refund policy?

    Students will receive a refund of paid registration fees only if UMBC Training Centers receives a notice of cancellation at least 10 business days before the class start date (for classes) or the exam date (for exams).
