Big Data Analytics

Hortonworks HDP Developer: Enterprise Spark I

  • Overview

    This course is designed as an entry point for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Spark. Topics include an overview of the Hortonworks Data Platform (HDP), including HDFS and YARN; using Spark Core APIs for interactive data exploration; Spark SQL and DataFrame operations; Spark Streaming and DStream operations; data visualization, reporting, and collaboration; performance monitoring and tuning; building and deploying Spark applications; and an introduction to the Spark Machine Learning Library.

    Upon completion of this course, students will be able to:

    • Describe Hadoop, HDFS, YARN, and the HDP ecosystem
    • Describe Spark use cases
    • Explore and manipulate data using Zeppelin
    • Explore and manipulate data using a Spark REPL
    • Explain the purpose and function of RDDs
    • Employ functional programming practices
    • Perform Spark transformations and actions
    • Work with Pair RDDs
    • Perform Spark queries using Spark SQL and DataFrames
    • Use Spark Streaming stateless and window transformations
    • Visualize data, generate reports, and collaborate using Zeppelin
    • Monitor Spark applications using Spark History Server
    • Apply general application optimization guidelines and tips
    • Use data caching to increase performance of applications
    • Build and package Spark applications
    • Deploy applications to the cluster using YARN
    • Understand the purpose of Spark MLlib
  • Who Should Take This Course


    Students should be familiar with programming principles and have previous experience in software development using either Python or Scala. Previous experience with data streaming, SQL, and HDP is also helpful, but not required.


    Students should be familiar with programming principles and have previous experience in software development. Experience with Linux and a basic understanding of DataFlow tools would be helpful. No prior Hadoop experience is required, but it is very helpful.

  • Schedule
  • Course Outline

    Hands-on Labs
    Labs can be performed using either Python or Scala
    • Use common HDFS commands
    • Use a REPL to program in Spark
    • Use Zeppelin to program in Spark
    • Perform RDD transformations and actions
    • Perform Pair RDD transformations and actions
    • Utilize Spark SQL
    • Perform stateless transformations using Spark Streaming
    • Perform window-based transformations
    • Use Zeppelin for data visualization and reporting
    • Monitor applications using Spark History Server
    • Cache and persist data
    • Configure checkpointing, broadcast variables, and executors
    • Build and submit a Spark application to YARN
    • Run Spark MLlib applications

  • FAQs
    Is there a discount available for current students?

    UMBC students and alumni, as well as students who have previously taken a public training course with UMBC Training Centers, are eligible for a 10% discount, capped at $250. Please provide a copy of your UMBC student ID, an unofficial transcript, or the name of the UMBC Training Centers course you have completed. Online courses are excluded from this offer.
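
    As a rough illustration of how a 10% discount capped at $250 works out, here is a minimal Python sketch. The function name `discounted_price` and the list prices used are hypothetical and are not taken from the course listing; actual pricing and eligibility are determined by UMBC Training Centers.

```python
def discounted_price(list_price: float, rate: float = 0.10, cap: float = 250.00) -> float:
    """Return the price after a percentage discount limited by a dollar cap.

    The defaults mirror the FAQ text (10% off, capped at $250); the function
    itself is an illustrative assumption, not an official calculator.
    """
    if list_price < 0:
        raise ValueError("list_price must be non-negative")
    # The discount is the smaller of the percentage amount and the cap.
    discount = min(list_price * rate, cap)
    return round(list_price - discount, 2)


# Hypothetical list prices, chosen only to show both sides of the cap.
print(discounted_price(1500.00))  # 10% is $150, under the cap
print(discounted_price(3000.00))  # 10% would be $300, so the $250 cap applies
```

    Under these assumptions, the cap only comes into play for list prices above $2,500, since 10% of $2,500 is exactly $250.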

    What is the cancellation and refund policy?

    Students will receive a refund of paid registration fees only if UMBC Training Centers receives notice of cancellation at least 10 business days before the class start date (for classes) or the exam date (for exams).
