• Pune : +91-20-2427 2383 / 2426 4291 / 2426 0308 / 7410 071 951
• Karad : 02164 - 225500 / 225800

Big Data - Apache Spark

Course Name : Big Data - Apache Spark

Batch Schedule : 23-Mar-2019 to 08-Jun-2019

Duration : 84 hours - 12 Saturdays

Timings : 08:00 AM to 04:00 PM

Fees : Rs. 15,000/- (incl. 18% GST)

  • Students and Freshers.
  • Professionals willing to switch to Big Data / Spark developer stream.
Spark lessons were great and Scala learning experience was really great.
[Spark 01 Batch] Pranjal Dutta (Senior Architect)

 

Excellent modular course and excellent faculty at Sunbeam! Those who want to gain knowledge, rather than just information, about Apache Spark should join Sunbeam's Big Data Apache Spark batch.
[Spark 01 Batch] Ashwini P Patil (Project Engineer)

 

The weekend batch is a good initiative. Professionals can spare time for learning new things. The faculty's way of teaching really helps in understanding basic concepts very easily. Looking forward to more such courses.
[Hadoop 03 Batch] Sushil Taskar (Software Engineer)

 

I appreciate Sir's tremendous efforts to explain the concepts again and again. The course was very detailed, with a lot of hands-on experience. It surely helped me understand big data problems and how Hadoop MR helps to solve them. When I started the course, I was absolutely new to this topic, but after the training I am confident with Hadoop. Thanks a lot, Sunbeam, for helping associates by imparting such quality training.
[Hadoop 03 Batch] Anonymous
  • Processor: Core i3 (64-bit) or above.
  • RAM: minimum 8 GB; recommended 16 GB or more.
  • 64-bit Linux – Ubuntu.
Not covered in this course:
  • Data science (math/stat). However, implementing statistical formulae in a Spark job will be covered.
  • Machine learning. However, a simple ML program using Spark MLlib will be demonstrated.
  • Hadoop administration. However, some basic and performance-related configuration will be discussed.
  • Spark administration. However, some basic and performance-related configuration will be discussed.
  • Spark cluster on cloud. However, a multi-node cluster with minimal configuration will be covered.
  • Python 3 programming language. However, Python code will be shared for most of the examples.
  • Reporting and visualization tools.
Syllabus
  • Hadoop 2.x
    • Hadoop installation modes
    • Setting up Hadoop cluster
    • HDFS Java API
    • Implementing MR jobs
    • Parsing MR job args
    • Hadoop data types & custom writables
    • Job counters & configuration
    • Input Splits
    • Input/Output formats, Compression
    • Partitioner & Combiners
    • Hadoop Streaming
    • MR Job execution on YARN  
  • Hive
    • Hive introduction, architecture, installation
    • Hive CLI, Security, Beeline, Metastore & Derby
    • Hive managed & external tables
    • Hive QL: Loading, Filtering, Grouping, Joins
    • Hive simple & complex types, DDL, DML, DQL
    • Hive indexes, views, query optimizations
    • Hive serialization / deserialization, Loading data
    • Partitioning: static & dynamic – use cases
    • Bucketing, use cases of Partitions & Buckets
    • Hive functions, operators and Hive UDF impl.
    • Thrift server, Java/JDBC connectivity  
  • Scala programming language
    • Scala & IntelliJ IDE Installation
    • Scala Introduction & REPL
    • Data types & Variables
    • Basic programming constructs
    • Functions: Simple, Anonymous, Currying, Higher-Order, Pure, Closures
    • Collections: Array, Tuple, List, Map, Streams, ...
    • Functional Programming, Lambda expressions, apply()
    • OOP: Class, Getter/Setter, Constructors, Inheritance, Overriding, abstract, Traits
    • Case classes & Pattern matching
    • Generics, Variance
    • Companion Objects
    • Functors & Monads
    • Implicit class
    • Reading files in Scala
  • Apache Spark 2
    • Spark concepts
    • Distributed Computing Challenges
    • Spark Architecture & Components
    • Spark Installation & Deployment
    • Setting up Spark cluster
    • Spark Scala Shell
    • PySpark concepts
    • PySpark Shell
    • PySpark installation
    • Executing Spark Scala & Python programs
    • Spark Web UI
    • Spark in IntelliJ IDE
    • Spark on Databricks cloud
  • Apache Spark 2 - Spark Core
    • Spark RDD, Transformations & Actions, Data Load & Save
    • RDD characteristics & execution
    • Types of RDD: Key-value, Two Pair, ...
    • Accumulators & Broadcast variables
    • RDD Internals: Distributed/Partitions, Lineage, Persistence
    • Implementing & Submitting Spark Job
    • Execution of Spark Job (RDD)
    • DAG visualization
  • Apache Spark 2 - Spark SQL
    • Spark SQL Introduction
    • Architecture
    • SQLContext & SparkSession
    • Data Frames & Datasets
    • Data Frame Columns & Expressions
    • Implementing & Executing Spark SQL job
    • Interoperating with RDDs
    • User Defined Functions
    • File Formats & Loading data
    • Spark SQL data types & schema
    • Spark SQL functions
    • UDFs & their execution
    • Global/Temporary views
    • Partitioning & Bucketing
    • SQLContext & HiveContext
    • Processing Hive data using Spark SQL
  • Apache Spark 2 - Spark Streaming
    • Streaming concepts
    • Microbatches vs Continuous job
    • Spark Streaming concepts
    • Streaming Context & DStreams
    • Transformations on DStreams
    • Windowing Concept, Windowed Operators: Slice, Window and ReduceByWindow, Stateful Operators
    • Twitter data processing
    • Spark Structured Streaming concepts
    • Triggers, Event time based processing & Watermark
    • Input sources & output sinks
    • Structured Streaming application execution
    • Apache Kafka Introduction
    • Kafka Architecture
    • Kafka Cluster Components & Configuration
    • Kafka Applications
    • Kafka Python client
    • Kafka Spark Source & Sink
  • Apache Spark 2 - Spark ML Introduction
    • Advanced Analytics concepts
    • Advanced Analytics workflow
    • Spark Machine Learning concepts
    • Transformers, Estimators & Models
    • Implementing an ML model using MLlib
    • Consuming Spark ML model
Sr. No.  Batch Code  Start Date   End Date     Time
1        Spark02     23-Mar-2019  08-Jun-2019  08:00 AM to 04:00 PM

Contact us

Pune Centre

SunBeam Institute of Information Technology, Pune

'Sunbeam', Plot No. R/2, Market Yard Road, Behind Hotel Fulora, Gultekdi, Pune - 411 037. MH-INDIA.

+91-20-2427 2383 / 2426 4291 / 2426 0308 / 7410 071 951
Karad Centre

SunBeam Institute of Information Technology, Karad

'Anuda Chambers', 203 Shaniwar Peth, Near Gujar Hospital, Karad - 415 110, Dist. Satara, MH-INDIA.

02164 - 225500 / 225800