
Big Data - Apache Spark

Course Name : Big Data - Apache Spark

Batch Schedule : 02-Nov-2019 to 11-Jan-2020

Schedule : Saturday Only (8:00 am to 2:00 pm)

Duration : 65 hours - 11 Saturdays

Timings : 8:00 AM to 2:00 PM

Fees : Rs. 14,000/- (incl. 18% GST)

Who Should Attend
  • Students and freshers.
  • Professionals looking to move into the Big Data / Spark developer stream.
Prerequisites
  • Familiarity with Linux commands
  • Any RDBMS (e.g. Oracle or MySQL)
  • Python3 programming skills
  • Java programming awareness (for Hadoop MapReduce demos)
  • XML awareness
System Requirements
  • Core i3 (64-bit) or above
  • RAM: minimum 8 GB; 16 GB or more recommended
  • 64-bit Linux (Ubuntu)
Not Covered in This Course
  • Data science (math/statistics); however, implementing statistics formulae in Spark jobs will be covered.
  • Machine learning; however, a simple ML program using Spark MLlib will be demonstrated.
  • Hadoop administration; however, some basic and performance-related configuration will be discussed.
  • Spark administration; however, some basic and performance-related configuration will be discussed.
  • Spark clusters on the cloud; however, a multi-node cluster with minimal configuration will be covered.
  • Python3 programming language; however, Spark programming will be done in Python3.
  • Reporting and visualization tools.
Course Contents
  • Hadoop 2.x
    • Hadoop installation modes
    • Setting up Hadoop cluster
    • HDFS Java API
    • Implementing MR jobs
    • Parsing MR job args
    • Hadoop data types & custom writables
    • Job counters & configuration
    • Input Splits
    • Input/Output formats, Compression
    • Partitioner & Combiners
    • Hadoop Streaming
    • MR Job execution on YARN  
  • Hive
    • Hive introduction, architecture, installation
    • Hive CLI, Security, Beeline, Metastore & Derby
    • Hive managed & external tables
    • Hive QL: Loading, Filtering, Grouping, Joins
    • Hive simple & complex types, DDL, DML, DQL
    • Hive indexes, views, query optimizations
    • Hive serialization / deserialization, Loading data
    • Partitioning: static & dynamic – use cases
    • Bucketing, use cases of Partitions & Buckets
    • Hive functions, operators and Hive UDF impl.
    • Thrift server, Java/JDBC connectivity  
  • Apache Spark 2
    • Spark concepts
    • Distributed Computing Challenges
    • Spark Architecture & Components
    • Spark Installation & Deployment
    • Setting up Spark cluster
    • PySpark concepts
    • PySpark Shell
    • PySpark installation
    • Executing Spark Python programs
    • Spark Web UI
    • Spark in Pycharm IDE
    • Spark on Databricks cloud
  • Apache Spark 2 - Spark Core
    • Spark RDD, Transformations & Actions, Data Load & Save
    • RDD characteristics & execution
    • Types of RDD: Key-value, Two Pair, ...
    • Accumulators & Broadcast variables
    • RDD Internals: Distributed/Partitions, Lineage, Persistence
    • Implementing & Submitting Spark Job
    • Execution of Spark Job (RDD)
    • DAG visualization
  • Apache Spark 2 - Spark SQL
    • Spark SQL Introduction
    • Architecture
    • SQLContext & SparkSession
    • Data Frames & Datasets
    • Data Frame Columns & Expressions
    • Implementing & Executing Spark SQL job
    • Interoperating with RDDs
    • User Defined Functions
    • File Formats & Loading data
    • Spark SQL data types & schema
    • Spark SQL functions
    • UDFs & their execution
    • Global/Temporary views
    • Partitioning & Bucketing
    • SQLContext & HiveContext
    • Processing Hive data using Spark SQL
  • Apache Spark 2 - Spark Streaming
    • Streaming concepts
    • Microbatches vs Continuous job
    • Spark Streaming concepts
    • Streaming Context & DStreams
    • Transformations on DStreams
    • Windowing concept; windowed operators: Slice, Window, ReduceByWindow; stateful operators
    • Twitter data processing
    • Spark Structured Streaming concepts
    • Triggers, Event time based processing & Watermark
    • Input sources & output sinks
    • Structured Streaming application execution
    • Apache Kafka Introduction
    • Kafka Architecture
    • Kafka Cluster Components & Configuration
    • Kafka Applications
    • Kafka Python client
    • Kafka Spark Source & Sink
  • Apache Spark 2 - Spark ML Introduction
    • Advanced Analytics concepts
    • Advanced Analytics workflow
    • Spark Machine Learning concepts
    • Transformers, Estimators & Models
    • Implement ML model using MLlib
    • Consuming Spark ML model
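The Hadoop Streaming topic listed above lets MapReduce jobs be written as plain scripts that read stdin and write stdout. A minimal word-count sketch in Python; the driver at the end only simulates the shuffle locally, and the job wiring mentioned in the comments is illustrative, not taken from the course material:

```python
from itertools import groupby

def map_lines(lines):
    # Mapper: emit (word, 1) for every word. A real streaming mapper
    # would print these as tab-separated "word\t1" lines to stdout.
    for line in lines:
        for word in line.strip().split():
            yield word.lower(), 1

def reduce_pairs(pairs):
    # Reducer: Hadoop's shuffle delivers pairs sorted by key, so
    # consecutive equal keys can be summed with groupby.
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

# Local simulation: sorting the mapper output stands in for the shuffle.
pairs = sorted(map_lines(["to be or", "not to be"]))
counts = dict(reduce_pairs(pairs))
# counts == {'be': 2, 'not': 1, 'or': 1, 'to': 2}
```

On a cluster the two functions would live in separate mapper and reducer scripts submitted through the hadoop-streaming JAR.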
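The RDD transformations, actions, and lazy lineage covered under Spark Core can be sketched as below. `tokenize` is a plain helper; `main()` needs pyspark and a local JVM, so it is defined but only run on demand, and the master URL and app name are illustrative assumptions:

```python
def tokenize(line):
    # Plain Python helper, reused as a flatMap function below.
    return line.lower().split()

def main():
    # Requires `pip install pyspark` and Java; call main() to run the demo.
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "rdd-demo")
    lines = sc.parallelize(["to be or", "not to be"])

    # Transformations only build a lineage lazily; nothing executes yet.
    counts = (lines.flatMap(tokenize)              # RDD of words
                   .map(lambda w: (w, 1))          # key-value (pair) RDD
                   .reduceByKey(lambda a, b: a + b))

    # collect() is an action: it triggers execution of the whole lineage.
    print(sorted(counts.collect()))
    sc.stop()
```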
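For the Spark SQL section, a sketch of wrapping a plain Python function as a user-defined function, both through the DataFrame API and through SQL over a temporary view; the column, table, and view names are made up for the demo:

```python
def celsius_to_fahrenheit(c):
    # Plain Python logic; below it is registered as a Spark SQL UDF.
    return c * 9.0 / 5.0 + 32.0

def main():
    # Requires pyspark and Java; call main() to run the demo.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    spark = (SparkSession.builder
             .master("local[*]").appName("sql-udf-demo").getOrCreate())
    df = spark.createDataFrame(
        [("Pune", 31.0), ("Karad", 29.5)], ["city", "temp_c"])

    # DataFrame API: wrap the function and derive a new column.
    to_f = udf(celsius_to_fahrenheit, DoubleType())
    df.withColumn("temp_f", to_f(df.temp_c)).show()

    # SQL path: register the UDF and query a temporary view.
    spark.udf.register("to_f", celsius_to_fahrenheit, DoubleType())
    df.createOrReplaceTempView("temps")
    spark.sql("SELECT city, to_f(temp_c) AS temp_f FROM temps").show()
    spark.stop()
```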
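The Kafka-as-a-Spark-source topic can be sketched with Structured Streaming. `kafka_options` is just a helper that bundles the reader options; the broker address, topic name, and console sink are demo assumptions, and running `main()` needs pyspark built with the Kafka connector (spark-sql-kafka) plus a live broker:

```python
def kafka_options(servers, topic):
    # Reader options for a Kafka source; the values passed in below
    # (localhost broker, "events" topic) are demo assumptions.
    return {"kafka.bootstrap.servers": servers, "subscribe": topic}

def main():
    # Requires pyspark + the spark-sql-kafka connector and a running broker.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = (SparkSession.builder
             .master("local[*]").appName("kafka-demo").getOrCreate())

    events = (spark.readStream
                   .format("kafka")
                   .options(**kafka_options("localhost:9092", "events"))
                   .load())

    # Kafka rows expose binary key/value columns; cast value to text.
    lines = events.select(col("value").cast("string").alias("line"))

    # Console sink with default micro-batch triggers, for demos only.
    query = (lines.writeStream
                  .format("console")
                  .outputMode("append")
                  .start())
    query.awaitTermination()
```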
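Spark ML's split between Estimators (whose `fit()` learns parameters) and Models/Transformers (whose `transform()` applies them) appears in the list above; the pure-Python analogy below mirrors that convention without using the Spark API at all, and the class names are invented for the illustration:

```python
class MeanCenterEstimator:
    # Estimator: fit() learns a parameter from data and returns a
    # Model, mirroring pyspark.ml's Estimator -> Model convention.
    def fit(self, values):
        return MeanCenterModel(sum(values) / len(values))

class MeanCenterModel:
    # Model: a fitted Transformer whose transform() applies the
    # learned parameter to new data.
    def __init__(self, mean):
        self.mean = mean

    def transform(self, values):
        return [v - self.mean for v in values]

model = MeanCenterEstimator().fit([1.0, 2.0, 3.0])
centered = model.transform([2.0, 4.0])
# centered == [0.0, 2.0] since the learned mean is 2.0
```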
Testimonials

Nilesh sir taught us very well. I'm very lucky to be a student of Nilesh Sir, and thankful for that. Sir, please provide some more project ideas and some more assignments as well.


This course had very good coverage of the technologies used in big data engineering, in addition to Spark. I look forward to similar weekend classes.

Sr.No  Batch Code  Start Date   End Date     Time
1      Spark03     02-Nov-2019  11-Jan-2020  8:00 AM to 2:00 PM

Schedule : Saturday Only (8:00 am to 2:00 pm)


Contact us

Sunbeam Hinjawadi Pune

Authorized Training Centre of C-DAC ACTS

"Sunbeam IT Park", Ground Floor, Phase 2 of Rajiv Gandhi Infotech Park, Hinjawadi, Pune - 411057, MH-INDIA

+91 82 82 82 9805 / +91 82 82 82 9806
Sunbeam Karad

Authorized Training Centre of C-DAC ACTS

'Anuda Chambers', 203 Shaniwar Peth, Near Gujar Hospital, Karad - 415 110, Dist. Satara, MH-INDIA.

+91 82 82 82 9806 / 02164-225500, 225800