© 2015 Hudson Data Corp. All Rights Reserved. www.bitbootcamp.com
Algorithms
The Brains
– Introduction to Data Science
– Data Munging & Fusion
– Text Mining
• Naïve Bayes
– Recommendation Engines
– Principal Component Analysis
– Classification
• Decision Trees
• Random Forest
• Gradient Boosting Machines
– Generalized Linear Models
– Clustering
• KNN
• K-Means
– Graph Theory
– Stable Marriage
Hadoop
Big Data
Core
Engineering
Our Training Offerings
Skills you need
Training Overview
Evening Classes
Track 1: Big Data
Track 2: Machine Learning
Big Data Training
4-Week Intensive Big Data Evening Classes
Week 1 Week 2 Week 3 Week 4 Self Study
Certifications
Complete the industry-standard Hadoop certification
For Data Science
Track 1
Machine Learning Training
6-Week Data Science Evening Classes
Week 1 Week 2 Week 3 Week 4 Week 5 Week 6
Introduction to
Machine Learning
Recommendation Engines
Collaborative Filtering
Gradient Boosting
Machines
For Data Science
Data Fusion
and Fuzzy Matching
Principal Component
Analysis
Graph Theory
and Stable Marriage
Generalized Linear Models
Linear Regression
Regularization
Logistic Regression
Decision Trees
Text Mining
Naive Bayes
Random Forests
Clustering
KNN
K-Means
Data Aggregation Project
Data Science Project
Career Counseling
Track 2
Track 1: Big Data
Week 1
Introductions
• Motivation for Big Data
• Unix for Data Science
• Pushing and Pulling data from remote servers
• Columnar Compression
• Extended Data Dictionary
Monday and Wednesday, 6:30 PM
Pulling and Processing Data
• SQL overview
• SQL design patterns for data analytics
o Pivot Tables
o Aggregation
o Network Analysis
Unix Assignments
• Processing data in parallel
• Working with remote machines
SQL Assignments
Big Data Training
Master the basics
• Five key design patterns
• Joins, Aggregation, Temp Tables,
Indexes, Functions
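As an illustration of the pivot-table and aggregation patterns above, here is a minimal sketch using Python's built-in sqlite3 module. The `sales` table, its columns, and the sample rows are hypothetical, and the conditional `SUM(CASE WHEN ...)` pivot is one common, portable approach rather than necessarily the one taught in class.

```python
import sqlite3

# Hypothetical sample data: (region, quarter, revenue)
ROWS = [
    ("east", "Q1", 100), ("east", "Q2", 150),
    ("west", "Q1", 80), ("west", "Q2", 120), ("west", "Q2", 30),
]

def pivot_sales(rows):
    """Pivot revenue into one column per quarter using conditional aggregation."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (region TEXT, quarter TEXT, revenue INTEGER)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    query = """
        SELECT region,
               SUM(CASE WHEN quarter = 'Q1' THEN revenue ELSE 0 END) AS q1,
               SUM(CASE WHEN quarter = 'Q2' THEN revenue ELSE 0 END) AS q2,
               SUM(revenue) AS total
        FROM sales
        GROUP BY region
        ORDER BY region
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

if __name__ == "__main__":
    for row in pivot_sales(ROWS):
        print(row)
```

Each `CASE` expression turns one quarter's rows into a column, and `GROUP BY` collapses the rows to one per region.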
Cluster Setup
• Introduction to Big Data Ecosystem
• Acquire 5 machines on AWS
• Prepare machines for Hadoop
• Set up a 5–10 node cluster
• Say Hello to Hadoop
Monday and Wednesday, 6:30 PM
Introduction to Hadoop
• Motivation for Hadoop
• HDFS
• ETL in Hadoop with a large dataset
• Sqoop
• Oozie
• Hadoop Streaming
Cluster Setup Assignment
• Set up a cluster in the cloud
• Develop automation scripts
ETL in Hadoop
Big Data Training
Spin up the cluster
• N-gram data in Hadoop
• Develop ETL jobs in cluster
Week 2
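Hadoop Streaming runs any executable as the mapper or reducer: each reads lines on stdin and writes tab-separated key/value lines on stdout, and the framework sorts the map output by key before the reduce step. The sketch below counts word bigrams (an n-gram count) in that style and simulates the shuffle locally with `sorted()`. The function names and sample text are illustrative assumptions, not course material.

```python
from itertools import groupby

def mapper(lines, n=2):
    """Emit 'ngram<TAB>1' for every word n-gram in the input lines."""
    for line in lines:
        words = line.lower().split()
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n]) + "\t1"

def reducer(sorted_lines):
    """Sum the counts for each key; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"

def run_locally(lines, n=2):
    """Simulate map -> sort (shuffle) -> reduce on a single machine."""
    return list(reducer(sorted(mapper(lines, n))))

if __name__ == "__main__":
    sample = ["the cat sat", "the cat ran"]
    for record in run_locally(sample):
        print(record)
```

In a real streaming job, the mapper and reducer would each be a separate script looping over `sys.stdin`, and Hadoop would perform the sort between them.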
Hive
• Motivation for Hive
• Hive architecture
• Aggregation and data selection
• Hive and Python Integration
Monday and Wednesday, 6:30 PM
Advanced Hive
• Hive Jobs and Variables
• Custom Functions
• Custom data types
• Indexing and performance issues
Hive Assignment
• Data aggregation
Hive Assignment 2
Big Data Training
Wrangle millions of records in Hadoop
• N-gram data in Hadoop
• Develop ETL jobs in cluster
Week 3
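One common route for Hive and Python integration is Hive's `TRANSFORM ... USING` clause, which streams each row to a script as a tab-separated line and reads tab-separated lines back. A minimal sketch of such a script, assuming a hypothetical two-column `search_logs(user_id, query)` table and a hypothetical script name:

```python
import sys

# Hypothetical HiveQL that would call this script (table and file names are illustrative):
#   ADD FILE normalize_query.py;
#   SELECT TRANSFORM (user_id, query)
#     USING 'python normalize_query.py'
#     AS (user_id, query_norm, n_terms)
#   FROM search_logs;

def transform(lines):
    """Lowercase and collapse whitespace in the query column, then append a term count."""
    for line in lines:
        user_id, query = line.rstrip("\n").split("\t", 1)
        terms = query.lower().split()
        yield f"{user_id}\t{' '.join(terms)}\t{len(terms)}"

if __name__ == "__main__" and sys.stdin is not None and not sys.stdin.isatty():
    # Hive pipes rows in on stdin and reads the results from stdout.
    for out_line in transform(sys.stdin):
        print(out_line)
```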
Hadoop MapReduce
• Motivation for MapReduce
• MapReduce in action
• MapReduce API
• Splitters and Combiners
• Custom data formats
Monday and Wednesday, 6:30 PM
Advanced MapReduce
• Distributed Joins
• Data Compression in MapReduce
• Optimizations
• Debugging and Tracing
M/R Assignment
• Data aggregation
• Extended Data Dictionaries
M/R Assignment 2
Big Data Training
Hadoop under the hood with MapReduce
• N-gram data in Hadoop
• Develop ETL jobs in cluster
Week 4
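To show what a combiner buys, the sketch below simulates word count over two hypothetical input splits in plain Python, with and without a local pre-aggregation step, and counts the records that would cross the shuffle. A combiner is only safe here because summing is associative and commutative; the function names are illustrative, not Hadoop's API.

```python
from collections import Counter, defaultdict

def map_phase(split):
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word, 1) for line in split for word in line.split()]

def combine(pairs):
    """Combiner: pre-aggregate one mapper's output locally, before the shuffle."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def shuffle(mapper_outputs):
    """Group values by key across all mappers (the framework's job)."""
    grouped = defaultdict(list)
    for pairs in mapper_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reducer: sum the values for each key."""
    return {key: sum(values) for key, values in grouped.items()}

def run(splits, use_combiner):
    """Return (final counts, number of records sent through the shuffle)."""
    outputs = [map_phase(split) for split in splits]
    if use_combiner:
        outputs = [combine(pairs) for pairs in outputs]
    shuffled_records = sum(len(pairs) for pairs in outputs)
    return reduce_phase(shuffle(outputs)), shuffled_records

SPLITS = [["a b a", "a c"], ["b b a"]]

if __name__ == "__main__":
    print(run(SPLITS, use_combiner=False))
    print(run(SPLITS, use_combiner=True))
```

Both runs produce the same counts, but the combiner cuts the shuffled records from 8 to 5 on this toy input.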
Track 2: Machine Learning
Week 1
Introduction to ML & Unix
• Motivation for Machine Learning (ML)
• Geometric, Probabilistic, and Logical Models
• Standardized ML model lifecycle
• Unix for Data Science
• Pushing and Pulling data from remote servers
• Extended Data Dictionary
Tuesday and Thursday, 6:30 PM
Python for Data Science
• Thinking in Python
• Python design patterns for data analytics
• Pandas
• Data Frames
• Aggregations
• Scripting in Python
1. Unix Assignments
• Data Processing in Unix
• Data Processing in parallel
• Working with remote machines
2. SQL & Python Assignments
Machine Learning
• Data Processing in Python
• Data Processing in SQL
Tuesday and Thursday, 6:30 PM
3. Titanic Survivors
• Who was most likely to survive the Titanic disaster?
4. Classify recipes by ingredients
Machine Learning
• Analyze ingredients to identify each recipe's cuisine of origin
• Data munging
Week 2
Decision Trees
• Motivation for Decision Trees
• ID3, C4.5 and CART
• Entropy, Information Gain
• Pruning and Purging
• Trees in Action
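Entropy and information gain take only a few lines to compute. The sketch below scores an ID3-style split on a categorical feature; the Titanic-flavored rows and labels are made up for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(S) = -sum(p_i * log2(p_i)) over the class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy before the split minus the size-weighted entropy of each branch."""
    n = len(labels)
    branches = {}
    for row, label in zip(rows, labels):
        branches.setdefault(row[feature], []).append(label)
    remainder = sum(len(branch) / n * entropy(branch) for branch in branches.values())
    return entropy(labels) - remainder

# Hypothetical toy data
ROWS = [
    {"sex": "female", "pclass": 1},
    {"sex": "female", "pclass": 3},
    {"sex": "male", "pclass": 1},
    {"sex": "male", "pclass": 3},
]
SURVIVED = ["yes", "yes", "no", "no"]

if __name__ == "__main__":
    for feature in ("sex", "pclass"):
        print(feature, information_gain(ROWS, SURVIVED, feature))
```

ID3 would split on the feature with the highest gain. Here `sex` separates the classes perfectly (gain 1.0 bit), while `pclass` tells us nothing (gain 0).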
Text Mining / Naïve Bayes
• Motivation for Text Mining
• Working with unstructured datasets
• Tokenization and Standardization of text
• Naïve Bayes
• Applications and Results
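Below is a minimal multinomial Naïve Bayes text classifier. It assumes simple lowercase tokenization and add-one (Laplace) smoothing, and it skips words never seen in training. The spam/ham documents are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Standardize text: lowercase it and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            # log P(class) + sum of log P(word | class), with add-one smoothing
            score = math.log(count / n_docs)
            for word in tokenize(doc):
                if word in self.vocab:
                    smoothed = (self.word_counts[label][word] + 1) / (total + len(self.vocab))
                    score += math.log(smoothed)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

DOCS = ["cheap pills buy now", "buy cheap watches",
        "meeting agenda for monday", "project meeting notes"]
LABELS = ["spam", "spam", "ham", "ham"]

if __name__ == "__main__":
    model = NaiveBayes().fit(DOCS, LABELS)
    print(model.predict("Cheap pills!"), model.predict("Monday meeting"))
```

Working in log space avoids the numeric underflow that multiplying many small probabilities would cause.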
Recommendation Engines
• Motivation for Recommendation Engines
• Sparse Matrix Operations
• Manhattan Distance, Euclidean Distance,
Cosine Distance
• Similarity Matrices and results
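The three distances listed above can be computed directly on sparse vectors stored as Python dicts (item to rating), touching only the items that are present. The user ratings here are hypothetical, and cosine distance is taken as 1 minus cosine similarity.

```python
import math

def manhattan(u, v):
    """Sum of absolute differences; missing items count as 0."""
    keys = u.keys() | v.keys()
    return sum(abs(u.get(k, 0) - v.get(k, 0)) for k in keys)

def euclidean(u, v):
    """Straight-line distance; missing items count as 0."""
    keys = u.keys() | v.keys()
    return math.sqrt(sum((u.get(k, 0) - v.get(k, 0)) ** 2 for k in keys))

def cosine_distance(u, v):
    """1 - cosine similarity; only items shared by both vectors add to the dot product."""
    dot = sum(value * v[k] for k, value in u.items() if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1 - dot / (norm_u * norm_v)

def most_similar(user, ratings):
    """Return the other user with the smallest cosine distance to `user`."""
    others = (other for other in ratings if other != user)
    return min(others, key=lambda other: cosine_distance(ratings[user], ratings[other]))

# Hypothetical user -> {item: rating} data
RATINGS = {
    "alice": {"hadoop": 5, "spark": 3},
    "bob": {"hadoop": 4, "spark": 3, "hive": 2},
    "carol": {"gephi": 5},
}

if __name__ == "__main__":
    print(most_similar("alice", RATINGS))
```

Computing `cosine_distance` for every pair of users gives the similarity matrix a collaborative filter would rank neighbors from.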
Tuesday and Thursday, 6:30 PM
5. Predict Customer Churn
• Data Munging
• Telecom customer churn model development
• Validate the model
6. Collaborative Filtering
Machine Learning
• Identify similar analytical topics based
on the Hacker News feed
Week 3
Random Forest
• Motivation for Random Forest
• Vote by democracy / Variable Importance
• Random Forest in Action
• Industry Use Cases
Tuesday and Thursday, 6:30 PM
Principal Component Analysis
• Motivation for Principal Component Analysis
• Curse of dimensionality
• Best Practices for dimensionality reduction
• Use cases and applications
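The following stdlib-only sketch shows the idea behind PCA. It finds the direction of greatest variance as the leading eigenvector of the sample covariance matrix, here via power iteration, and then projects each point onto that direction to go from 2 dimensions to 1. Real work would typically use a linear-algebra library. The sample points are made up and lie on a line, so one component captures all of the variance.

```python
import math

def first_principal_component(points, iterations=100):
    """Leading eigenvector of the sample covariance matrix, via power iteration."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    cov = [[sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    vec = [1.0] * d  # assumes this start vector is not orthogonal to the answer
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            raise ValueError("degenerate data or start vector")
        vec = [x / norm for x in nxt]
    return vec, centered

def project(centered, component):
    """1-D coordinate of each centered point along the component."""
    return [sum(x * c for x, c in zip(row, component)) for row in centered]

POINTS = [(0, 0), (1, 2), (2, 4), (3, 6)]

if __name__ == "__main__":
    component, centered = first_principal_component(POINTS)
    print(component, project(centered, component))
```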
7. GBM Assignment
• Data Munging
• Telecom customer churn model development
• Compare Random Forest and GBM
8. Image processing
Machine Learning
• Reduce dimensions in image data
• Classify images into categories
Week 4
Gradient Boosting Machines (GBM)
• Motivation for GBM
• Boosting vs. Bagging
• Residual errors and tree generation
• Metric-based search for the best GBM trees
• GBM in action
• Industry Use Cases
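Here is a toy version of the boosting loop for squared-error loss. It starts from the mean, then repeatedly fits a one-split regression tree (a stump) to the current residuals and adds a shrunken copy of it. The 1-D data and the `n_trees` and `learning_rate` values are illustrative assumptions.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump on 1-D inputs (least squares)."""
    thresholds = sorted(set(xs))[:-1]
    if not thresholds:
        raise ValueError("need at least two distinct x values")
    best = None
    for t in thresholds:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Each new stump fits the residuals (the negative gradient of squared loss)."""
    base = sum(ys) / len(ys)
    trees = []

    def predict(x):
        return base + learning_rate * sum(tree(x) for tree in trees)

    for _ in range(n_trees):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return predict

XS = [1, 2, 3, 4, 5, 6]
YS = [1, 1, 1, 5, 5, 5]

if __name__ == "__main__":
    model = gradient_boost(XS, YS)
    print([round(model(x), 3) for x in XS])
```

Unlike the independent, bagged trees of a random forest, each tree here depends on the errors left by the ones before it.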
Tuesday and Thursday, 6:30 PM
9. Regression models
• Predict housing prices
• Data munging
10. Clustering
Machine Learning
• Cluster flowers by type
Week 5
Generalized Linear Models
• Linear Regression
• Regularization (Ridge, Lasso)
• Logistic Regression
• Generalized Linear Models
• Feature Selection
• Industry Use Cases
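For a single feature with an unpenalized intercept, ridge regression has a closed form. The slope is w = sum((x - x_mean) * (y - y_mean)) / (sum((x - x_mean)^2) + lambda), and the intercept is y_mean - w * x_mean. Setting lambda to 0 recovers ordinary least squares. The data below is hypothetical.

```python
def ridge_1d(xs, ys, lam=0.0):
    """Closed-form ridge regression for one feature; the intercept is not penalized."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + lam)        # lambda shrinks the slope toward 0
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1
XS = [1, 2, 3, 4]
YS = [3, 5, 7, 9]

if __name__ == "__main__":
    print(ridge_1d(XS, YS, lam=0.0))  # ordinary least squares
    print(ridge_1d(XS, YS, lam=5.0))  # shrunken slope
```

Increasing lambda trades a little bias for lower variance. Lasso swaps the squared penalty for an absolute-value penalty and generally has no closed form like this.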
Clustering: KNN & K-Means
• Motivation for unsupervised learning methods
• Intuition behind KNN and Applications
• Intuition behind K-Means and Applications
• Multi-class classification
• Hierarchical Clustering
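Below is a minimal sketch of Lloyd's algorithm for K-Means. It assumes Python 3.8+ for `math.dist` and uses a naive initialization from the first k points; k-means++ or random restarts are the usual improvements. The 2-D points stand in for flower measurements and are made up.

```python
import math

def kmeans(points, k, iterations=100):
    """Alternate between assigning points to the nearest centroid and recomputing means."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialization
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters

POINTS = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]

if __name__ == "__main__":
    centroids, clusters = kmeans(POINTS, k=2)
    print(centroids)
    print(clusters)
```

KNN, by contrast, needs labeled neighbors and predicts by majority vote among the k closest training points, so it reuses the same distance function without the iterative update.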
Graph Theory and Stable Marriage
• Master key graph theory metrics
• Bipartite graphs
• Visualizing graphs with Gephi
• Motivation for matching algorithms with preferences
• Preferences on both sides
• Incomplete Lists and Ties
• Industry Use Cases
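The classic stable marriage solution is the Gale-Shapley algorithm. Proposers work down their preference lists, and each acceptor holds on to the best offer received so far. This sketch assumes equal-sized groups with complete, tie-free lists, so it does not cover the incomplete-list and ties variants listed above. The names are hypothetical.

```python
def stable_match(proposer_prefs, acceptor_prefs):
    """Gale-Shapley: returns the proposer-optimal stable matching {proposer: acceptor}."""
    # rank[a][p] = position of proposer p in acceptor a's list (lower is better)
    rank = {a: {p: i for i, p in enumerate(prefs)} for a, prefs in acceptor_prefs.items()}
    free = list(proposer_prefs)
    next_choice = {p: 0 for p in proposer_prefs}
    engaged = {}  # acceptor -> proposer
    while free:
        p = free.pop()
        a = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(a)
        if current is None:
            engaged[a] = p                 # acceptor was free: accept
        elif rank[a][p] < rank[a][current]:
            engaged[a] = p                 # acceptor trades up
            free.append(current)
        else:
            free.append(p)                 # rejected: try the next choice later
    return {p: a for a, p in engaged.items()}

PROPOSERS = {"A": ["X", "Y", "Z"], "B": ["Y", "X", "Z"], "C": ["X", "Y", "Z"]}
ACCEPTORS = {"X": ["B", "A", "C"], "Y": ["A", "B", "C"], "Z": ["A", "B", "C"]}

if __name__ == "__main__":
    print(stable_match(PROPOSERS, ACCEPTORS))
```

The result is stable: no proposer and acceptor would both rather be with each other than with their assigned partners.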
Tuesday and Thursday, 6:30 PM
11. Data Fusion
• Fuzzy matching on names and addresses
• Data Munging
12. Graph Theory, Stable Marriage
Machine Learning
• Determine stable pairs between two
groups based on preferences
Week 6
Data Fusion and Fuzzy Matching
• Merging data sets from multiple sources
• Probabilistic and Deterministic Matching
• String Fuzzy Matching
- Edit Distance, Jaro-Winkler Distance
• Fuzzy Address Matching
• Swap-in / Swap-out analysis
• Industry Use Cases
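Both string measures above are short to implement. This sketch uses the standard Levenshtein dynamic program and the usual Jaro-Winkler definition, with a prefix scale of 0.1 and a prefix cap of 4 characters. The threshold for declaring two names a match is left to the application.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def jaro(s1, s2):
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order; each transposition counts twice.
    out_of_order, j = 0, 0
    for i in range(len1):
        if used1[i]:
            while not used2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3

def jaro_winkler(s1, s2, prefix_scale=0.1, max_prefix=4):
    """Boost the Jaro score for strings that share a common prefix."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)

if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```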
Enroll@bitbootcamp.com 917-819-0106 www.bitbootcamp.com
25 Broadway
Suite 1032
New York, NY
Contact Us
Made in NYC

BitBootCamp Evening Classes
