© 2015 Blue Camphor Technologies (P) Ltd. www.skillspeed.com Slide 1
Python for Big Data Analytics
Session Objectives
This session will help you to understand:
ᗍ Introduction to Python
ᗍ Web Scraping Use Case
ᗍ Introduction to Big data
ᗍ Getting your doubts cleared
What is Python?
ᗍ Python is a general-purpose, high-level programming language designed to be easy to read and simple to implement
ᗍ Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development
ᗍ Python supports modules and packages, which encourage program modularity (subdividing a program into separate sub-programs) and code reuse
ᗍ It is similar to Perl and Ruby, with certain differences such as how its object-oriented features are designed
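The points about dynamic typing, built-in data structures and module reuse can be sketched in a few lines. This is only an illustration: the variable names and the sample word list are assumptions made for the example, not part of the original slides.

```python
from collections import Counter  # a standard-library module, reused as-is

# Dynamic typing: a name is bound to an object at run time, and the
# same name can later be bound to an object of a different type.
value = 42
first_type = type(value).__name__
value = "forty-two"
second_type = type(value).__name__

# Built-in data structures: list, dict and set need no imports.
words = ["big", "data", "big", "python"]
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
unique_words = set(words)

# Code reuse through modules: Counter does the same counting for us.
library_counts = Counter(words)

print(first_type, second_type)
print(counts)
print(sorted(unique_words))
print(library_counts == counts)
```

The hand-written loop and `Counter` produce the same counts, which is the kind of reuse that modules and packages are meant to encourage.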
What is Python? (Cont’d)
Python has an object-oriented structure. It supports:
ᗍ Polymorphism (the slide diagram splits this into static polymorphism and runtime polymorphism)
ᗍ Multiple inheritance (diagram: Class A, Class B, Class C)
ᗍ Operator overloading, for example the ‘+’ operator: 5 + 5 = 10, and Skill + Speed = SkillSpeed
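A minimal sketch of these object-oriented features follows. The classes `Money`, `Flyer`, `Swimmer` and `Duck` are hypothetical names invented for this illustration; only the `5 + 5` and `Skill + Speed` cases come from the slide.

```python
class Money:
    """Hypothetical value type that overloads '+' through __add__."""

    def __init__(self, amount):
        self.amount = amount

    def __add__(self, other):
        # Called when two Money objects are combined with '+'.
        return Money(self.amount + other.amount)


class Flyer:
    def move(self):
        return "fly"


class Swimmer:
    def move(self):
        return "swim"

    def dive(self):
        return "dive"


class Duck(Flyer, Swimmer):
    """Multiple inheritance: Duck has two base classes."""


# The built-in '+' already behaves differently for numbers and strings.
number_sum = 5 + 5
joined = "Skill" + "Speed"

# Operator overloading on a user-defined class.
total = Money(5) + Money(5)

# Runtime polymorphism: the method that runs depends on the object's class.
moves = [animal.move() for animal in (Flyer(), Swimmer(), Duck())]

print(number_sum, joined, total.amount)
print(moves, Duck().dive())
```

`Duck` inherits `move` from `Flyer` because `Flyer` is listed first among its base classes, while `dive` comes from `Swimmer`.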
Get Started with Python
Why Python?
ᗍ Good for text processing
ᗍ Generates HTML content
ᗍ Extended in C and C++ (diagram: Script.py, the CPython interpreter, and your C++ program)
ᗍ Clear syntax
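As a small illustration of text processing that generates HTML content, the sketch below turns a list of strings into an HTML list using only the standard library. The function name `to_html_list` and the sample items are assumptions made for this example.

```python
from html import escape


def to_html_list(items):
    """Render an iterable of strings as an HTML unordered list."""
    # escape() replaces characters such as '&' and '<' with HTML entities.
    rows = "\n".join(f"  <li>{escape(item)}</li>" for item in items)
    return f"<ul>\n{rows}\n</ul>"


page = to_html_list(["Watch", "Phone & Tablet"])
print(page)
```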
Why Python? (Cont’d)
ᗍ Interpreted environment (diagram: source code, interpreter, output)
ᗍ Automatic memory management
ᗍ Good for code steering and for merging multiple programs
ᗍ Supports library utilities and third-party utilities (for example Numeric, NumPy, SciPy)
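The source code, interpreter, output flow can be observed from within Python itself. In CPython, source text is first compiled to bytecode, which the interpreter then executes. The sketch below makes both steps explicit; the one-line program and the variable names are assumptions for illustration.

```python
import dis

source = "result = 2 + 3"

# compile() turns source text into a code object that holds bytecode.
code_object = compile(source, "<example>", "exec")

# The interpreter then executes that bytecode in a namespace.
namespace = {}
exec(code_object, namespace)
print(namespace["result"])

# dis prints the bytecode instructions; the exact listing differs
# between Python versions, so no expected output is shown here.
dis.dis(code_object)
```

In everyday use these steps happen automatically when a script is run, which is why there is no separate compile and link stage as in C or C++.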
Job Trends
[Chart: job trends; axis label: Percentage Growth]
Users of Python
Google App Engine is a prominent example: it allows building web applications in the Python programming language, using its rich collection of libraries, tools and frameworks
YouTube is a big user of Python. The site uses Python for many purposes: viewing video, controlling website templates, administering video, accessing canonical data, and more. Python is everywhere at YouTube
Amazon Web Services uses Python
Some More Users of Python
Major Uses of Python
ᗍ System Utilities
ᗍ GUIs (Tkinter)
ᗍ Internet Scripting
ᗍ Embedded Scripting
ᗍ Database Programming
ᗍ Artificial Intelligence
ᗍ Image Processing
Demo: Web Scraping – Flipkart.com
ᗍ This example demonstrates how to extract data from Flipkart for a particular product, such as “Watch”
ᗍ We use requests (a Python package) to fetch the web page, then parse the page's HTML with BeautifulSoup to retrieve the data
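The parse step of such a scraper can be sketched without network access. In the demo itself, the page would be fetched with `requests.get(url)` and parsed with `BeautifulSoup(html, "html.parser")`. The sketch below swaps in the standard library's `html.parser` and an inline snippet so that it runs without third-party packages. The markup, CSS class names and the `ProductParser` class are hypothetical; a real product page is structured differently.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; a real product page will look different.
SAMPLE_HTML = """
<div class="product"><a class="title">Analog Watch</a><span class="price">1,299</span></div>
<div class="product"><a class="title">Digital Watch</a><span class="price">899</span></div>
"""


class ProductParser(HTMLParser):
    """Collect (title, price) pairs from the sample markup above."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None   # which field the next text node belongs to
        self._current = {}

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in ("title", "price"):
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()
            self._field = None
            if len(self._current) == 2:
                # Both fields seen: store the product and start a new one.
                self.products.append(
                    (self._current["title"], self._current["price"])
                )
                self._current = {}


parser = ProductParser()
parser.feed(SAMPLE_HTML)
print(parser.products)
```

With BeautifulSoup the same extraction is usually shorter, because it offers search helpers over the parsed tree instead of event callbacks.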
What is Big Data?
ᗍ Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications
ᗍ Huge amounts of data (terabytes or petabytes)
ᗍ The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization
ᗍ Many systems, or collections of systems, generate this data. A few examples are space exploration, deep sea navigation, and social media
Why Big Data?
ᗍ The data being generated today is so large that traditional systems can neither process it nor store it
ᗍ To create better decision support systems (DSS)
ᗍ Google alone receives 4 million search queries per minute
ᗍ Data is generated everywhere: sensors for climate information, social media, music, audio and video, and the Global Positioning System
ᗍ Only 10 percent of the world's data today resides in RDBMS and 90 percent elsewhere. How do we deal with this enormous volume of data?
Big Data Statistics
Every minute:
ᗍ Facebook users share nearly 2.5 million pieces of content
ᗍ Twitter users tweet nearly 300,000 times
ᗍ Instagram users post nearly 220,000 new photos
ᗍ YouTube users upload 72 hours of new video content
ᗍ Apple users download nearly 50,000 apps
ᗍ Email users send over 200 million messages
ᗍ Amazon generates over $80,000 in online sales
Refer: http://aci.info/2014/07/12/the-data-explosion-in-2014-minute-by-minute-infographic/
Characteristics of Big Data
Case Study 1: Big Data from Space
Satellite Imaging
Case Study 2: Social Media Analytics
ᗍ Structure of Data:
Data in social media is unstructured or semi-structured. Data from Twitter and Facebook is in JSON. Where do we store it? How do we process it?
ᗍ Quantity of Data:
There are tons of unstructured, structured and semi-structured data. How do we derive a pattern from it?
ᗍ Processing of Data:
How do we process this complex data structure? What technologies do we use?
ᗍ Prediction Algorithm:
After all the work of cleansing, slicing and dicing the data, which algorithm do we use? Decision tree, SVM, k-means, kNN? The list goes on.
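As a first step toward the "how do we process it?" question, semi-structured JSON can be loaded and summarized with the standard library alone. The records below are hypothetical and far simpler than real Twitter or Facebook payloads. The field names `user` and `text` are assumptions made for this sketch.

```python
import json
from collections import Counter

# Hypothetical records; real social media payloads have many more fields.
RAW_POSTS = """
[
  {"user": "asha", "text": "Learning #python for #bigdata"},
  {"user": "ravi", "text": "#Python makes scraping easy"},
  {"user": "mei", "text": "No tags here"}
]
"""

# json.loads turns the JSON text into Python lists and dicts.
posts = json.loads(RAW_POSTS)

# Count hashtags case-insensitively across all posts.
hashtag_counts = Counter(
    word.lower()
    for post in posts
    for word in post["text"].split()
    if word.startswith("#")
)
print(hashtag_counts.most_common())
```

At real scale the same counting logic would be spread over many machines, but the shape of the computation stays the same: parse each record, extract a feature, aggregate.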
Why SkillSpeed?
ᗍ Course Curriculum from Industry Experts
ᗍ Instructor Led Live Virtual Sessions
ᗍ Lifetime access to Course Content via LMS
ᗍ 100% Placement Assistance
ᗍ 24x7 Support
Course Topics
Module 1: Introduction to Python
Module 2: Built-In Data Types, Strings, Sequence and Files
Module 3: Functions, Sorting, Exceptions, Standard Libraries
Module 4: Regular Expression and Object-oriented Programming
Module 5: Debugging Python, Project Skeleton in Python and SQLite Database
Module 6: Introduction to Big Data and Hadoop
Module 7: Python and Big Data
Module 8: Implementation of Machine Learning in Python
Module 9: Working Examples of Machine Learning in Python
Module 10: Project Implementation
Corporate Partners
Contact Us
Lines open 24/7
To know more about the course, please contact:
IND +91-90660-20904
USA 1866-607-6547 (Toll Free)
Or reach us at sales@skillspeed.com
References
https://harshbhimjyani.wordpress.com/2014/10/21/scraping-flipkart/
https://www.vlab.org/sandbox/events/satellite-imaging-big-data-from-space-shared/
http://www.datasciencecentral.com/profiles/blogs/data-veracity

Editor's Notes

  • #5 Why use Python:
    - Python is object-oriented: its structure supports such concepts as polymorphism, operator overloading, and multiple inheritance
    - It's free (open source): downloading and installing Python is free and easy; source code is easily accessible; free doesn't mean unsupported, as the online Python community is huge
    - It's portable: Python runs on virtually every major platform used today; as long as you have a compatible Python interpreter installed, Python programs will run in exactly the same manner, irrespective of platform
    - It's powerful: dynamic typing; built-in types and tools; library utilities; third-party utilities (e.g. Numeric, NumPy, SciPy); automatic memory management
    - It's mixable: Python can be linked easily to components written in other languages; linking to fast, compiled code is useful for computationally intensive problems; Python is good for code steering and for merging multiple programs in otherwise conflicting languages; Python/C integration is quite common; WARP is implemented in a mixture of Python and Fortran
    - It's easy to use: rapid turnaround, with no intermediate compile and link steps as in C or C++; Python programs are compiled automatically to an intermediate form called bytecode, which the interpreter then reads; this gives Python the development speed of an interpreter without the performance loss inherent in purely interpreted languages
    - It's easy to learn: structure and syntax are pretty intuitive and easy to grasp
  • #16 Image copied from : http://www.datasciencecentral.com/profiles/blogs/data-veracity
  • #17 http://www.framingmymessage.nl/wp-content/uploads/2013/10/Social-media.jpg
  • #19 SkillSpeed offers virtual instructor-led courses designed to bridge the time-to-competency gap experienced by technology companies. The USP of SkillSpeed is the subject matter expert (SME). SMEs are industry experts with a good understanding of the technology and hands-on industry experience with it. This industry expert designs, develops, and delivers the course. SkillSpeed provides you:
    - Course curriculum from industry experts
    - Instructor-led live virtual sessions
    - Real-life industry case studies
    - Live virtual interactions
    - Interaction with industry experts
    - Lifetime access to all course content via the LMS
    - 24x7 support
    - 100% placement assistance