DATA SCIENCE
TOPICS
 databases and data architectures
 databases in the real world
 scaling, data quality, distributed
 machine learning/data mining/statistics
 information retrieval
 Data Science is currently a popular interest of
employers
 our Industrial Affiliates Partners say there is high
demand for students trained in Data Science
 databases, warehousing, data architectures
 data analytics – statistics, machine learning
 Big Data – gigabytes/day or more
 Examples:
 Walmart, cable companies (ads linked to content, viewer
trends), airlines/Orbitz, HMOs, call centers, Twitter
(500M tweets/day), traffic surveillance cameras,
detecting fraud, identity theft...
 supports “Business Intelligence”
 quantitative decision-making and control
 finance, inventory, pricing/marketing, advertising
 need data for identifying risks, opportunities, conducting
“what-if” analyses
DATA ARCHITECTURES
 traditional databases (CSCE 310/608)
 tables, fields
 tuples = records or rows
 <yellowstone,WY,6000000 acres,geysers>
 key = field with unique values
 can be used as a reference from one table into another
 important for avoiding redundancy (normalization); redundant copies
of a value risk inconsistency
 join – combining 2 tables using a key
 metadata – data about the data
 names of the fields, types (string, int, real, mpeg...)
 also things like source, date, size, completeness/sampling
Name           HomeTown        Grad school       PhD   teaches   title
John Flaherty  Houston, TX     Rice              2005  CSCE 411  Design and Analysis of Algorithms
Susan Jenkins  Omaha, NE       Univ of Michigan  2004  CSCE 121  Introduction to Computing in C++
Susan Jenkins  Omaha, NE       Univ of Michigan  2004  CSCE 206  Programming in C
Bill Jones     Pittsburgh, PA  Carnegie Mellon   1999  CSCE 314  Programming Languages
Bill Jones     Pittsburgh, PA  Carnegie Mellon   1999  CSCE 206  Programming in C

TeachingAssignments:
Name           teaches
John Flaherty  CSCE 411
Susan Jenkins  CSCE 121
Susan Jenkins  CSCE 206
Bill Jones     CSCE 314
Bill Jones     CSCE 206

Courses:
course    title
CSCE 411  Design and Analysis of Algorithms
CSCE 121  Introduction to Computing in C++
CSCE 314  Programming Languages
CSCE 206  Programming in C

Instructors:
Name           HomeTown        Grad school       PhD
John Flaherty  Houston, TX     Rice              2005
Susan Jenkins  Omaha, NE       Univ of Michigan  2004
Bill Jones     Pittsburgh, PA  Carnegie Mellon   1999
 SQL: Structured Query Language
>SELECT Name,HomeTown FROM Instructors WHERE PhD<2000;
Bill Jones Pittsburgh, PA
>SELECT Course,Title FROM Courses ORDER BY Course;
CSCE 121 Introduction to Computing in C++
CSCE 206 Programming in C
CSCE 314 Programming Languages
CSCE 411 Design and Analysis of Algorithms
can also compute sums, counts, means, etc.
example of JOIN: find all courses taught by someone from CMU:
>SELECT TeachingAssignments.teaches
FROM Instructors JOIN TeachingAssignments
ON Instructors.Name=TeachingAssignments.Name
WHERE Instructors."Grad school"='Carnegie Mellon';
CSCE 314
CSCE 206
because they were both taught by Bill Jones
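As a rough check, the queries on this slide can be run against an in-memory SQLite database through Python's standard `sqlite3` module. The schema is an assumption reconstructed from the three tables shown earlier (the column names `teaches` and `"Grad school"` follow the table headers), and the slides do not say which database engine was used. An `ORDER BY` is added to the join so the result order is predictable.

```python
import sqlite3

# Assumed schema, reconstructed from the Instructors, TeachingAssignments
# and Courses tables shown on the earlier slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Instructors (Name TEXT PRIMARY KEY, HomeTown TEXT,
                          "Grad school" TEXT, PhD INTEGER);
CREATE TABLE TeachingAssignments (Name TEXT, teaches TEXT);
CREATE TABLE Courses (Course TEXT PRIMARY KEY, Title TEXT);
""")
conn.executemany("INSERT INTO Instructors VALUES (?, ?, ?, ?)", [
    ("John Flaherty", "Houston, TX", "Rice", 2005),
    ("Susan Jenkins", "Omaha, NE", "Univ of Michigan", 2004),
    ("Bill Jones", "Pittsburgh, PA", "Carnegie Mellon", 1999),
])
conn.executemany("INSERT INTO TeachingAssignments VALUES (?, ?)", [
    ("John Flaherty", "CSCE 411"),
    ("Susan Jenkins", "CSCE 121"),
    ("Susan Jenkins", "CSCE 206"),
    ("Bill Jones", "CSCE 314"),
    ("Bill Jones", "CSCE 206"),
])
conn.executemany("INSERT INTO Courses VALUES (?, ?)", [
    ("CSCE 411", "Design and Analysis of Algorithms"),
    ("CSCE 121", "Introduction to Computing in C++"),
    ("CSCE 314", "Programming Languages"),
    ("CSCE 206", "Programming in C"),
])
conn.commit()


def phd_before(year):
    """Names and home towns of instructors whose PhD year is before `year`."""
    return conn.execute(
        "SELECT Name, HomeTown FROM Instructors WHERE PhD < ?", (year,)
    ).fetchall()


def courses_taught_by_graduates_of(school):
    """JOIN Instructors with TeachingAssignments on the Name key."""
    rows = conn.execute(
        'SELECT TeachingAssignments.teaches '
        'FROM Instructors JOIN TeachingAssignments '
        'ON Instructors.Name = TeachingAssignments.Name '
        'WHERE Instructors."Grad school" = ? '
        'ORDER BY TeachingAssignments.teaches',
        (school,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling `courses_taught_by_graduates_of("Carnegie Mellon")` returns the two courses taught by Bill Jones, matching the slide.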
 SQL servers
 centralized database, required for concurrent
access by multiple users
 ODBC: Open DataBase Connectivity – protocol to
connect to servers and do queries, updates from
languages like Java, C, Python
 Oracle, IBM DB2 - industrial strength SQL
databases
 some efficiency issues with real databases
 indexing
 how to efficiently find all songs written by Paul Simon in a
database with 10,000,000 entries?
 data structures for representing sorted order on fields
 disk management
 databases are often too big to fit in RAM, leave most of it on
disk and swap in blocks of records as needed – could be slow
 concurrency
 transaction semantics: either all updates in a batch happen or
none do (commit or rollback)
 like delete one record and simultaneously add another but
guarantee not to leave in an inconsistent state
 other users might be blocked till done
 query optimization
 the order in which you JOIN tables can drastically affect the size
of the intermediate tables
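The indexing and transaction points above can be sketched with Python's standard `sqlite3` module. The `songs` table, the index name, and the helper functions are hypothetical names chosen for illustration. The sketch creates an index on the writer field and then shows a delete-plus-insert pair that either commits as a whole or is rolled back.

```python
import sqlite3


def make_song_db():
    """Build a small hypothetical songs table with an index on the writer field."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE songs (title TEXT PRIMARY KEY, writer TEXT)")
    # An index on writer can let the engine find one writer's songs
    # without scanning every row of a large table.
    conn.execute("CREATE INDEX idx_songs_writer ON songs (writer)")
    conn.executemany("INSERT INTO songs VALUES (?, ?)", [
        ("The Boxer", "Paul Simon"),
        ("Graceland", "Paul Simon"),
        ("Imagine", "John Lennon"),
    ])
    conn.commit()
    return conn


def titles_by(conn, writer):
    rows = conn.execute(
        "SELECT title FROM songs WHERE writer = ? ORDER BY title", (writer,)
    )
    return [row[0] for row in rows]


def replace_song(conn, old_title, new_title, writer, crash=False):
    """Delete one record and add another as a single transaction."""
    try:
        # The connection used as a context manager commits if the block
        # finishes and rolls back if an exception escapes it.
        with conn:
            conn.execute("DELETE FROM songs WHERE title = ?", (old_title,))
            if crash:
                raise RuntimeError("simulated failure between the two updates")
            conn.execute("INSERT INTO songs VALUES (?, ?)", (new_title, writer))
        return True
    except RuntimeError:
        return False
```

With `crash=True` the delete is undone, so the table is never left with the old record removed and the new one missing.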
 Unstructured data
 raw text
 documents, digital libraries
 grep, substring indexing, regular expressions
 like find all instances of “[aA]g+ies” including “agggggies”
 Information Retrieval (CSCE 470)
 look for synonyms, similar words (like “car” and “auto”)
 tfIdf (term frequency/inverse doc frequency) – weighting for
important words
 LSI (latent semantic indexing) – e.g. ‘dogs’ is similar to ‘canines’
because they are used similarly (both near ‘bark’ and ‘bite’)
 Natural Language parsing
 extracting requirements from job postings
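Two of the text techniques above are small enough to sketch in Python: the regular expression from the slide, and one common form of tf-idf weighting (term frequency times the log of inverse document frequency). The slides do not specify an exact tf-idf formula, so the variant here is an assumption, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# The pattern from the slide: "a" or "A", one or more "g", then "ies".
AGGIES = re.compile(r"[aA]g+ies")


def find_aggies(text):
    return AGGIES.findall(text)


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = (count of term in doc / doc length) * log(N / docs containing term)
    """
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

A word that appears in every document gets weight zero under this formula, which is the sense in which tf-idf favors "important" words.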
 Unstructured data
 images, video (BLOBs=binary large objects)
 how to extract features? index them? search them?
 color histograms
 convolutions/transforms for pattern matching
 looking for ICBM missiles in aerial photos of Cuba
 streams
 sports ticker, radio, stock quotes...
 XML files
 with tags indicating field names
<course>
<name>CSCE 411</name>
<title>Design and Analysis of Algorithms</title>
</course>
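A minimal sketch of reading such a file, using Python's standard `xml.etree.ElementTree`: each child tag becomes a field name and its text becomes the value. The function name `course_to_record` is hypothetical.

```python
import xml.etree.ElementTree as ET

COURSE_XML = """
<course>
<name>CSCE 411</name>
<title>Design and Analysis of Algorithms</title>
</course>
"""


def course_to_record(xml_text):
    """Turn one <course> element into a {field name: value} dict."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: child.text for child in root}
```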
 Object databases
CHEM 102: Intro to Chemistry; TR, 3:00-4:00; prereq: CHEM 101
  ClassOfferedAt -> Texas A&M: College Station, TX; Div 1A; 53,299 students
  TaughtBy -> Instructor/Employee: Dr. Frank Smith; 302 Miller St.; PhD, Cornell; 13 years experience
In a database with millions of objects,
how do you efficiently do queries (i.e. follow pointers)
and retrieve information?
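One possible answer, sketched in Python under simplifying assumptions: objects hold direct references to each other, and a precomputed index maps an attribute value to the objects reachable through it, so a query does not have to visit every object. The class and field names are hypothetical, loosely following the diagram.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class University:
    name: str
    city: str
    students: int


@dataclass
class Instructor:
    name: str
    phd_from: str
    years_experience: int


@dataclass
class Course:
    code: str
    title: str
    offered_at: University   # the ClassOfferedAt pointer
    taught_by: Instructor    # the TaughtBy pointer


def index_by_phd_school(courses):
    """Map PhD institution -> courses, built once by following TaughtBy pointers."""
    index = defaultdict(list)
    for course in courses:
        index[course.taught_by.phd_from].append(course)
    return index


tamu = University("Texas A&M", "College Station, TX", 53299)
smith = Instructor("Dr. Frank Smith", "Cornell", 13)
chem102 = Course("CHEM 102", "Intro to Chemistry", tamu, smith)
```

Looking up `index_by_phd_school([chem102])["Cornell"]` then costs one dictionary access instead of a scan over all courses.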
 Real-world issues with databases
 it’s all about scaling up to many records (and many
users)
 data warehousing:
 full database is stored in secure, off-site location
 slices, snapshots, or views are put on interactive query
servers for fast user access (“staging”)
 might be processed or summarized data
 databases are often distributed
 different parts of the data held in different sites
 some queries are local, others are “corporate-wide”
 how to do distributed queries?
 how to keep the databases synchronized?
 CSCE 438 – Distributed Object Programming
 OLAP: OnLine Analytical Processing
data warehouse: every transaction ever recorded
OLAP server: nightly updates and summaries
- multi-dimensional tables of aggregated sales in different regions in recent quarters, rather than “every transaction”
- users can still look at seasonal or geographic trends in different product categories
- project data onto 2D spreadsheets, graphs
http://technet.microsoft.com/en-us/library/ms174587.aspx
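The aggregation step can be sketched in a few lines of Python. Individual transactions are rolled up into a table keyed by (region, quarter, category), and that table is then projected onto fewer dimensions. The record layout and function names are assumptions for illustration, not any particular OLAP product's API.

```python
from collections import defaultdict


def quarter(month):
    """Map a month number 1..12 to a quarter 1..4."""
    return (month - 1) // 3 + 1


def build_cube(transactions):
    """Aggregate raw transactions into totals per (region, quarter, category)."""
    cube = defaultdict(int)
    for t in transactions:
        cube[(t["region"], quarter(t["month"]), t["category"])] += t["amount"]
    return dict(cube)


def project(cube, keep):
    """Sum out every dimension except the positions listed in `keep`."""
    out = defaultdict(int)
    for key, total in cube.items():
        out[tuple(key[i] for i in keep)] += total
    return dict(out)


# Hypothetical sample data.
SALES = [
    {"region": "South", "month": 1, "category": "toys", "amount": 100},
    {"region": "South", "month": 2, "category": "toys", "amount": 50},
    {"region": "South", "month": 5, "category": "tools", "amount": 70},
    {"region": "North", "month": 1, "category": "toys", "amount": 30},
]
```

Here `project(build_cube(SALES), keep=(0, 1))` gives a region-by-quarter table suitable for a 2D spreadsheet view.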
 data integrity
 missing values
 how to interpret? not available? 0? use the mean?
 duplicated values
 including partial matches (Jon Smith=John Smith?)
 inconsistency:
 multiple addresses for person
 out-of-date data
 inconsistent usage:
 does “destination” mean of first leg or whole flight?
 outliers:
 salaries that are negative, or in the trillions
 most databases allow “integrity constraints” to be
defined that validate newly entered data
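As an illustration, SQLite (via Python's standard `sqlite3` module) supports `NOT NULL` and `CHECK` constraints that reject bad rows at insert time. The `employees` table and the salary bounds below are hypothetical, picked to match the outlier example above.

```python
import sqlite3


def make_employee_table():
    conn = sqlite3.connect(":memory:")
    # Reject missing values, negative salaries, and absurdly large ones.
    conn.execute("""
        CREATE TABLE employees (
            name   TEXT NOT NULL,
            salary REAL NOT NULL CHECK (salary >= 0 AND salary < 1e9)
        )
    """)
    return conn


def try_insert(conn, name, salary):
    """Return True if the row passed the constraints, False if it was rejected."""
    try:
        with conn:
            conn.execute("INSERT INTO employees VALUES (?, ?)", (name, salary))
        return True
    except sqlite3.IntegrityError:
        return False
```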
 Interoperability
 how can data from one database be compared
or combined with another?
 what if fields are not the same, or not present,
or used differently?
 think of medical or insurance records
 translation/mapping of terms
 standards
 units like ft/s, or gallons, etc.
 identifiers like SSN, UIN, ISBN
 “federated” databases – queries that combine
information across multiple servers
 “Data cleansing”
 filling in missing data (imputing values)
 detecting and removing outliers
 smoothing
 removing noise by averaging values together
 filtering, sampling
 keeping only selected representative values
 feature extraction
 e.g. in a photo database, which people are wearing
glasses? which have more than one person? which
are outdoors?
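Three of the cleansing steps above, sketched with Python's standard `statistics` module. The choices made here are assumptions: missing values are imputed with the mean, outliers are values far from the median (measured in median absolute deviations), and smoothing is a simple moving average. Other choices are equally common.

```python
from statistics import mean, median


def impute_mean(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]


def remove_outliers(values, k=3.0):
    """Drop values more than k median-absolute-deviations from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(values)
    return [v for v in values if abs(v - med) <= k * mad]


def moving_average(values, window=3):
    """Smooth by averaging each run of `window` consecutive values."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]
```

For example, `remove_outliers([50, 52, 48, 51, -5, 10**12])` keeps only the four plausible salaries (in thousands) and drops the negative and trillion-scale entries.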
DATA MINING/DATA ANALYTICS
 finding patterns in the data
 statistics
 machine learning
(CSCE 633)
 Numerical data
 correlations
 multivariate regression
 fitting “models”
 predictive equations that fit the data
 from a real estate database of home sales, we get
 housing price = 100*SqFt - 6*DistanceToSchools +
0.1*AverageOfNeighborhood
 ANOVA for testing differences between groups
 R is one of the most commonly used software
packages for doing statistical analysis
 can load a data table, calculate means and
correlations, fit distributions, estimate parameters,
test hypotheses, generate graphs and histograms
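The slides name R for this kind of work. As a rough Python equivalent, written by hand rather than with a statistics package, the sketch below computes a Pearson correlation and fits a one-variable least-squares line. The data are made up so that price is exactly 100 per square foot plus a constant, echoing the housing equation above.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# Hypothetical home sales: price = 100 * SqFt + 5000.
SQFT = [1000, 1500, 2000, 2500]
PRICE = [100 * s + 5000 for s in SQFT]
```

A multivariate regression such as the one on the slide would need a linear-algebra solver; this one-variable version only shows the idea of fitting a predictive equation.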
 clustering
 similar photos, documents, cases
 discovery of “structure” in the data
 example: accident database
 some clusters might be identified with “accidents
involving a tractor trailer” or “accidents at night”
 top-down vs. bottom-up clustering methods
 granularity: how many clusters?
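A minimal sketch of a bottom-up method, assuming single-linkage distance between clusters (one of several possible choices). Every point starts as its own cluster and the two closest clusters are merged repeatedly until `k` remain, so the granularity question becomes the choice of `k`.

```python
from math import dist


def agglomerative(points, k):
    """Bottom-up clustering: merge the two closest clusters until k are left."""
    clusters = [[p] for p in points]
    while len(clusters) > max(k, 1):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters
```

This brute-force version re-examines every pair of clusters on each merge, so it is only suitable for small inputs.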
 Decision trees (classifiers)
 what factors, decisions, or treatments led to different
outcomes?
 recursive partitioning algorithms
 related methods
 “discriminant” analysis
 what factors lead to return of product?
 extract “association rules”
 boxer dogs tend to have congenital defects
 covers 5% of patients with 80% confidence
Veterinary database - dogs treated for disease
breed gender age drug sibsp outcome
terrier F 10 methotrexate 4.0 died
spaniel M 5 cytarabine 2.3 survived
doberman F 7 doxorubicin 0.1 died
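The "covers 5% of patients with 80% confidence" phrasing can be made concrete. In the sketch below, coverage is taken to be the fraction of records matching the rule's "if" part, and confidence the fraction of those that also match the "then" part. These definitions are an assumption (the slides do not spell them out) and the records are made up.

```python
def rule_stats(records, antecedent, consequent):
    """Return (coverage, confidence) for the rule antecedent -> consequent."""
    matching = [r for r in records if antecedent(r)]
    if not matching:
        return 0.0, 0.0
    coverage = len(matching) / len(records)
    confidence = sum(1 for r in matching if consequent(r)) / len(matching)
    return coverage, confidence


# Hypothetical records: 5 boxers (4 with a defect) and 15 other dogs (1 with a defect).
DOGS = (
    [{"breed": "boxer", "defect": True}] * 4
    + [{"breed": "boxer", "defect": False}]
    + [{"breed": "terrier", "defect": True}]
    + [{"breed": "terrier", "defect": False}] * 14
)
```

For this data, `rule_stats(DOGS, lambda r: r["breed"] == "boxer", lambda r: r["defect"])` gives a coverage of 25% and a confidence of 80%.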
 other types of data
 time series and forecasting:
 model the price of gas using autoregression
 a function of recent prices, demand, geopolitics...
 de-trend: factor out seasonal trends
 GIS (geographic information systems)
 longitude/latitude coordinates in the database
 objects: city/state boundaries, river locations, roads
 find regions in B/CS with an excess
of coffee shops
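For the time-series bullet above, a first-order autoregression is the simplest case: the next value is modeled as a linear function of the previous one. The sketch fits `x[t] = a + b * x[t-1]` by least squares. It uses only past prices, leaving out demand and other factors mentioned on the slide, and the price series is invented for illustration.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = a + b * x[t-1]; returns (a, b)."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    sxy = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    b = sxy / sxx
    a = mean_curr - b * mean_prev
    return a, b


def forecast(series, a, b, steps):
    """Roll the fitted model forward from the last observed value."""
    out = []
    last = series[-1]
    for _ in range(steps):
        last = a + b * last
        out.append(last)
    return out


# Hypothetical series that follows x[t] = 1 + 0.5 * x[t-1] exactly.
PRICES = [10.0, 6.0, 4.0, 3.0, 2.5, 2.25]
```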
from: Basic Statistics for Business and Economics, Lind et al (2009), Ch 16.
Toy Sales
credit: Frank Curriero
FOR MORE INFORMATION:
VISIT US AT: www.kellytechno.com
ADDRESS: Flat no : 212, 2nd floor,
AnnapurnaBlock,
Aditya Enclave,
Ameerpet, Hyderabad-16.


Data science training in hyderabad

  • 2. TOPICS  databases and data architectures  databases in the real world  scaling, data quality, distributed  machine learning/data mining/statistics  information retrieval
  • 3.  Data Science is currently a popular interest of employers  our Industrial Affiliates Partners say there is high demand for students trained in Data Science  databases, warehousing, data architectures  data analytics – statistics, machine learning  Big Data – gigabytes/day or more  Examples:  Walmart, cable companies (ads linked to content, viewer trends), airlines/Orbitz, HMOs, call centers, Twitter (500M tweets/day), traffic surveillance cameras, detecting fraud, identity theft...  supports “Business Intelligence”  quantitative decision-making and control  finance, inventory, pricing/marketing, advertising  need data for identifying risks, opportunities, conducting “what-if” analyses
  • 4. DATA ARCHITECTURES  traditional databases (CSCE 310/608)  tables, fields  tuples = records or rows  <yellowstone,WY,6000000 acres,geysers>  key = field with unique values  can be used as a reference from one table into another  important for avoiding redundancy (normalization), which risks inconsistency  join – combining 2 tables using a key  metadata – data about the data  names of the fields, types (string, int, real, mpeg...)  also things like source, date, size, completeness/sampling
  • 5. Name | HomeTown | Grad school | PhD | teaches | title
    John Flaherty | Houston, TX | Rice | 2005 | CSCE 411 | Design and Analysis of Algorithms
    Susan Jenkins | Omaha, NE | Univ of Michigan | 2004 | CSCE 121 | Introduction to Computing in C++
    Susan Jenkins | Omaha, NE | Univ of Michigan | 2004 | CSCE 206 | Programming in C
    Bill Jones | Pittsburgh, PA | Carnegie Mellon | 1999 | CSCE 314 | Programming Languages
    Bill Jones | Pittsburgh, PA | Carnegie Mellon | 1999 | CSCE 206 | Programming in C
    TeachingAssignments:
    Name | teaches
    John Flaherty | CSCE 411
    Susan Jenkins | CSCE 121
    Susan Jenkins | CSCE 206
    Bill Jones | CSCE 314
    Bill Jones | CSCE 206
    Courses:
    course | title
    CSCE 411 | Design and Analysis of Algorithms
    CSCE 121 | Introduction to Computing in C++
    CSCE 314 | Programming Languages
    CSCE 206 | Programming in C
    Instructors:
    Name | HomeTown | Grad school | PhD
    John Flaherty | Houston, TX | Rice | 2005
    Susan Jenkins | Omaha, NE | Univ of Michigan | 2004
    Bill Jones | Pittsburgh, PA | Carnegie Mellon | 1999
  • 6.  SQL: Structured Query Language >SELECT Name,HomeTown FROM Instructors WHERE PhD<2000; Bill Jones Pittsburgh, PA >SELECT Course,Title FROM Courses ORDER BY Course; CSCE 121 Introduction to Computing in C++ CSCE 206 Programming in C CSCE 314 Programming Languages CSCE 411 Design and Analysis of Algorithms can also compute sums, counts, means, etc. example of JOIN: find all courses taught by someone from CMU: >SELECT TeachingAssignments.Course FROM Instructors JOIN TeachingAssignments ON Instructors.Name=TeachingAssignments.Name WHERE Instructors.GradSchool='Carnegie Mellon'; CSCE 314 CSCE 206 because they were both taught by Bill Jones
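The queries on this slide can be tried out with a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The column names GradSchool and Course are assumptions chosen for illustration (the slide's tables label them "Grad school" and "teaches"), and an ORDER BY is added to the join so the result order is predictable.

```python
import sqlite3

# In-memory database holding two of the normalized tables from the slides.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Instructors ("
    "Name TEXT PRIMARY KEY, HomeTown TEXT, GradSchool TEXT, PhD INTEGER)"
)
conn.execute("CREATE TABLE TeachingAssignments (Name TEXT, Course TEXT)")

conn.executemany("INSERT INTO Instructors VALUES (?, ?, ?, ?)", [
    ("John Flaherty", "Houston, TX", "Rice", 2005),
    ("Susan Jenkins", "Omaha, NE", "Univ of Michigan", 2004),
    ("Bill Jones", "Pittsburgh, PA", "Carnegie Mellon", 1999),
])
conn.executemany("INSERT INTO TeachingAssignments VALUES (?, ?)", [
    ("John Flaherty", "CSCE 411"),
    ("Susan Jenkins", "CSCE 121"),
    ("Susan Jenkins", "CSCE 206"),
    ("Bill Jones", "CSCE 314"),
    ("Bill Jones", "CSCE 206"),
])

# Simple selection: instructors whose PhD year is before 2000.
before_2000 = conn.execute(
    "SELECT Name, HomeTown FROM Instructors WHERE PhD < 2000"
).fetchall()

# JOIN on the shared key (Name): courses taught by a Carnegie Mellon graduate.
cmu_courses = conn.execute("""
    SELECT TeachingAssignments.Course
    FROM Instructors JOIN TeachingAssignments
      ON Instructors.Name = TeachingAssignments.Name
    WHERE Instructors.GradSchool = 'Carnegie Mellon'
    ORDER BY TeachingAssignments.Course
""").fetchall()

print(before_2000)
print(cmu_courses)
```

The join matches rows through the Name key, which is why both of Bill Jones's courses come back.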
  • 7.  SQL servers  centralized database, required for concurrent access by multiple users  ODBC: Open DataBase Connectivity – protocol to connect to servers and do queries, updates from languages like Java, C, Python  Oracle, IBM DB2 - industrial strength SQL databases
  • 8.  some efficiency issues with real databases  indexing  how to efficiently find all songs written by Paul Simon in a database with 10,000,000 entries?  data structures for representing sorted order on fields  disk management  databases are often too big to fit in RAM, so most of the data stays on disk and blocks of records are swapped in as needed – could be slow  concurrency  transaction semantics: either all updates happen as a batch or none do (commit or rollback)  e.g. delete one record and simultaneously add another, with a guarantee that the database is not left in an inconsistent state  other users might be blocked till done  query optimization  the order in which you JOIN tables can drastically affect the size of the intermediate tables
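The commit-or-rollback behaviour described for transactions can be illustrated with a small sketch, again assuming Python's sqlite3 module. The Records table and the replace_record helper are hypothetical names invented for this example. The delete and the insert either both take effect or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Records (Id INTEGER PRIMARY KEY, Title TEXT NOT NULL)")
conn.execute("INSERT INTO Records VALUES (1, 'old record')")
conn.commit()

def replace_record(conn, old_id, new_id, new_title):
    """Delete one record and add another as a single transaction."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if an exception is raised inside it.
        with conn:
            conn.execute("DELETE FROM Records WHERE Id = ?", (old_id,))
            conn.execute("INSERT INTO Records VALUES (?, ?)", (new_id, new_title))
        return True
    except sqlite3.IntegrityError:
        return False

# The insert violates NOT NULL, so the earlier delete is rolled back too.
failed = replace_record(conn, 1, 2, None)
ids_after_failure = [row[0] for row in conn.execute("SELECT Id FROM Records")]

# A valid replacement commits both changes together.
succeeded = replace_record(conn, 1, 2, "new record")
ids_after_success = [row[0] for row in conn.execute("SELECT Id FROM Records")]

print(failed, ids_after_failure, succeeded, ids_after_success)
```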
  • 9.  Unstructured data  raw text  documents, digital libraries  grep, substring indexing, regular expressions  like find all instances of “[aA]g+ies” including “agggggies”  Information Retrieval (CSCE 470)  look for synonyms, similar words (like “car” and “auto”)  tf-idf (term frequency/inverse document frequency) – weighting for important words  LSI (latent semantic indexing) – e.g. ‘dogs’ is similar to ‘canines’ because they are used similarly (both near ‘bark’ and ‘bite’)  Natural Language parsing  extracting requirements from job postings
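Two of the text techniques named on this slide can be sketched in a few lines of Python. The first part applies the slide's regular expression; the second computes one common variant of tf-idf (term frequency times the log of inverse document frequency), which is an assumption since the slide does not give a formula. The sample sentences and the tf_idf function name are made up for illustration.

```python
import math
import re
from collections import Counter

# Regular expression from the slide: 'a' or 'A', one or more 'g', then 'ies'.
pattern = re.compile(r"[aA]g+ies")
matches = pattern.findall("Aggies, agies, and agggggies all match; angies does not.")

# Toy document collection (invented for this example).
docs = {
    "d1": "the dog will bark and the dog will bite",
    "d2": "the car is an auto",
    "d3": "the canine will bark",
}

def tf_idf(docs):
    """Weight each word by (frequency in document) * log(N / documents containing it)."""
    tokenized = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for words in tokenized.values():
        doc_freq.update(set(words))  # count each word once per document
    weights = {}
    for name, words in tokenized.items():
        counts = Counter(words)
        weights[name] = {
            word: (count / len(words)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return weights

weights = tf_idf(docs)
print(matches)
print(sorted(weights["d1"].items(), key=lambda item: -item[1]))
```

A word such as "the" that occurs in every document gets weight zero, while a word concentrated in one document, such as "dog", is weighted highest there.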
  • 10.  Unstructured data  images, video (BLOBs=binary large objects)  how to extract features? index them? search them?  color histograms  convolutions/transforms for pattern matching  looking for ICBM missiles in aerial photos of Cuba  streams  sports ticker, radio, stock quotes...  XML files  with tags indicating field names <course> <name>CSCE 411</name> <title>Design and Analysis of Algorithms</title> </course>
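Because XML tags name the fields, records can be recovered with a standard parser. The sketch below uses Python's xml.etree.ElementTree; the enclosing courses element and the second course entry are assumptions added so the example is a well-formed document with more than one record.

```python
import xml.etree.ElementTree as ET

xml_text = """
<courses>
  <course>
    <name>CSCE 411</name>
    <title>Design and Analysis of Algorithms</title>
  </course>
  <course>
    <name>CSCE 206</name>
    <title>Programming in C</title>
  </course>
</courses>
""".strip()

root = ET.fromstring(xml_text)

# Turn each <course> element into a dict keyed by its child tag names.
records = [
    {child.tag: child.text for child in course}
    for course in root.findall("course")
]
print(records)
```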
  • 11.  Object databases CHEM 102 Intro to Chemistry TR, 3:00-4:00 prereq: CHEM 101 Texas A&M College Station, TX Div 1A 53,299 students Dr. Frank Smith 302 Miller St. PhD, Cornell 13 years experience ClassOfferedAt TaughtBy Instructor/Employee In a database with millions of objects, how do you efficiently do queries (i.e. follow pointers) and retrieve information?
  • 12.  Real-world issues with databases  it’s all about scaling up to many records (and many users)  data warehousing:  full database is stored in secure, off-site location  slices, snapshots, or views are put on interactive query servers for fast user access (“staging”)  might be processed or summarized data  databases are often distributed  different parts of the data held in different sites  some queries are local, others are “corporate-wide”  how to do distributed queries?  how to keep the databases synchronized?  CSCE 438 – Distributed Object Programming
  • 13.  OLAP: OnLine Analytical Processing data warehouse: every transaction ever recorded OLAP server nightly updates and summaries http://technet.microsoft.com/en-us/library/ms174587.aspx – multi-dimensional tables of aggregated sales in different regions in recent quarters, rather than “every transaction” – users can still look at seasonal or geographic trends in different product categories – project data onto 2D spreadsheets, graphs
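The idea of summarizing transactions into aggregated tables can be sketched with ordinary GROUP BY queries. This is only an illustration of the aggregation step, not of any particular OLAP server; the Sales table and its figures are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Region TEXT, Quarter TEXT, Amount INTEGER)")
# Individual transactions (made-up data).
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?)", [
    ("East", "Q1", 100), ("East", "Q1", 50), ("East", "Q2", 70),
    ("West", "Q1", 30), ("West", "Q2", 20), ("West", "Q2", 60),
])

# Two-dimensional summary: total sales per region and quarter.
by_region_quarter = conn.execute("""
    SELECT Region, Quarter, SUM(Amount)
    FROM Sales
    GROUP BY Region, Quarter
    ORDER BY Region, Quarter
""").fetchall()

# Collapsing the Quarter dimension gives a coarser view per region.
by_region = conn.execute("""
    SELECT Region, SUM(Amount)
    FROM Sales
    GROUP BY Region
    ORDER BY Region
""").fetchall()

print(by_region_quarter)
print(by_region)
```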
  • 14.  data integrity  missing values  how to interpret? not available? 0? use the mean?  duplicated values  including partial matches (Jon Smith=John Smith?)  inconsistency:  multiple addresses for a person  out-of-date data  inconsistent usage:  does “destination” mean the end of the first leg or of the whole flight?  outliers:  salaries that are negative, or in the trillions  most databases allow “integrity constraints” to be defined that validate newly entered data
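Integrity constraints of the kind mentioned here can be declared when a table is created. The sketch below assumes SQLite through Python's sqlite3 module; the Employees table, the salary bounds, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employees (
        Name   TEXT NOT NULL,                                  -- no missing names
        SSN    TEXT UNIQUE,                                    -- no duplicate identifiers
        Salary REAL CHECK (Salary >= 0 AND Salary < 1000000000) -- no absurd salaries
    )
""")

candidates = [
    ("Ann", "111", 50000),   # valid
    ("Bob", "222", -10),     # negative salary
    ("Cy", "111", 40000),    # duplicate SSN
    (None, "333", 1000),     # missing name
    ("Dee", "444", 5e12),    # salary in the trillions
]

accepted, rejected = [], []
for row in candidates:
    try:
        with conn:  # commit on success, roll back on a constraint violation
            conn.execute("INSERT INTO Employees VALUES (?, ?, ?)", row)
        accepted.append(row[0])
    except sqlite3.IntegrityError as err:
        rejected.append((row[0], str(err)))

print(accepted)
for name, reason in rejected:
    print(name, "->", reason)
```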
  • 15.  Interoperability  how can data from one database be compared or combined with another?  what if fields are not the same, or not present, or used differently?  think of medical or insurance records  translation/mapping of terms  standards  units like ft/s, or gallons, etc.  identifiers like SSN, UIN, ISBN  “federated” databases – queries that combine information across multiple servers
  • 16.  “Data cleansing”  filling in missing data (imputing values)  detecting and removing outliers  smoothing  removing noise by averaging values together  filtering, sampling  keeping only selected representative values  feature extraction  e.g. in a photo database, which people are wearing glasses? which have more than one person? which are outdoors?
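Three of the cleansing steps listed on this slide (imputing missing values, removing outliers, smoothing by averaging) can be sketched with the Python standard library. Mean imputation, fixed bounds, and a moving average are just one simple choice for each step; the function names and thresholds are illustrative assumptions.

```python
from statistics import mean

def impute_missing(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]

def remove_outliers(values, low, high):
    """Keep only values inside the plausible range [low, high]."""
    return [v for v in values if low <= v <= high]

def moving_average(values, window=3):
    """Smooth noise by averaging each run of `window` consecutive values."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]

imputed = impute_missing([10, None, 20])
plausible = remove_outliers([50000, -10, 60000, 5e12], low=0, high=1e7)
smoothed = moving_average([1, 2, 3, 4, 5])

print(imputed, plausible, smoothed)
```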
  • 17. DATA MINING/DATA ANALYTICS  finding patterns in the data  statistics  machine learning (CSCE 633)
  • 18.  Numerical data  correlations  multivariate regression  fitting “models”  predictive equations that fit the data  from a real estate database of home sales, we get  housing price = 100*SqFt - 6*DistanceToSchools + 0.1*AverageOfNeighborhood  ANOVA for testing differences between groups  R is one of the most commonly used software packages for doing statistical analysis  can load a data table, calculate means and correlations, fit distributions, estimate parameters, test hypotheses, generate graphs and histograms
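Fitting a predictive equation can be illustrated with ordinary least squares on a single variable. The slide's housing model uses several variables (multivariate regression); the sketch below is a simplified one-variable version in plain Python, with made-up data generated from price = 100*SqFt + 5000 so the fitted coefficients are easy to check. The function names are hypothetical, and housing_price simply evaluates the equation quoted on the slide.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5

def housing_price(sq_ft, distance_to_schools, neighborhood_average):
    """Evaluate the example model from the slide (units are not given there)."""
    return 100 * sq_ft - 6 * distance_to_schools + 0.1 * neighborhood_average

sq_ft = [1000, 1500, 2000, 2500]
prices = [100 * s + 5000 for s in sq_ft]  # synthetic, exactly linear data

slope, intercept = fit_line(sq_ft, prices)
r = correlation(sq_ft, prices)
print(slope, intercept, r)
```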
  • 19.  clustering  similar photos, documents, cases  discovery of “structure” in the data  example: accident database  some clusters might be identified with “accidents involving a tractor trailer” or “accidents at night”  top-down vs. bottom-up clustering methods  granularity: how many clusters?
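As one concrete illustration of clustering, the sketch below implements k-means in plain Python. k-means is a common method that the slide does not name (the slide mentions top-down and bottom-up approaches), so treat it as a substituted example; the points, the choice of k = 2, and the use of the first k points as starting centroids are all assumptions.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = list(points[:k])  # simple deterministic start
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            mean_point(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

The number of clusters k has to be supplied up front, which is the granularity question raised on the slide.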
  • 20.  Decision trees (classifiers)  what factors, decisions, or treatments led to different outcomes?  recursive partitioning algorithms  related methods  “discriminant” analysis  what factors lead to return of product?  extract “association rules”  boxer dogs tend to have congenital defects  covers 5% of patients with 80% confidence
    Veterinary database - dogs treated for disease
    breed | gender | age | drug | sibsp | outcome
    terrier | F | 10 | methotrexate | 4.0 | died
    spaniel | M | 5 | cytarabine | 2.3 | survived
    doberman | F | 7 | doxorubicin | 0.1 | died
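The two numbers attached to the association rule (how much of the data it covers, and how often it holds when it applies) can be computed directly. The sketch below uses a synthetic set of 100 records constructed so that the rule "boxer implies congenital defect" covers 5% of patients with 80% confidence; the records and the rule_stats helper are invented for illustration and are not the veterinary data from the slide.

```python
def rule_stats(records, antecedent, consequent):
    """Return (coverage, confidence) for the rule antecedent -> consequent.

    coverage   = fraction of all records where the antecedent holds
    confidence = fraction of those records where the consequent also holds
    """
    covered = [r for r in records if antecedent(r)]
    coverage = len(covered) / len(records)
    if not covered:
        return coverage, 0.0
    confidence = sum(1 for r in covered if consequent(r)) / len(covered)
    return coverage, confidence

# Synthetic records: 5 boxers (4 with a defect) among 100 dogs.
records = (
    [{"breed": "boxer", "defect": True}] * 4
    + [{"breed": "boxer", "defect": False}]
    + [{"breed": "terrier", "defect": False}] * 95
)

coverage, confidence = rule_stats(
    records,
    antecedent=lambda r: r["breed"] == "boxer",
    consequent=lambda r: r["defect"],
)
print(coverage, confidence)
```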
  • 21.  other types of data  time series and forecasting:  model the price of gas using autoregression  a function of recent prices, demand, geopolitics...  de-trend: factor out seasonal trends  GIS (geographic information systems)  longitude/latitude coordinates in the database  objects: city/state boundaries, river locations, roads  find regions in B/CS with an excess of coffee shops from: Basic Statistics for Business and Economics, Lind et al (2009), Ch 16. Toy Sales credit: Frank Curriero
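Autoregression models each value as a function of recent values. A minimal sketch, assuming the simplest case of a first-order model x[t] = a*x[t-1] + b fitted by least squares, is shown below. The series is synthetic (generated with a = 0.5, b = 2), not real gas prices, and the model ignores the other factors the slide lists such as demand and geopolitics.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = a * x[t-1] + b."""
    previous = series[:-1]
    current = series[1:]
    n = len(previous)
    mean_prev = sum(previous) / n
    mean_curr = sum(current) / n
    sxx = sum((x - mean_prev) ** 2 for x in previous)
    sxy = sum((x - mean_prev) * (y - mean_curr) for x, y in zip(previous, current))
    a = sxy / sxx
    b = mean_curr - a * mean_prev
    return a, b

# Synthetic series that follows x[t] = 0.5 * x[t-1] + 2 exactly.
prices = [10.0]
for _ in range(5):
    prices.append(0.5 * prices[-1] + 2.0)

a, b = fit_ar1(prices)
forecast = a * prices[-1] + b  # one-step-ahead prediction
print(a, b, forecast)
```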
  • 22. FOR MORE INFORMATION: VISIT US AT: www.kellytechno.com ADDRESS: Flat no : 212, 2nd floor, AnnapurnaBlock, Aditya Enclave, Ameerpet, Hyderabad-16.