NEW ANALYSIS SKILL SET REQUIRED
RELATED POPULAR OPEN SOURCED DISTRIBUTED DB TECHNOLOGIES
Technology   | Company        | Open Sourced On
Cassandra    | DataStax       | Apache Cassandra; used by Facebook, LinkedIn, Twitter
BigTable     | Google         | Google BigTable
Apache HBase | Apache         | HBase (used by many companies, most popular)
MongoDB      | MongoDB Inc.   | Apache (written in C++)
Couchbase    | Couchbase Inc. | Apache (written in C++, Erlang, C)
CLASSIFICATION OF NOSQL DATABASES
Category        | NoSQL databases
Column Oriented | Accumulo, Cassandra, HBase
Document        | Clusterpoint, CouchDB, Couchbase, MarkLogic, MongoDB
Key-Value       | Dynamo, FoundationDB, MemcacheDB, Redis, Riak, FairCom c-treeACE
Graph           | Allegro, Neo4j, OrientDB, Virtuoso, Stardog
- Column-oriented databases store values column by column, rather than row by row as in a typical RDBMS.
- This leads to better compression of data, and hence less space is required to store the database.
- Still higher compression can be achieved when probabilistic databases are used.
- Similarly, document-oriented databases store and arrange data in the form of documents.
- Key-value stores hold data as a collection of key-value pairs, supporting add, insert, and delete operations on the pairs.
- Graph databases: every element holds a direct pointer to its adjacent elements, hence no lookup is required.
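The compression point can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the sample rows, the function names to_columns and run_length_encode, and the choice of run-length encoding are all hypothetical, and real column stores use a range of encodings. Pivoting rows into columns puts similar values next to each other, so repeated values collapse into short (value, count) pairs.

```python
from itertools import groupby

# Hypothetical sample data, stored row by row as an RDBMS would.
rows = [
    {"id": 1, "country": "IE", "status": "active"},
    {"id": 2, "country": "IE", "status": "active"},
    {"id": 3, "country": "IE", "status": "inactive"},
    {"id": 4, "country": "US", "status": "inactive"},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)
encoded_country = run_length_encode(columns["country"])
encoded_status = run_length_encode(columns["status"])
```

Here the four country values shrink to two pairs, which is the kind of saving that is harder to get when each value sits between unrelated fields of the same row.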
RELATION OF CLOUD COMPUTING AND BI
Go through the link below:
http://sandyclassic.wordpress.com/2013/07/02/data-warehousing-business-intelligence-and-cloud-computing
BIGDATA – 5V
The term Bigdata is characterized by 5 Vs:
Volume: the large volume of data.
Velocity: the rate at which data arrives, for example the amount of data per second.
Variability: the level of unintentional modification affecting data quality throughout the lifecycle of the data.
Value: the value derived from the data.
Variety: the wide range of data received, such as video, audio, text, and images.
SOURCES OF BIGDATA: WHAT NOSQL SOLVES?
Example sources for the 5 Vs:
Volume: the large volume of video feeds received and maintained at video sites such as YouTube and Vimeo.
Variety: the large variety of data (text, audio, video, images) received at sites such as Facebook, Twitter, and other social media platforms.
Velocity: the speed at which data is received at sites such as Twitter and Facebook (1 billion people all feeding their data into one site).
TRENDS SHAPING INTEREST TOWARDS BIGDATA
Batch processing vs. real-time processing
Batch jobs run at a particular time of day, such as nightly or morning jobs, chosen for slack time when the server has less load.
But people now want to see status in real time, as in transportation, where they want to know when a bus is arriving at a particular stand. Or in retail, where real-time advertisements are required as soon as customers update their status. This is shaping the move towards Big data.
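The difference between the two styles can be sketched in Python. This is a minimal illustration, assuming a hypothetical list of bus delays in minutes; the names batch_average and RunningAverage are made up for the example. The batch function needs the whole data set before it can answer, while the running version has an up-to-date answer after every reading.

```python
def batch_average(readings):
    """Batch style: wait until all readings are collected, then compute once."""
    return sum(readings) / len(readings)


class RunningAverage:
    """Real-time style: update the answer as each reading arrives."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Incremental mean: shift the old mean towards the new value.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


delays = [4, 6, 5, 9]  # hypothetical bus delays in minutes

# Nightly job: one result, available only after the day is over.
nightly_result = batch_average(delays)

# Live feed: a fresh result after every single reading.
live = RunningAverage()
live_results = [live.update(d) for d in delays]
```

Both approaches agree once all readings are in; the real-time version simply does not make the user wait for the nightly run.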
PROBLEMS OF BIGDATA
Problems differentiated by the 5 Vs.
Velocity: with a large volume of data arriving, and a quick turnaround required to reflect the data fed into a site like Facebook, can it be managed by a regular DBMS?
A DBMS maintains ACID properties and has lots of constraints such as primary keys, foreign keys, and check constraints. When quick turnaround or short latency is required, these constraints add processing time and storage volume. So all of these sites have their own file-based, DBMS-like storage systems which do not have these constraints. All data is maintained in files, and the ids assigned to the files are indexed and regularly moved. (Publicly known examples include Cassandra, developed at Facebook and later open sourced, and BigTable by Google.)
Most of these databases are popularly categorized as NoSQL databases.
BIGDATA AND ANALYTICS
As we now know, Bigdata addresses the problems of the 5 Vs, such as the huge (V)olume of storage required for video sites like YouTube.
It is changing how we perceive, visualize, and analyze data: for example, HBase is used for data storage and Mahout is used to run analytics and find patterns.
These databases hold a variety of data requiring different kinds of processing that cannot be achieved by traditional RDBMS-based products.
Example link below:
http://sandyclassic.wordpress.com/2013/06/18/gini-coefficient-of-economics-and-roc-curve-machine-learning/
BIGDATA AND MAP-REDUCE
The Map-Reduce algorithm, created by Google researchers, was the starting point of all we see in BigData.
The Mapper divides work into multiple parallel tasks, then sorts and filters the records into queues, say one queue for each name.
The Reducer component aggregates or summarizes the data from multiple units.
MAP REDUCE EXAMPLE: WORD COUNT DATA FEED
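A word count over a data feed can be sketched in plain Python. This is only a single-machine illustration of the map, group, and reduce steps, not Hadoop code; the function names and the two-line sample feed are assumptions made for the example. On a real cluster the map calls would run in parallel on different nodes and the framework would do the grouping.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one line of the feed."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, one queue per word."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reducer: summarize all values collected for one word."""
    return word, sum(counts)


def word_count(feed):
    pairs = [pair for line in feed for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


# Hypothetical data feed: each string stands for one input split.
feed = ["big data is big", "data science uses big data"]
counts = word_count(feed)
```

For this feed, counts maps "big" and "data" to 3 each, and "is", "science", and "uses" to 1 each.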
DATA SCIENCE AND BIG DATA
Since the data is mostly unstructured, and the best way to analyze unstructured data is with analytics, here comes a new career called Data Scientist.
Skill set required for a Data Scientist: Mathematics (mostly statistics), Computer Science, and a domain such as Sociology (as in social media analysis).
HERE ANOTHER VIEW FROM WIKI SKILLS
REQUIRED FOR DATA SCIENTIST
SOCIAL MEDIA ANALYTICS
One application of Bigdata has been to gather feedback about products from social media.
Below is a sample project report on how, and with what tools, social media can be analyzed.
http://www.slideshare.net/SandeepSharma65/social-media-analysis-project
BIGDATA AND HADOOP
Hadoop allows load to be distributed among the many nodes of a cluster.
There can be database clusters, OS clusters, and application or web server level clustering, but here we are dealing with OS-like clustering in the form of a Distributed File System (DFS). The Hadoop DFS (HDFS) is a file system developed largely at Yahoo. It is the open-source counterpart to the Google File System that underlies BigTable, providing quick storage and retrieval of data in the form of files, and it is used by many social media platforms.
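How a distributed file system spreads one file over several machines can be sketched in Python. This is a rough illustration, not HDFS code: the function names and node names are hypothetical, and the round-robin placement is a simplification of the real placement policy. The 128 MB block size and replication factor of 3 match common HDFS defaults but are passed in here as plain parameters.

```python
def split_into_blocks(file_size, block_size):
    """Return the size of each block needed to hold file_size bytes."""
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])


def place_blocks(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified placement; assumes replication <= len(nodes).
    """
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


MB = 1024 * 1024
nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster

blocks = split_into_blocks(300 * MB, 128 * MB)
placement = place_blocks(len(blocks), nodes, replication=3)
```

A 300 MB file becomes three blocks (128 MB, 128 MB, 44 MB), and each block lives on three different nodes, so reads can be spread across machines and a single node failure does not lose data.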
BIGDATA OTHER TECHNOLOGY USAGE
R is an open-source statistical analysis language with built-in statistical constructs, used for analysis of data.
The Java data mining API, .NET data mining APIs, and Python libraries are used to mine and understand trends in data.
Pig is another Apache Hadoop based system, used to provide a high-level language for analyzing large data sets.
HERE ARE SOME LINKS
Data Science:
http://thedatascience.wordpress.com/
Big Data:
http://thebigdatatrends.wordpress.com
Data Science Blog 2:
http://thedatascientistview.blogspot.ie/
USE CASE: RETAIL
Retail generates a huge amount of data: products positioned on different shelves in the store, replenishment levels, reorder levels, merchandising, and assortment planning. Most of this data is usually structured, since lots of systems are automated, but there are also lots of forms, customer feedback, planning data, and analysis of mails and other chat platforms.
Large warehouses of retail stores need to plan the positioning of goods and containers in aisles.
Trends from social media can be analyzed to find customer preferences for products and offers.
For retail innovation, read:
http://sandyclassic.wordpress.com/2013/10/26/retail-sector-innovations/
USE CASE: RETAIL-2
Retail uses lots of sensors for tracking items within the warehouse and inside the store. The huge volume of real-time data (video, text, and other forms) generated every millisecond from sensors embedded across every store and warehouse cannot be analyzed by any other medium better than by a Hadoop or Bigdata based system.
USE CASE: FINANCE
Finance being a game of numbers, huge amounts of data from books of accounts, P&L statements, balance sheets, etc. accumulate for different businesses over a period of time. Most books are structured, and hence so is the data, but Hadoop offers huge scalable clusters to quickly analyze structured data as well.
Lots of social media data about interest in a share or any other instrument does get reflected in the numbers.
Spreadsheets are a popular medium of analysis, and they and other textual forms can be better analyzed if available over Hadoop-like clusters for a kind of semi-structured data analysis.

WSO2
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
informapgpstrackings
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
Software Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdfSoftware Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdf
MayankTawar1
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 

Recently uploaded (20)

Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
Visitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.appVisitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.app
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Software Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdfSoftware Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdf
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 

Data science big data and analytics

  • 1. NEW ANALYSIS SKILL SET REQUIRED
  • 2. RELATED POPULAR OPEN SOURCED DISTRIBUTED DB TECHNOLOGIES Technology / Company / Open sourced as: Cassandra / DataStax (commercial vendor; originally developed at Facebook) / Apache Cassandra, used by Facebook, LinkedIn, Twitter. BigTable / Google / Google BigTable itself is proprietary; Apache HBase is the open source implementation modeled on it and is used by many companies. MongoDB / MongoDB Inc. / released by the company itself, written mainly in C++. Couchbase / Couchbase Inc. / open sourced by the company, written partly in Erlang.
  • 3. CLASSIFICATION OF NOSQL DATABASES Category: NoSQL databases. Column oriented: Accumulo, Cassandra, HBase. Document: Clusterpoint, CouchDB, Couchbase, MarkLogic, MongoDB. Key-value: Dynamo, FoundationDB, MemcacheDB, Redis, Riak, FairCom c-treeACE. Graph: AllegroGraph, Neo4j, OrientDB, Virtuoso, Stardog. - Column oriented databases store values column by column, rather than row by row as an RDBMS typically does. - Values within one column tend to be similar, which leads to better compression of the data and hence less space required to store the database. - Still higher compression can be achieved with probabilistic databases. - Similarly, document oriented databases store and arrange data in the form of documents. - Key-value stores hold data as a collection of key-value pairs, allowing pairs to be added, updated, and deleted. - Graph databases: every element holds a direct pointer to its adjacent elements, hence no index lookup is required.
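The compression point for column oriented storage can be illustrated with a small sketch. It is not how Cassandra or HBase work internally; the sample table, the helper names `to_columns` and `run_length_encode`, and the choice of run-length encoding are all assumptions made for illustration. The idea is only that values stored column by column repeat next to each other and therefore collapse well.

```python
from itertools import groupby

# Hypothetical sample table: each tuple is one row (city, status).
rows = [
    ("Dublin", "active"),
    ("Dublin", "active"),
    ("Dublin", "inactive"),
    ("Cork", "inactive"),
    ("Cork", "inactive"),
]


def to_columns(rows):
    """Transpose row tuples into one list per column."""
    return [list(column) for column in zip(*rows)]


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)
encoded = [run_length_encode(column) for column in columns]

# Five values per column shrink to two (value, count) pairs each.
print(encoded)
```

In the row by row layout the same values are interleaved with values from other columns, so runs like these rarely appear, which is one reason a column layout tends to compress better.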
  • 4. RELATION OF CLOUD COMPUTING AND BI Go through the link below: http://sandyclassic.wordpress.com/2013/07/02/data-warehousing-business-intelligence-and-cloud-computing
  • 5. BIGDATA – 5V The term Big Data is characterized by the 5 Vs: Volume: large volume of data. Velocity: amount of data received per second. Variability: level of unintentional modification affecting data quality throughout the lifecycle of the data. Value: value derived from the data. Variety: wide range of data types received, such as video, audio, text, and images.
  • 6. SOURCES OF BIGDATA WHAT NOSQL SOLVES? Example sources by V. Volume: large volumes of video feeds are received and maintained at video sites like YouTube, Vimeo, etc. Variety: a large variety of data (text, audio, video, images) is received by sites like Facebook, Twitter, and other social media platforms. Velocity: the speed at which data is received by sites like Twitter and Facebook (1 billion people all feeding their data into one site).
  • 7. TRENDS SHAPING INTEREST TOWARDS BIGDATA Batch processing vs real time processing: batch jobs run at a particular time of day, like nightly or morning jobs, chosen according to the slack time when the server has less load. But people now want to see status in real time, as in transportation, where they want to know when a bus is arriving at a particular stand. Or in retail, where as soon as customers update their status they should receive real time advertisements. This is shaping the move towards Big Data.
  • 8. PROBLEMS OF BIGDATA Problems differentiated by the 5 Vs. Velocity: with a large volume of data received and a quick turnaround required to reflect the data fed into Facebook, can it be managed by a regular DBMS? A DBMS maintains ACID properties and has lots of constraints like primary keys, foreign keys, check constraints, etc. When quick turnaround or short latency is required, these constraints add processing time and storage volume. So all of these sites have their own file based, DBMS-like storage systems which do not have these constraints. All data is maintained in files, and the ids assigned to the files are indexed and regularly moved (these are publicly known databases like Cassandra, developed by Facebook and open sourced, BigTable by Google, etc.). Most of these databases are popularly categorized as NoSQL databases.
  • 9. BIGDATA AND ANALYTICS As we now know, Big Data is solving the problems of the 5 Vs, like the huge (V)olume of storage required for video sites like YouTube. It is changing how we perceive, visualize, and analyze data: HBase is used for data storage, and Mahout is used to run analytics and find patterns. These databases hold a variety of data requiring different kinds of processing that cannot be achieved by traditional RDBMS based products. Example link below: http://sandyclassic.wordpress.com/2013/06/18/gini-coefficient-of-economics-and-roc-curve-machine-learning/
  • 10. BIGDATA AND MAP-REDUCE The Map-Reduce algorithm, created by Google researchers, was the starting point of all that we see in Big Data. The Mapper divides the work into multiple parallel tasks, then sorts and filters the records into queues, say one queue for each name. The Reducer component aggregates or summarizes the data from those multiple units.
  • 11. MAP REDUCE EXAMPLE: WORD COUNT DATA FEED
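The word count example can be sketched in a few lines of plain Python. This is a single-process illustration of the map, group, and reduce steps described above, not Hadoop's API; the function names and the two-line sample feed are assumptions made for the sketch.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group the emitted values by key, one 'queue' per word."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(word, counts):
    """Reducer: summarize one queue into a single total."""
    return word, sum(counts)


def word_count(lines):
    # In a real cluster each line (or block of lines) would be mapped
    # on a different node; here the mappers simply run one after another.
    pairs = [pair for line in lines for pair in map_phase(line)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(word, counts) for word, counts in sorted(groups.items()))


# Hypothetical data feed of two lines.
feed = ["big data big clusters", "data in big files"]
print(word_count(feed))
```

Because each mapper only looks at its own line and each reducer only looks at its own word, both phases can be spread across many machines without coordination beyond the grouping step.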
  • 12. DATA SCIENCE AND BIG DATA Since the data is mostly unstructured, and the best way to analyze unstructured data is through analytics, a new career has emerged: the Data Scientist. Skill set required for a Data Scientist: mathematics (mostly statistics), computer science, and a domain such as sociology (for example, for social media analysis).
  • 13. HERE IS ANOTHER VIEW FROM WIKI: SKILLS REQUIRED FOR DATA SCIENTIST
  • 14. SOCIAL MEDIA ANALYTICS One application of Big Data has been to gather feedback about products from social media. The sample project report below shows how social media can be analyzed and which tools can be used. http://www.slideshare.net/SandeepSharma65/social-media-analysis-project
  • 15. BIGDATA AND HADOOP Hadoop allows load to be distributed across a cluster of many machines. There can be database clusters, OS clusters, and application or web server level clustering, but here we are dealing with the OS level, in the form of a Distributed File System (DFS). The Hadoop DFS (HDFS), developed largely at Yahoo as part of Apache Hadoop, is the open source counterpart of Google's distributed storage stack. It provides quick storage and retrieval of data in the form of files and is used by many social media platforms.
  • 16. BIGDATA OTHER TECHNOLOGY USAGE R is an open source statistical analysis language whose built-in statistical constructs are used for analysis of data. Java data mining APIs, .NET data mining APIs, and Python libraries are used to mine data and understand trends in it. Pig is another Apache Hadoop based system, which provides a high level language for analyzing large data sets.
  • 17. HERE ARE SOME LINKS Data Science: http://thedatascience.wordpress.com/ Big Data: http://thebigdatatrends.wordpress.com Data Science Blog 2: http://thedatascientistview.blogspot.ie/
  • 18. USE CASE: RETAIL Retail generates a huge amount of data: products positioned on different shelves in the store, replenishment levels, reorder levels, merchandising, and assortment planning. Most of this data is structured, since many of the systems are automated, but there are also lots of forms, customer feedback, planning data, and analysis of mails and other chat platforms. Large warehouses of retail stores need to plan the positioning of goods and containers in the aisles. Trends from social media can be analyzed to find customer preferences for products and offers. On retail innovation, read: http://sandyclassic.wordpress.com/2013/10/26/retail-sector-innovations/
  • 19. USE CASE: RETAIL-2 Retail uses lots of sensors for tracking items within the warehouse and inside the store. The huge volume of real time data (video, text, and other forms) generated every millisecond by sensors embedded across every store and warehouse is best analyzed in a Hadoop or other Big Data based system.
  • 20. USE CASE: FINANCE Finance being a game of numbers, huge amounts of data from books of accounts, P&L statements, balance sheets, etc. of different businesses accumulate over time. Most books are structured, and hence so is the data, but Hadoop offers hugely scalable clusters to quickly analyze structured data as well. A lot of social media data about interest in a share or any other instrument does get reflected in the numbers. Spreadsheets are a popular medium of analysis, and they and other textual forms can be better analyzed if available on Hadoop-like clusters, as a kind of semi-structured data analysis.