Social Media Analytics using Azure Technologies
Koray Kocabaş
#sqlsatistanbul
Sponsors
Media Sponsor
Main Sponsor
Swag Sponsor
What do we need?
Just a quick blog post, update on LinkedIn, or a tweet on Twitter is all we need.
Session Evaluations
Evaluate sessions and get a chance for the raffle:
http://spoke.at/sqlsat451
About Me...
Koray Kocabaş
Data Platform (SQL Server) MVP
Yemeksepeti Business Intelligence
Bahcesehir University Instructor
@koraykocabas
https://tr.linkedin.com/in/koraykocabas
Blog: http://www.misjournal.com
E-Mail: koraykocabas@outlook.com
The Data Deluge
What kinds of solutions use Big Data?
• Clickstream analysis to find buying patterns
• Sentiment analysis for text data
• Fraud detection; forensic analysis
• Machine learning
• Healthcare research
• Predictive Maintenance
Just dream it. Data is everywhere!
Twitter launched in 2006
Active users per month
~316 million (August)
~320 million (October)
80% of users are on mobile!
Tweets per second: 6,000
Tweets per day: ~500 million
Tweets per year: ~200 billion
Twitter generates a lot of data (12 TB per day)
90% of buyers trust peer recommendations
55% of Twitter users are female
The average Twitter user has 27 followers
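As a rough sanity check on the volumes above, the per-day and per-year tweet counts follow from the per-second rate. This is a minimal Python sketch that assumes a constant average of 6,000 tweets per second; the real rate fluctuates, so the slide's rounded figures are only expected to be in the same range.

```python
# Back-of-the-envelope check of the tweet volumes quoted on the slide.
# Assumption: a constant average rate of 6,000 tweets per second.
TWEETS_PER_SECOND = 6_000
SECONDS_PER_DAY = 24 * 60 * 60

tweets_per_day = TWEETS_PER_SECOND * SECONDS_PER_DAY
tweets_per_year = tweets_per_day * 365

print(f"per day:  ~{tweets_per_day / 1e6:.0f} million")
print(f"per year: ~{tweets_per_year / 1e9:.0f} billion")
```

The computed values (about 518 million per day and 189 billion per year) are consistent with the rounded ~500 million and ~200 billion figures.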
Why is it so popular?
Event based data
Unstructured data
Detail event information
Streaming
Who is the influencer?
TweetTracker
TweetArchivist
Radian6
Sysomos
Tweet Deck
Hootsuite
Twitter Problems
Dashboards for Tweets
PROBLEMS...
1. Collect Twitter Data & Get Simple Information
2. Data Enrichment
3. Store Semi-Structured Data
4. Analyze Semi-Structured Data
5. Visualize Meaningful Results
Collect Twitter Data & Get Simple Information
Real-Time Analytics
Ingest millions of events per second
Process data from connected devices/apps
Detect patterns and anomalies in streaming data
Transform, augment, correlate, temporal operations
No hardware (PaaS offering)
Up and running in a few clicks (and within minutes)
No performance tuning
Efficiently pay only for usage
Not paying for idle resources
Low startup costs
Scale from small to large when required
Only SQL queries needed (other solutions, such as Apache Storm, can take thousands of lines of code)
Stream Analytics Query Language Functions
DML Statements
• SELECT
• FROM
• WHERE
• GROUP BY
• HAVING
• CASE
• JOIN
• UNION
Windowing Extensions
• Tumbling Window
• Hopping Window
• Sliding Window
• Duration
Aggregate Functions
• SUM
• COUNT
• AVG
• MIN
• MAX
Scaling Functions
• WITH
• PARTITION BY
Date and Time Functions
• DATENAME
• DATEPART
• DAY
• MONTH
• YEAR
• DATETIMEFROMPARTS
• DATEDIFF
• DATEADD
String Functions
• LEN
• CONCAT
• CHARINDEX
• SUBSTRING
Statistical Functions
• VAR
• VARP
• STDEV
Tumbling Windows
The count of tweets every 10 seconds
[Figure: timeline from 0 to 30 seconds with the window counts 4, 4 and 5]
SELECT Topic, Count(*) AS Count
FROM sqlsaturdaystream TIMESTAMP BY CreatedAt
GROUP BY Topic, TumblingWindow(second,10)
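To make the tumbling-window behaviour concrete, here is a minimal Python sketch that imitates what the query above computes: each event lands in exactly one fixed 10-second bucket. It is plain Python, not Stream Analytics; the `tumbling_counts` helper and the sample tweets are hypothetical, and the `[start, start + size)` boundary convention is a simplification that may differ from how Azure Stream Analytics treats window edges.

```python
from collections import Counter


def tumbling_counts(events, size=10):
    """Count events per (window_start, topic) for fixed, non-overlapping windows.

    events: iterable of (timestamp_in_seconds, topic) pairs.
    Simplification: each window covers [start, start + size).
    """
    counts = Counter()
    for ts, topic in events:
        window_start = (ts // size) * size  # every event falls in exactly one window
        counts[(window_start, topic)] += 1
    return dict(counts)


# Hypothetical tweets: (seconds since start, topic)
sample = [(1, "azure"), (4, "azure"), (9, "sql"), (12, "azure"), (27, "sql")]

for (start, topic), n in sorted(tumbling_counts(sample).items()):
    print(f"[{start:>2}, {start + 10}) {topic}: {n}")
```

Because the windows never overlap, the counts across all windows add up to the total number of events.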
Hopping Windows
Every 5 seconds give me the count of tweets over 10 seconds by topic
[Figure: timeline from 0 to 30 seconds]
SELECT Topic, Count(*) AS Count
FROM sqlsaturdaystream TIMESTAMP BY CreatedAt
GROUP BY Topic, HoppingWindow(second,10,5)
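The difference from a tumbling window is that hopping windows overlap: with a 10-second size and a 5-second hop, most events are counted twice. The following Python sketch imitates that behaviour under stated assumptions: the `hopping_counts` helper and the sample data are hypothetical, windows cover `[start, start + size)`, and windows that would start before time 0 are ignored, so events in the first 5 seconds appear in only one window.

```python
from collections import Counter


def hopping_counts(events, size=10, hop=5):
    """Count events per (window_start, topic) for overlapping windows.

    A new window starts every `hop` seconds and covers `size` seconds.
    Simplification: windows cover [start, start + size) and never start
    before time 0.
    """
    counts = Counter()
    for ts, topic in events:
        start = (ts // hop) * hop  # latest window that already contains ts
        while start > ts - size and start >= 0:
            counts[(start, topic)] += 1
            start -= hop  # step back to the previous overlapping window
    return dict(counts)


# Hypothetical tweets: (seconds since start, topic)
sample = [(1, "azure"), (4, "azure"), (9, "sql"), (12, "azure"), (27, "sql")]

for (start, topic), n in sorted(hopping_counts(sample).items()):
    print(f"[{start:>2}, {start + 10}) {topic}: {n}")
```

For example, the tweet at second 12 is counted both in the window starting at 5 and in the window starting at 10.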
Sliding Windows
Give me the topics with more than 8 tweets within any 5-second window
[Figure: timeline from 0 to 30 seconds]
SELECT Topic, Count(*) AS Count
FROM sqlsaturdaystream TIMESTAMP BY CreatedAt
GROUP BY Topic, SlidingWindow(second,5)
HAVING Count(*)>8
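A sliding window has no fixed start times; it looks back a fixed duration from the current moment. The Python sketch below approximates the query above by checking, each time a tweet arrives, how many tweets on the same topic fall within the preceding 5 seconds. The `sliding_alerts` helper and the sample burst are hypothetical, and evaluating the window only on arrival is a simplification of how Azure Stream Analytics decides when to emit output.

```python
from collections import defaultdict, deque


def sliding_alerts(events, size=5, threshold=8):
    """Yield (timestamp, topic, count) whenever a topic has more than
    `threshold` events in the `size` seconds ending at an event's timestamp.

    events: (timestamp_in_seconds, topic) pairs sorted by timestamp.
    Simplification: the window is only evaluated when an event arrives.
    """
    recent = defaultdict(deque)  # topic -> timestamps still inside the window
    for ts, topic in events:
        window = recent[topic]
        window.append(ts)
        while window[0] <= ts - size:  # drop events older than `size` seconds
            window.popleft()
        if len(window) > threshold:
            yield ts, topic, len(window)


# Hypothetical data: a burst of 9 "azure" tweets within 4 seconds, plus two "sql" tweets.
burst = [(10 + 0.5 * i, "azure") for i in range(9)]
quiet = [(1, "sql"), (20, "sql")]
events = sorted(burst + quiet)

alerts = list(sliding_alerts(events))
print(alerts)
```

Only the ninth tweet of the burst pushes the "azure" count above the threshold, so a single alert is produced.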
Stream Analytics
Event Hub
Data Enrichment
Data Azure Machine Learning Consumers
Local storage
Upload data from PC…
Cloud storage
Azure Storage
Azure Table
Hive
etc.
Excel
Business Apps
Business problem, Modeling, Business value, Deployment
Azure Marketplace
(Applications store)
Azure ML Gallery
(community)
ML Web Services
(REST API Services)
ML Studio
(Web IDE)
Workspace:
Experiments
Datasets
Trained models
Notebooks
Access settings
Data Model API
Manage
API
https://sites.google.com/site/miningtwitter/questions/sentiment/sentiment
http://www.slideshare.net/ajayohri/twitter-analysis-by-kaify-rais
Sentiment140 (formerly known as "Twitter Sentiment")
allows you to discover the sentiment of a brand, product,
or topic on Twitter.
SQL Server 2016 CTP 3.1
Revolution R Open 3.2.2 for Revolution R Enterprise
Revolution R Enterprise 7.5.0
Revolution R Enterprise is able to deliver speeds 42 times faster than competing technology from SAS.
Microsoft announced on January 23, 2015 that they had reached an agreement to purchase Revolution Analytics for an as yet undisclosed amount.
The Klout Score is a number between 1 and 100 that represents your influence.
Klout collects and normalizes more than 12 billion signals a day
Hive data warehouse of more than 1 trillion rows
Klout was acquired by Lithium Technologies for $200 million
Store Semi-Structured Data
Analyze Semi-Structured Data
Developed by Facebook and later adopted by Apache as an open source project.
A data warehouse infrastructure built on top of Hadoop for providing data summarization, query and analysis
Integration between Hadoop and BI and visualization
Provides a SQL-like language called HiveQL to query data
Supports indexes and partitioning
Often said not to support UPDATE (this isn't correct)
Hive provides Users, Groups, Roles. But it’s not designed for high security.
Console (hive>), script, ODBC/JDBC, SQuirreL, HUE, Web Interface, etc.
Most popular Business Intelligence Tools support Hive
Data Types
Primitive Data Types: int, bigint, float, double, boolean, decimal, string, timestamp, date etc.
Complex Data Types: arrays, maps, structs
ARRAY<string>: workplace: istanbul, ankara
STRUCT<sex:string,age:int> : Female,25
MAP<string,int>: SOLR:92
Hive | RDBMS
SQL interface | SQL interface
Focus on analytics | May focus on online or analytics
No transactions | Transactions usually supported
Partition adds, no random inserts | Random insert and update supported
Distributed processing via map/reduce | Distributed processing varies by vendor (if available)
Scales to hundreds of nodes | Seldom scales beyond 20 nodes
Built for commodity hardware | Often built on proprietary hardware (especially when scaling out)
Low cost per petabyte | What's a petabyte? :) (note: Are you sure?)
http://hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf
Originally developed at Yahoo! (Huge contributions from Hortonworks, Twitter)
A platform for analyzing large data sets, built around a high-level language for expressing data analysis programs
Processing large semi-structured data sets using Hadoop Map Reduce
Write complex MapReduce jobs using a simple script language (Pig Latin)
Pig provides built-in aggregation functions (AVG, COUNT, SUM, MAX, MIN, etc.)
Developers can write their own UDFs (user-defined functions)
Console (grunt), script, java, HUE (Hadoop User Experience by Cloudera)
Easy to use and efficient
Data Types
Simple Data Types: int, float, double, chararray (UTF-8), bytearray
Complex Data Types: map (Key,Value), Tuple, Bag (list of tuples)
Commands
Loading: LOAD, STORE, DUMP
Filtering: FILTER, FOREACH, DISTINCT
Grouping: JOIN, GROUP, COGROUP, CROSS
Ordering: ORDER, LIMIT
Merging & Split: UNION, SPLIT
SQL script vs. Pig script
SQL: SELECT * FROM TABLE
Pig: A = LOAD 'DATA' USING PigStorage('\t') AS (col1:int, col2:int, col3:int);
SQL: SELECT col1+col2, col3 FROM TABLE
Pig: B = FOREACH A GENERATE col1+col2, col3;
SQL: SELECT col1+col2, col3 FROM TABLE WHERE col3>10
Pig: C = FILTER B BY col3>10;
SQL: SELECT col1, col2, sum(col3) FROM X GROUP BY col1, col2
Pig: D = GROUP A BY (col1, col2);
     E = FOREACH D GENERATE FLATTEN(group), SUM(A.col3);
SQL: ... HAVING sum(col3) > 5
Pig: F = FILTER E BY $2 > 5;
SQL: ... ORDER BY col1
Pig: G = ORDER F BY $0;
SQL: SELECT DISTINCT col1 FROM TABLE
Pig: I = FOREACH A GENERATE col1;
     J = DISTINCT I;
SQL: SELECT col1, COUNT(DISTINCT col2) FROM TABLE GROUP BY col1
Pig: K = GROUP A BY col1;
     L = FOREACH K {M = DISTINCT A.col2; GENERATE FLATTEN(group), COUNT(M);}
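The SQL-to-Pig comparison shows that a Pig script is a step-by-step dataflow in which each relation is derived from an earlier one. As an illustration only, the sketch below replays the GROUP, SUM, HAVING-style FILTER, ORDER and COUNT(DISTINCT) steps in plain Python on a handful of hypothetical rows. It is not Pig and does not run on Hadoop; the variable names simply mirror the relation names used in the comparison.

```python
from collections import defaultdict

# Hypothetical rows of (col1, col2, col3), standing in for relation A.
A = [(1, 1, 4), (1, 1, 3), (1, 2, 2), (2, 1, 9), (2, 1, 1), (2, 3, 5)]

# D = GROUP A BY (col1, col2): collect the rows of each group into a "bag".
D = defaultdict(list)
for col1, col2, col3 in A:
    D[(col1, col2)].append((col1, col2, col3))

# E = FOREACH D GENERATE FLATTEN(group), SUM(A.col3)
E = [(c1, c2, sum(row[2] for row in bag)) for (c1, c2), bag in D.items()]

# F = FILTER E BY $2 > 5  (the SQL HAVING clause)
F = [row for row in E if row[2] > 5]

# G = ORDER F BY $0
G = sorted(F, key=lambda row: row[0])

# K and L: COUNT(DISTINCT col2) per col1
K = defaultdict(set)
for col1, col2, _ in A:
    K[col1].add(col2)
L = sorted((col1, len(values)) for col1, values in K.items())

print(G)
print(L)
```

Each intermediate list corresponds to one named relation, which is why a single SQL statement with GROUP BY, HAVING and ORDER BY becomes several short Pig statements.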
Ohhh Finally Demo Time!
Visualize Meaningful Results
Big Data Analytics, Implementing Big Data Analysis, Big Data Analytics with HDInsight, Big Data and Business Analytics Immersion, Getting Started with Microsoft Azure Machine Learning
Real World Big Data in Azure, Big Data on Amazon Web Services, Reporting with MongoDB, Cloud Business Intelligence, HDInsight Deep Dive: Storm HBase and Hive, Data Science & Hadoop Workflows at Scale With Scalding, SQL on Hadoop - Analyzing Big Data with Hive
Introduction to Big Data Analytics, Machine Learning with Big Data, Big Data Analytics for Healthcare, Data Science at Scale, The Data Scientist's Toolbox, R Programming
Master Big Data and Hadoop Step by Step, Hadoop Essentials, Hadoop Starter Kit, Data Analytics using Hadoop eco system, Big Data: How Data Analytics Is Transforming the World, Applied Data Science with R, Hadoop Enterprise Integration
Data Science and Analytics in Context, Introduction to Big Data with Spark, Data Science and Machine Learning Essentials, Machine Learning for Data Science and Analytics, Statistical Thinking for Data Science and Analytics
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 

Social media analytics using Azure Technologies

  • 1. Social Media Analytics using Azure Technologies Koray Kocabaş
  • 3. #sqlsatistanbul What do we need? Just a quick blog post, an update on LinkedIn, or a tweet on Twitter is all we need.
  • 4. Session Evaluations: evaluate sessions and get a chance for the raffle: http://spoke.at/sqlsat451
  • 5. About Me... Koray Kocabaş; Data Platform (SQL Server) MVP; Yemeksepeti Business Intelligence; Bahcesehir University Instructor; @koraykocabas; https://tr.linkedin.com/in/koraykocabas; Blog: http://www.misjournal.com; E-Mail: koraykocabas@outlook.com
  • 7. What kinds of solutions use Big Data?
      • Clickstream analysis to find buying patterns
      • Sentiment analysis for text data
      • Fraud detection; forensic analysis
      • Machine learning
      • Healthcare research
      • Predictive maintenance
      Just dream it. Data is everywhere!
  • 8.
  • 9. Twitter launched in 2006.
      • Monthly active users: ~316 million (August), ~320 million (October)
      • 80% of users are on mobile
      • Tweets per second: 6,000
      • Tweets per day: ~500 million
      • Tweets per year: ~200 billion
      • Twitter generates a lot of data (12 TB per day)
      • 90% of buyers trust peer recommendations
      • 55% of Twitter users are female
      • The average Twitter user has 27 followers
  • 10. Why is it so popular?
  • 11.
  • 12.
  • 13. Twitter Problems: event-based data; unstructured data; detailed event information; streaming; who is the influencer. Dashboards for Tweets: TweetTracker, TweetArchivist, Radian6, Sysomos, TweetDeck, Hootsuite.
  • 15. Problems:
      1. Collect Twitter Data & Get Simple Information
      2. Data Enrichment
      3. Store Semi-Structured Data
      4. Analyze Semi-Structured Data
      5. Visualize Meaningful Results
  • 17. Collect Twitter Data & Get Simple Information
  • 19. Real-Time Analytics
      • Intake millions of events per second
      • Process data from connected devices/apps
      • Detect patterns and anomalies in streaming data
      • Transform, augment, correlate, temporal operations
      • No hardware (PaaS offering)
      • Up and running in a few clicks (and within minutes)
      • No performance tuning
      • Efficient: pay only for usage
      • No paying for idle resources
      • Low startup costs
      • Scale from small to large when required
      • Only SQL queries needed (versus thousands of lines of code in other solutions, such as Apache Storm)
  • 20. Stream Analytics Query Language
      • DML Statements: SELECT, FROM, WHERE, GROUP BY, HAVING, CASE, JOIN, UNION
      • Windowing Extensions: Tumbling Window, Hopping Window, Sliding Window, Duration
      • Aggregate Functions: SUM, COUNT, AVG, MIN, MAX
      • Scaling Functions: WITH, PARTITION BY
      • Date and Time Functions: DATENAME, DATEPART, DAY, MONTH, YEAR, DATETIMEFROMPARTS, DATEDIFF, DATEADD
      • String Functions: LEN, CONCAT, CHARINDEX, SUBSTRING
      • Statistical Functions: VAR, VARP, STDEV
  • 21. [Figure: timeline with axis from 0 to 30]
  • 22. Tumbling Windows: the count of tweets every 10 seconds. [Figure: timeline with axis from 0 to 30; counts 4, 4, 5]
      SELECT Topic, COUNT(*) AS Count
      FROM sqlsaturdaystream TIMESTAMP BY CreatedAt
      GROUP BY Topic, TumblingWindow(second, 10)
  • 23. Hopping Windows: every 5 seconds, report the count of tweets over the last 10 seconds, by topic. [Figure: timeline with axis from 0 to 30]
      SELECT Topic, COUNT(*) AS Count
      FROM sqlsaturdaystream TIMESTAMP BY CreatedAt
      GROUP BY Topic, HoppingWindow(second, 10, 5)
  • 24. Sliding Windows: report a topic when its tweet count is above a threshold of 8 within a span of 5 seconds. [Figure: timeline with axis from 0 to 30]
      SELECT Topic, COUNT(*) AS Count
      FROM sqlsaturdaystream TIMESTAMP BY CreatedAt
      GROUP BY Topic, SlidingWindow(second, 5)
      HAVING COUNT(*) > 8
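      The three window types above can be illustrated offline with a small Python sketch. This is not how Stream Analytics is implemented; it is a simplified model under stated assumptions: events are hypothetical (timestamp in seconds, topic) pairs, windows are treated as half-open intervals [start, start + size), windows starting before time 0 are ignored, and the sliding check is evaluated only when an event arrives. The function names are made up for this illustration, and the demo uses a threshold of 1 instead of 8 so the tiny sample produces output.

```python
from collections import Counter

def tumbling_counts(events, size):
    """Count events per topic in fixed, non-overlapping windows of `size` seconds."""
    counts = Counter()
    for ts, topic in events:
        window_start = (ts // size) * size
        counts[(window_start, topic)] += 1
    return dict(counts)

def hopping_counts(events, size, hop):
    """Count events per topic in `size`-second windows that start every `hop` seconds."""
    counts = Counter()
    for ts, topic in events:
        start = (ts // hop) * hop            # latest window start at or before ts
        while start > ts - size and start >= 0:
            counts[(start, topic)] += 1      # the event belongs to every overlapping window
            start -= hop
    return dict(counts)

def sliding_alerts(events, size, threshold):
    """At each event, count same-topic events in the last `size` seconds; keep counts above threshold."""
    alerts = []
    for ts, topic in events:
        n = sum(1 for t, tp in events if tp == topic and ts - size < t <= ts)
        if n > threshold:
            alerts.append((ts, topic, n))
    return alerts

# Hypothetical sample stream: (seconds since start, topic)
events = [(1, "sql"), (4, "sql"), (7, "azure"), (12, "sql"), (13, "sql"), (27, "azure")]

tumbling = tumbling_counts(events, 10)     # like TumblingWindow(second, 10)
hopping = hopping_counts(events, 10, 5)    # like HoppingWindow(second, 10, 5)
alerts = sliding_alerts(events, 5, 1)      # like SlidingWindow(second, 5) with HAVING COUNT(*) > 1

print(tumbling)
print(hopping)
print(alerts)
```

      In the tumbling case each event lands in exactly one window; in the hopping case (size 10, hop 5) each event is counted in up to two overlapping windows, which is why the same tweets appear under more than one window start.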
  • 27. Azure Machine Learning
      • Data: local storage (upload data from PC…); cloud storage (Azure Storage, Azure Table, Hive, etc.)
      • Consumers: Excel, business apps
      • Business problem, modeling, business value, deployment
      • Azure Marketplace (applications store)
      • Azure ML Gallery (community)
      • ML Web Services (REST API services)
      • ML Studio (web IDE)
      • Workspace: experiments, datasets, trained models, notebooks, access settings
      • Data, Model API, Manage API
  • 30. SQL Server 2016 CTP 3.1; Revolution R Open 3.2.2 for Revolution R Enterprise; Revolution R Enterprise 7.5.0. Revolution R Enterprise is able to deliver speeds 42 times faster than competing technology from SAS. Microsoft announced on January 23, 2015 that it had reached an agreement to purchase Revolution Analytics for an as yet undisclosed amount.
  • 31. The Klout Score is a number between 1 and 100 that represents your influence. Klout collects and normalizes more than 12 billion signals a day and runs a Hive data warehouse of more than 1 trillion rows. Klout was acquired by Lithium Technologies for $200 million.
  • 32. Store Semi-Structured Data; Analyze Semi-Structured Data
  • 34. Hive
      • Developed by Facebook; later adopted by Apache as an open source project
      • A data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis
      • Integration between Hadoop and BI and visualization tools
      • Provides an SQL-like language called HiveQL to query data
      • Supports CREATE INDEX and partitioning
      • "UPDATE is not supported" (this isn't correct)
      • Hive provides users, groups, and roles, but it is not designed for high security
      • Interfaces: console (hive>), script, ODBC/JDBC, SQuirreL, HUE, web interface, etc.
      • Most popular Business Intelligence tools support Hive
  • 35. Hive Data Types
      • Primitive data types: int, bigint, float, double, boolean, decimal, string, timestamp, date, etc.
      • Complex data types: arrays, maps, structs
          ARRAY<string>: workplace: istanbul, ankara
          STRUCT<sex:string,age:int>: Female,25
          MAP<string,int>: SOLR:92
      Hive vs. RDBMS
      • Hive: SQL interface. RDBMS: SQL interface.
      • Hive: focus on analytics. RDBMS: may focus on online or analytics.
      • Hive: no transactions. RDBMS: transactions usually supported.
      • Hive: partition adds, no random inserts. RDBMS: random insert and update supported.
      • Hive: distributed processing via map/reduce. RDBMS: distributed processing varies by vendor (if available).
      • Hive: scales to hundreds of nodes. RDBMS: seldom scales beyond 20 nodes.
      • Hive: built for commodity hardware. RDBMS: often built on proprietary hardware (especially when scaling out).
      • Hive: low cost per petabyte. RDBMS: What's a petabyte? :) (note: Are you sure?)
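      As a rough analogy for the three complex data types listed above, the sketch below models one row in Python using the sample values from the slide. The column names person and skills are hypothetical (only workplace appears on the slide), and Python containers are only an approximation of Hive's types. The comments show the corresponding HiveQL access syntax.

```python
from typing import NamedTuple

class Person(NamedTuple):
    """Stands in for STRUCT<sex:string, age:int>."""
    sex: str
    age: int

# One row with three complex-typed columns (column names partly hypothetical)
row = {
    "workplace": ["istanbul", "ankara"],      # ARRAY<string>
    "person": Person(sex="Female", age=25),   # STRUCT<sex:string, age:int>
    "skills": {"SOLR": 92},                   # MAP<string,int>
}

first_workplace = row["workplace"][0]   # HiveQL: workplace[0]
age = row["person"].age                 # HiveQL: person.age
solr_score = row["skills"]["SOLR"]      # HiveQL: skills['SOLR']

print(first_workplace, age, solr_score)
```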
  • 38. Pig
      • Originally developed at Yahoo! (huge contributions from Hortonworks and Twitter)
      • A platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs
      • Processes large semi-structured data sets using Hadoop MapReduce
      • Write complex MapReduce jobs using a simple script language (Pig Latin)
      • Pig provides a set of aggregation functions (AVG, COUNT, SUM, MAX, MIN, etc.)
      • Developers can write UDFs
      • Interfaces: console (grunt), script, Java, HUE (Hadoop User Experience by Cloudera)
      • Easy to use and efficient
  • 39. Pig Data Types
      • Simple data types: int, float, double, chararray (UTF-8), bytearray
      • Complex data types: map (key, value), tuple, bag (list of tuples)
      Commands
      • Loading: LOAD, STORE, DUMP
      • Filtering: FILTER, FOREACH, DISTINCT
      • Grouping: JOIN, GROUP, COGROUP, CROSS
      • Ordering: ORDER, LIMIT
      • Merging & split: UNION, SPLIT
      SQL script vs. Pig script
      • SQL: SELECT * FROM TABLE
        Pig: A = LOAD 'DATA' USING PigStorage('\t') AS (col1:int, col2:int, col3:int);
      • SQL: SELECT col1+col2, col3 FROM TABLE
        Pig: B = FOREACH A GENERATE col1+col2, col3;
      • SQL: SELECT col1+col2, col3 FROM TABLE WHERE col3>10
        Pig: C = FILTER B BY col3>10;
      • SQL: SELECT col1, col2, sum(col3) FROM X GROUP BY col1, col2
        Pig: D = GROUP A BY (col1,col2); E = FOREACH D GENERATE FLATTEN(group), SUM(A.col3);
      • SQL: ... HAVING sum(col3) > 5
        Pig: F = FILTER E BY $2>5;
      • SQL: ... ORDER BY col1
        Pig: G = ORDER F BY $0;
      • SQL: SELECT DISTINCT col1 FROM TABLE
        Pig: I = FOREACH A GENERATE col1; J = DISTINCT I;
      • SQL: SELECT col1, COUNT(DISTINCT col2) FROM TABLE GROUP BY col1
        Pig: K = GROUP A BY col1; L = FOREACH K {M = DISTINCT A.col2; GENERATE FLATTEN(group), COUNT(M);}
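      To make the SQL-to-Pig mapping above easier to follow, here is a minimal Python sketch that mirrors the relations A through J step by step on an in-memory list. The sample rows are hypothetical stand-ins for the 'DATA' file, and plain Python lists are only an analogy: Pig would compile these steps into MapReduce jobs over distributed data.

```python
from collections import defaultdict

# A = LOAD 'DATA' ... AS (col1:int, col2:int, col3:int);  (hypothetical sample rows)
A = [(1, 2, 4), (1, 2, 7), (3, 4, 20), (3, 4, 1), (5, 6, 2)]

# B = FOREACH A GENERATE col1+col2, col3;
B = [(c1 + c2, c3) for c1, c2, c3 in A]

# C = FILTER B BY col3 > 10;
C = [row for row in B if row[1] > 10]

# D = GROUP A BY (col1, col2);
# E = FOREACH D GENERATE FLATTEN(group), SUM(A.col3);
sums = defaultdict(int)
for c1, c2, c3 in A:
    sums[(c1, c2)] += c3
E = [(c1, c2, total) for (c1, c2), total in sums.items()]

# F = FILTER E BY $2 > 5;   ($2 is the third field, the sum)
F = [row for row in E if row[2] > 5]

# G = ORDER F BY $0;        ($0 is the first field, col1)
G = sorted(F, key=lambda row: row[0])

# I = FOREACH A GENERATE col1; J = DISTINCT I;
J = sorted({c1 for c1, _, _ in A})

print(C)
print(G)
print(J)
```

      Each Pig statement names an intermediate relation, so the script reads as a data flow, whereas the SQL version expresses the same result as one nested declarative query.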
  • 43. Courses
      • Big Data Analytics, Implementing Big Data Analysis, Big Data Analytics with HDInsight, Big Data and Business Analytics Immersion, Getting Started with Microsoft Azure Machine Learning
      • Real World Big Data in Azure, Big Data on Amazon Web Services, Reporting with MongoDB, Cloud Business Intelligence, HDInsight Deep Dive: Storm HBase and Hive, Data Science & Hadoop Workflows at Scale With Scalding, SQL on Hadoop - Analyzing Big Data with Hive
      • Introduction to Big Data Analytics, Machine Learning with Big Data, Big Data Analytics for Healthcare, Data Science at Scale, The Data Scientist's Toolbox, R Programming
      • Master Big Data and Hadoop Step by Step, Hadoop Essentials, Hadoop Starter Kit, Data Analytics using Hadoop eco system, Big Data: How Data Analytics Is Transforming the World, Applied Data Science with R, Hadoop Enterprise Integration
      • Data Science and Analytics in Context, Introduction to Big Data with Spark, Data Science and Machine Learning Essentials, Machine Learning for Data Science and Analytics, Statistical Thinking for Data Science and Analytics