November 6-9, Seattle, WA
Big Data, Bigger Brains
How Klout Changed the Landscape of Social Media with Hadoop and BI
Denny Lee, Microsoft
Dave Mariani, Klout
Discover and be recognized for how you influence the world
Klout’s Big Data makes all this possible
15 Social Networks Processed Every Day
769 Terabytes of Data Storage
200,000 Indexed Users Added Every Day
140,000,000 Users Indexed Every Day
12,000,000,000 Social Signals Processed Every Day
30,000,000,000 API Calls Delivered Every Month
1,080,000,000,000 Rows of Data In Data Warehouse
Klout Data Architecture
The best tool for the job
What is Business Intelligence?
• Data Warehousing, OLAP, Dashboards, Reporting
• Ability to slice and dice data in an ad-hoc manner
• Getting the right data to the right people at the right time, i.e., now
Why Hadoop + BI?
Requirement | Hadoop & Hive | BI Query Engines
Capture & store all data | Yes | No
Support queries against detail data | Yes | No
Support interactive queries & applications | No | Yes
Support BI & visualization tools | No | Yes
An Example: Klout Event Tracker
1. Perform A/B testing of user flows
2. Optimize registration funnels
3. Monitor consumer engagement & retention (daily and monthly active users, DAUs & MAUs)
4. Flexibly track and report on user-generated events
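Goal 3 rests on two simple distinct-user counts. As a minimal sketch, assuming events have already been reduced to hypothetical (user_id, date) pairs and that MAU means a trailing 30-day window (the slides do not define either), DAU, MAU and their ratio can be computed like this:

```python
from datetime import date, timedelta
from typing import List, Tuple

# Hypothetical input shape: one (user_id, activity_date) pair per tracked event.
Events = List[Tuple[int, date]]


def daily_active_users(events: Events, day: date) -> int:
    """Distinct users with at least one event on `day`."""
    return len({uid for uid, d in events if d == day})


def monthly_active_users(events: Events, day: date, window_days: int = 30) -> int:
    """Distinct users active in the trailing `window_days` days ending on `day`.

    The 30-day trailing window is an assumption, not taken from the slides.
    """
    start = day - timedelta(days=window_days - 1)
    return len({uid for uid, d in events if start <= d <= day})


def stickiness(events: Events, day: date) -> float:
    """DAU/MAU ratio: the share of the month's users who were active on `day`."""
    mau = monthly_active_users(events, day)
    return daily_active_users(events, day) / mau if mau else 0.0


# Made-up sample data for illustration only.
sample = [
    (1, date(2012, 5, 30)),
    (2, date(2012, 5, 30)),
    (2, date(2012, 5, 20)),
    (3, date(2012, 5, 10)),
    (4, date(2012, 4, 1)),  # falls outside the 30-day window
]
print(daily_active_users(sample, date(2012, 5, 30)))    # 2
print(monthly_active_users(sample, date(2012, 5, 30)))  # 3
```

In the architecture described here, these counts would come from the warehouse or cube rather than an in-memory list; the sketch only pins down the definitions.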
A Flexible, Hierarchical Schema
• Project: collection of events
• Event: captured user action
• Property type: attribute key
• Property value: attribute value
Examples
• Projects: HomePage, Actions, Mobile iOS
• Event: +K (add a topic)
• Property types and values: Source = Google Search, Gender = Male, Location = SF
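The project > event > property hierarchy can be sketched as nested records. The class names, the `track` helper and the event name "add_topic" (borrowed from the tracked JSON later in this deck) are illustrative assumptions, not Klout's code. The point is that a new property type needs no schema change:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    """A captured user action; `properties` maps property types (keys) to values."""
    name: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Project:
    """A collection of events."""
    name: str
    events: List[Event] = field(default_factory=list)

    def track(self, name: str, **properties: str) -> Event:
        """Record an event; any property type is accepted without a schema change."""
        event = Event(name, dict(properties))
        self.events.append(event)
        return event

    def property_types(self) -> List[str]:
        """Every attribute key seen across this project's events, sorted."""
        return sorted({key for e in self.events for key in e.properties})


plus_k = Project("plusK")
plus_k.track("add_topic", Source="Google Search", Gender="Male", Location="SF")
plus_k.track("add_topic", Location="SF")  # a sparse event is still valid
print(plus_k.property_types())  # ['Gender', 'Location', 'Source']
```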
Event Tracker Architecture
Instrument → Collect → Persist → Query → Report
• Instrument: Tracker API (Scala, Node.js)
• Collect: log processing (Flume)
• Persist: warehouse
• Query: cube (Analysis Services)
• Report: Klout UI (Scala, AJAX UX)
SELECT { [Measures].[Counter], [Measures].[PreviousPeriodCounter] } ON COLUMNS,
NON EMPTY CROSSJOIN(
    EXISTS([Date].[Date].[Date].ALLMEMBERS,
           [Date].[Date].&[2012-05-19T00:00:00]:[Date].[Date].&[2012-06-02T00:00:00]),
    [Events].[Event].[Event].ALLMEMBERS
) DIMENSION PROPERTIES MEMBER_CAPTION ON ROWS
FROM [ProductInsight]
WHERE ({ [Projects].[Project].[plusK] })
event_log
  tstamp        string
  project       string
  event         string
  session_id    bigint
  ks_uid        bigint
  ip            string
  json_keys     array<string>
  json_values   array<string>
  json_text     string
  dt            string
  hr            string
{
  "project": "plusK",
  "event": "spend",
  "session_id": "0",
  "ip": "50.68.47.158",
  "kloutId": "123456",
  "cookie_id": "123456",
  "ref": "http://klout.com/",
  "type": "add_topic",
  "time": "1338366015"
}
This event is saved in HDFS at:
/logs/events_tracking/2012-05-30/0100
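A minimal sketch of how such an event could map onto the event_log columns and its HDFS partition directory. It assumes `time` is Unix epoch seconds and that the dt/hr partitions use Pacific Daylight Time (UTC-7), which reproduces the path above for this event; the function names and the kloutId-to-ks_uid mapping are hypothetical:

```python
import json
from datetime import datetime, timedelta, timezone

# Assumption: partitions use PDT (UTC-7). A fixed offset ignores DST changes;
# a production pipeline would use a proper time zone database.
PARTITION_TZ = timezone(timedelta(hours=-7))
BASE_DIR = "/logs/events_tracking"


def partition_of(epoch_seconds: str) -> tuple:
    """Return the (dt, hr) partition values for an epoch-seconds timestamp."""
    ts = datetime.fromtimestamp(int(epoch_seconds), tz=PARTITION_TZ)
    return ts.strftime("%Y-%m-%d"), ts.strftime("%H00")


def hdfs_path(event: dict) -> str:
    """Directory the event would land in, e.g. /logs/events_tracking/<dt>/<hr>."""
    dt, hr = partition_of(event["time"])
    return f"{BASE_DIR}/{dt}/{hr}"


def to_event_log_row(event: dict) -> dict:
    """Map a tracked JSON event onto the event_log columns (hypothetical mapping)."""
    dt, hr = partition_of(event["time"])
    keys = list(event)  # every property is also kept generically as parallel arrays
    return {
        "tstamp": event["time"],
        "project": event["project"],
        "event": event["event"],
        "session_id": int(event["session_id"]),
        "ks_uid": int(event["kloutId"]),
        "ip": event["ip"],
        "json_keys": keys,
        "json_values": [str(event[k]) for k in keys],
        "json_text": json.dumps(event),
        "dt": dt,
        "hr": hr,
    }


event = {
    "project": "plusK", "event": "spend", "session_id": "0",
    "ip": "50.68.47.158", "kloutId": "123456", "cookie_id": "123456",
    "ref": "http://klout.com/", "type": "add_topic", "time": "1338366015",
}
print(hdfs_path(event))  # /logs/events_tracking/2012-05-30/0100
```

Keeping the fixed columns alongside json_keys/json_values/json_text is what lets new properties flow through without altering the table.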
insights3:9003/track/{"project":"plusK","event":"spend","ks_uid":123456,"type":"add_topic"}
Limitations of Direct Connectivity
Lack of OLEDB for ODBC
Pass-through queries to linked servers
Linked Server Connection
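EXEC master.dbo.sp_addlinkedserver
    @server = N'HiveDW', @srvproduct = N'HIVE',
    @provider = N'MSDASQL', @datasrc = N'Hive DW',
    @provstr = N'Provider=MSDASQL.1;Persist Security Info=True;User ID=SSAS;Password=P@assw0rd;';
GO
-- CREATE VIEW must be the first statement in its batch.
-- The pass-through HiveQL omits the trailing semicolon, which the Hive ODBC driver may reject.
CREATE VIEW vw_tbl_HiveSample AS
SELECT * FROM OpenQuery(HiveDW, 'SELECT * FROM HiveSampleTable');
GO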
OpenQuery / Linked Server Thoughts
• On paper, this probably shouldn't work
• Yet it works, and it has been running in production for three months now
• It's more stable than the original MySQL connection
• From the cube's perspective, the data source view (DSV) "looks like" it is connecting to SQL Server
Demo: Hadoop to BI Workflow
• Custom BI application
• Commercial BI tool
• Command line (HiveQL)
• HiveQL in Excel
Hadoop & BI Together: Query Cube Using a Custom App
A Peek into Product Insights > Projects: Mobile iOS
Hadoop & BI Together: Query Cube Using a Viz App
Hadoop & BI Together: Query Hive Using the CLI
HiveQL Example
SELECT
get_json_object(json_text,'$.sid') as sid,
get_json_object(json_text,'$.kloutId') as kloutId,
get_json_object(json_text,'$.v') as version,
get_json_object(json_text,'$.status') as status,
event
FROM bi.event_log
WHERE project='mobile-ios'
AND dt='20121027'
AND event in ('api_error', 'api_timeout')
ORDER BY sid;
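To make explicit what this query computes, here is a simplified Python stand-in for Hive's `get_json_object` applied to a few made-up event_log rows. It is a sketch that handles only top-level `$.key` paths (the real function supports richer JSONPath), and the sample rows and values are hypothetical:

```python
import json
from typing import Optional


def get_json_object(json_text: Optional[str], path: str) -> Optional[str]:
    """Simplified stand-in for Hive's get_json_object.

    Handles only top-level '$.key' paths; returns None (Hive NULL) for
    missing keys, JSON nulls, or unparseable input.
    """
    if not path.startswith("$."):
        raise ValueError("this sketch only supports '$.key' paths")
    try:
        obj = json.loads(json_text)
    except (TypeError, ValueError):
        return None
    if not isinstance(obj, dict):
        return None
    value = obj.get(path[2:])
    if value is None:
        return None
    # Hive returns strings; non-string values come back as JSON text.
    return value if isinstance(value, str) else json.dumps(value)


# Made-up event_log rows for illustration only.
rows = [
    {"project": "mobile-ios", "dt": "20121027", "event": "api_timeout",
     "json_text": '{"sid": "s2", "kloutId": "42", "v": "1.1", "status": 504}'},
    {"project": "mobile-ios", "dt": "20121027", "event": "api_error",
     "json_text": '{"sid": "s1", "kloutId": "42", "v": "1.1", "status": 500}'},
    {"project": "mobile-ios", "dt": "20121027", "event": "login",
     "json_text": '{"sid": "s3"}'},
]

# Same projection, filter and ORDER BY sid as the HiveQL above.
result = sorted(
    (
        (get_json_object(r["json_text"], "$.sid"),
         get_json_object(r["json_text"], "$.kloutId"),
         get_json_object(r["json_text"], "$.v"),
         get_json_object(r["json_text"], "$.status"),
         r["event"])
        for r in rows
        if r["project"] == "mobile-ios" and r["dt"] == "20121027"
        and r["event"] in ("api_error", "api_timeout")
    ),
    key=lambda t: t[0],
)
print(result)
```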
Hadoop & BI Together: Query Hive Using Excel
Hadoop & BI Together: Digging Deeper on Influence
My influencers
Who the heck is Cali Lewis?
So let’s dig in!
select * from openquery(hiveprodbi, 'select * from
bi_maxwell.actor_action where network_abbr = ''gp'' and
ks_uid = 1711 and actor_ks_uid = 477358')
It returns this data (one row, shown as column: value):
ks_uid: 1711
service_uid: 103493459351957813291
service_id: 13
tstamp: 1345073319000
message_id: NULL
action: GOOGLE_PLUS_PLUSONES
actor_service_uid: 100525279016049609152
tstamp_type: ORIGINAL_CONTENT_CREATION
original_message_id: z12djbti1oelu1eff22lztphnqapsd25t04
original_tstamp: 1345073319000
registered_flag: 1
actor_ks_uid: 477358
Digging in further…
• Took the original_message_id field and looked it up with Google+'s API
• Go to: https://developers.google.com/+/api/latest/activities/get
• Enter the ID 'z12djbti1oelu1eff22lztphnqapsd25t04' into the 'activityId' field of the form
• It returns this URL in the payload: https://plus.google.com/+CaliLewis/posts/JxAKBWXy3zZ
Ahh… Google+
Best Practices and Lessons Learned
• Avoid using a traditional database management
system for staging purposes
• Use the SQL Server OpenQuery interface for
heterogeneous joins
• Leverage Hive user-defined functions (UDFs) to
transform complex data types, such as JSON,
into rows and columns that Transact-SQL can
understand
• Manage large dimensions by using Hive views
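As an illustration of turning the event_log's parallel `json_keys`/`json_values` arrays into rows and columns that a relational consumer can use, here is a plain-Python sketch of the transformation's output. In Hive this would be done with a UDF or a `LATERAL VIEW`; the function names below are hypothetical and are not the UDFs Klout used:

```python
from typing import Dict, Iterable, List, Optional, Tuple


def explode_properties(rows: Iterable[Dict]) -> List[Tuple[int, str, str]]:
    """Turn each row's parallel json_keys/json_values arrays into (ks_uid, key, value) rows."""
    out = []
    for row in rows:
        keys, values = row["json_keys"], row["json_values"]
        if len(keys) != len(values):
            raise ValueError("json_keys and json_values must be the same length")
        out.extend((row["ks_uid"], k, v) for k, v in zip(keys, values))
    return out


def pivot(exploded: Iterable[Tuple[int, str, str]],
          wanted: List[str]) -> Dict[int, Dict[str, Optional[str]]]:
    """One column per wanted key, one record per ks_uid; missing keys stay None.

    Later rows for the same user overwrite earlier values in this sketch.
    """
    table: Dict[int, Dict[str, Optional[str]]] = {}
    for uid, key, value in exploded:
        if key in wanted:
            table.setdefault(uid, dict.fromkeys(wanted))[key] = value
    return table


rows = [{"ks_uid": 123456,
         "json_keys": ["type", "ref"],
         "json_values": ["add_topic", "http://klout.com/"]}]
print(pivot(explode_properties(rows), ["type", "source"]))
```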
Best Practices and Lessons Learned
• Make sure Hive UDFs are permanent and visible to the ODBC provider. Options:
  • Add them to the .hiverc file (Hive CLI only)
  • Convert the UDF to a built-in function and recompile the Hive code
  • Update the Hive function registry to add the UDFs to the built-in function list:
    • modify the FunctionRegistry class to register the UDFs (defined in an hql file)
    • add all the dependent JARs to the hive.aux.jars.path property
Best Practices and Lessons Learned
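• Pad zero-length string data: in HiveQL, replace empty or NULL strings with a placeholder ('empty'), then turn the placeholder back into NULL in Transact-SQL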
SELECT
    State =
        CASE
            WHEN state = 'empty' THEN NULL
            ELSE state
        END,
    Country =
        CASE
            WHEN country = 'empty' THEN NULL
            ELSE country
        END
FROM OpenQuery(HiveDW, 'SELECT
    CASE
        WHEN LENGTH(state) = 0 THEN ''empty''
        ELSE COALESCE(state, ''empty'')
    END AS state,
    CASE
        WHEN LENGTH(country) = 0 THEN ''empty''
        ELSE COALESCE(country, ''empty'')
    END AS country
FROM HiveSampleTable')
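The round trip in the query above can be modeled in plain Python to check which values survive. The function names are hypothetical; only the 'empty' placeholder and the two CASE expressions come from the slide:

```python
from typing import Optional, Tuple

PLACEHOLDER = "empty"  # the sentinel used in the queries above


def hive_side(value: Optional[str]) -> str:
    """Mirrors the inner HiveQL CASE: NULL or zero-length strings become the placeholder."""
    if value is None or len(value) == 0:
        return PLACEHOLDER
    return value


def sql_server_side(value: str) -> Optional[str]:
    """Mirrors the outer T-SQL CASE: the placeholder turns back into NULL (None)."""
    return None if value == PLACEHOLDER else value


def round_trip(row: Tuple[Optional[str], ...]) -> Tuple[Optional[str], ...]:
    return tuple(sql_server_side(hive_side(v)) for v in row)


rows = [("WA", "US"), ("", "US"), (None, "")]
print([round_trip(r) for r in rows])
# [('WA', 'US'), (None, 'US'), (None, None)]
```

One consequence of this design: a genuine value equal to 'empty' also comes back as NULL, so the placeholder should be a string that never occurs in the real data.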
Any Questions?
Thank you for attending this session and the 2012 PASS Summit in Seattle

SQLCAT: Tier-1 BI in the World of Big Data
 
Jump Start into Apache Spark (Seattle Spark Meetup)
Jump Start into Apache Spark (Seattle Spark Meetup)Jump Start into Apache Spark (Seattle Spark Meetup)
Jump Start into Apache Spark (Seattle Spark Meetup)
 
How Concur uses Big Data to get you to Tableau Conference On Time
How Concur uses Big Data to get you to Tableau Conference On TimeHow Concur uses Big Data to get you to Tableau Conference On Time
How Concur uses Big Data to get you to Tableau Conference On Time
 
SQL Server Reporting Services: IT Best Practices
SQL Server Reporting Services: IT Best PracticesSQL Server Reporting Services: IT Best Practices
SQL Server Reporting Services: IT Best Practices
 
Building SSRS 2008 large scale solutions
Building SSRS 2008 large scale solutionsBuilding SSRS 2008 large scale solutions
Building SSRS 2008 large scale solutions
 
How Klout is changing the landscape of social media with Hadoop and BI
How Klout is changing the landscape of social media with Hadoop and BIHow Klout is changing the landscape of social media with Hadoop and BI
How Klout is changing the landscape of social media with Hadoop and BI
 
A primer on power pivot topology and configurations
A primer on power pivot topology and configurationsA primer on power pivot topology and configurations
A primer on power pivot topology and configurations
 
Above the cloud: Big Data and BI
Above the cloud: Big Data and BIAbove the cloud: Big Data and BI
Above the cloud: Big Data and BI
 

Recently uploaded

Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
Federico Razzoli
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 

Recently uploaded (20)

Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 

Big Data, Bigger Brains

  • 1. Big Data, Bigger Brains: How Klout Changed the Landscape of Social Media with Hadoop and BI. Denny Lee (Microsoft) and Dave Mariani (Klout). November 6-9, Seattle, WA.
  • 2. Discover and be recognized for how you influence the world
  • 3. Klout's Big Data makes all this possible:
      15 social networks processed every day
      769 terabytes of data storage
      200,000 indexed users added every day
      140,000,000 users indexed every day
      12,000,000,000 social signals processed every day
      30,000,000,000 API calls delivered every month
      1,080,000,000,000 rows of data in the data warehouse
  • 4. Klout Data Architecture: the best tool for the job
  • 5. Klout Data Architecture: the best tool for the job
  • 6. What is Business Intelligence?
      • Data warehousing, OLAP, dashboards, reporting
      • The ability to slice and dice data in an ad hoc manner
      • Getting the right data to the right people at the right time (i.e., now)
  • 7. Why Hadoop + BI?
      Requirement                                 Hadoop & Hive   BI query engines
      Capture & store all data                    Yes             No
      Support queries against detail data         Yes             No
      Support interactive queries & applications  No              Yes
      Support BI & visualization tools            No              Yes
  • 8. An Example: Klout Event Tracker
      1. Perform A/B testing of user flows
      2. Optimize registration funnels
      3. Monitor consumer engagement & retention (DAUs & MAUs)
      4. Flexibly track and report on user-generated events
  • 9. A Flexible, Hierarchical Schema
      Project (collection of events) > Event (captured user action) > Property type (attribute key) > Property value (attribute value)
      Example: the +K (Add a topic) event, with property types Source, Gender, Location and property values Google Search, Male, SF
      Other examples: HomePage, Actions, Mobile iOS
  • 10. Event Tracker Architecture
      Pipeline: Instrument (Tracker API: Scala, node.JS) > Collect (log processing: Flume) > Persist (Warehouse) > Query (Cube: Analysis Services) > Report (Klout UI: Scala, AJAX UX)
      Sample MDX query against the cube:
        SELECT { [Measures].[Counter], [Measures].[PreviousPeriodCounter] } ON COLUMNS,
        NON EMPTY CROSSJOIN(
            EXISTS([Date].[Date].[Date].ALLMEMBERS,
                   [Date].[Date].&[2012-05-19T00:00:00]:[Date].[Date].&[2012-06-02T00:00:00]),
            [Events].[Event].[Event].ALLMEMBERS
        ) DIMENSION PROPERTIES MEMBER_CAPTION ON ROWS
        FROM [ProductInsight]
        WHERE ({ [Projects].[Project].[plusK] })
      Hive table event_log: tstamp string, project string, event string, session_id bigint, ks_uid bigint, ip string, json_keys array<string>, json_values array<string>, json_text string, dt string, hr string
      Sample tracked event:
        { "project":"plusK", "event":"spend", "session_id":"0", "ip":"50.68.47.158", "kloutId":"123456", "cookie_id":"123456", "ref":"http://klout.com/", "type":"add_topic", "time":"1338366015" }
      This event will be saved in HDFS at /logs/events_tracking/2012-05-30/0100
      Tracker call: insights3:9003/track/{"project":"plusK","event":"spend","ks_uid":123456,"type":"add_topic"}
  • 11. Limitations of Direct Connectivity
  • 12. Lack of Support for the OLE DB Provider for ODBC
  • 13. Pass-Through Queries to Linked Servers
  • 14. Linked Server Connection
        EXEC master.dbo.sp_addlinkedserver
            @server = N'HiveDW',
            @srvproduct = N'HIVE',
            @provider = N'MSDASQL',
            @datasrc = N'Hive DW',
            @provstr = N'Provider=MSDASQL.1;Persist Security Info=True;User ID=SSAS;Password=P@assw0rd;';
        GO
        -- CREATE VIEW must be the first statement in its batch
        CREATE VIEW vw_tbl_HiveSample AS
        SELECT * FROM OpenQuery(HiveDW, 'SELECT * FROM HiveSampleTable;');
        GO
  • 15. OpenQuery / Linked Server Thoughts
      • On paper, this probably shouldn't work
      • Yet it works, and it has been running in production for three months now
      • It's more stable than the original MySQL connection
      • From the cube's perspective, the DSV "looks like" it's connecting to SQL Server
  • 16. Demo: Hadoop to BI Workflow: custom BI application; commercial BI tool; command-line HiveQL; HiveQL in Excel
  • 17. Hadoop & BI Together: Query Cube using a Custom App
  • 18. A Peek into Product Insights > Projects: Mobile iOS
  • 19. Hadoop & BI Together: Query Cube using Viz App
  • 20. Hadoop & BI Together: Query Hive using CLI
  • 21. HiveQL Example
        SELECT get_json_object(json_text, '$.sid')     AS sid,
               get_json_object(json_text, '$.kloutId') AS kloutId,
               get_json_object(json_text, '$.v')       AS version,
               get_json_object(json_text, '$.status')  AS status,
               event
        FROM bi.event_log
        WHERE project = 'mobile-ios'
          AND dt = 20121027
          AND event IN ('api_error', 'api_timeout')
        ORDER BY sid;
  • 22. Hadoop & BI Together: Query Hive using Excel
  • 23. Hadoop & BI Together: Digging Deeper on Influence
  • 24. My Influencers: who the heck is Cali Lewis?
  • 25. So let's dig in!
        SELECT * FROM OpenQuery(hiveprodbi, 'select * from bi_maxwell.actor_action where network_abbr = ''gp'' and ks_uid = 1711 and actor_ks_uid = 477358')
      It returns this row:
        ks_uid              = 1711
        service_uid         = 103493459351957813291
        service_id          = 13
        tstamp              = 1345073319000
        message_id          = NULL
        action              = GOOGLE_PLUS_PLUSONES
        actor_service_uid   = 100525279016049609152
        tstamp_type         = ORIGINAL_CONTENT_CREATION
        original_message_id = z12djbti1oelu1eff22lztphnqapsd25t04
        original_tstamp     = 1345073319000
        registered_flag     = 1
        actor_ks_uid        = 477358
  • 26. Digging in further…
      Took the original_message_id field and looked it up with the Google+ API:
      1. Go to https://developers.google.com/+/api/latest/activities/get
      2. Enter the ID 'z12djbti1oelu1eff22lztphnqapsd25t04' into the activityId field of the form
      3. The payload returns this URL: https://plus.google.com/+CaliLewis/posts/JxAKBWXy3zZ
  • 28. Why Hadoop + BI? (recap of the requirements comparison on slide 7)
  • 29. Best Practices and Lessons Learned
      • Avoid using a traditional database management system for staging purposes
      • Use the SQL Server OpenQuery interface for heterogeneous joins
      • Leverage Hive user-defined functions (UDFs) to transform complex data types, such as JSON, into rows and columns that Transact-SQL can understand
      • Manage large dimensions by using Hive views
  • 30. Best Practices and Lessons Learned
      • Make sure Hive UDFs are permanent and visible to the ODBC provider. Options:
          • Add them to the .hiverc file (Hive CLI only)
          • Convert the UDF to a built-in function and recompile the Hive code
          • Update the Hive function registry to add the UDFs to the built-in function list:
              • modify the FunctionRegistry class to register the UDFs (defined in an hql file)
              • add all the dependent jars to the hive.aux.jars.path property
  • 31. Best Practices and Lessons Learned
      • Pad zero-length string data:
        SELECT State   = CASE WHEN state = 'empty' THEN NULL ELSE state END,
               Country = CASE WHEN country = 'empty' THEN NULL ELSE country END
        FROM OpenQuery(HiveDW, 'SELECT CASE WHEN LENGTH(state) = 0 THEN ''empty'' ELSE COALESCE(state, ''empty'') END AS state,
                                       CASE WHEN LENGTH(country) = 0 THEN ''empty'' ELSE COALESCE(country, ''empty'') END AS country
                                FROM HiveSampleTable')
  • 33. Thank you for attending this session and the 2012 PASS Summit in Seattle.
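The Event Tracker flow on slides 9 and 10 (a JSON event from the Tracker API landing in an hourly HDFS directory and surfacing as a row in the event_log Hive table) can be sketched in Python. This is a minimal sketch under stated assumptions, not Klout's code: it assumes the hourly directories are bucketed in Pacific Daylight Time (UTC-7), which is consistent with the slide's example, where time 1338366015 is 01:20 PDT on 2012-05-30 and lands in /logs/events_tracking/2012-05-30/0100. It also assumes ks_uid falls back to kloutId and that dt uses the yyyymmdd form seen in the HiveQL filters. The function names are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone

# Assumption: hourly buckets use Pacific Daylight Time (UTC-7). A fixed
# offset keeps the sketch deterministic; it ignores daylight-saving changes.
BUCKET_TZ = timezone(timedelta(hours=-7))

# Root directory taken from the example path on slide 10.
LOG_ROOT = "/logs/events_tracking"


def hdfs_path(epoch_seconds: int) -> str:
    """Hypothetical helper: the hourly HDFS directory for an event timestamp."""
    local = datetime.fromtimestamp(epoch_seconds, tz=BUCKET_TZ)
    return f"{LOG_ROOT}/{local:%Y-%m-%d}/{local:%H}00"


def to_event_log_row(raw: str) -> dict:
    """Hypothetical helper: flatten one tracked JSON event into event_log columns."""
    event = json.loads(raw)
    ts = int(event["time"])
    local = datetime.fromtimestamp(ts, tz=BUCKET_TZ)
    return {
        "tstamp": str(ts),
        "project": event["project"],
        "event": event["event"],
        "session_id": int(event.get("session_id", 0)),
        # Assumption: use "ks_uid" if present, otherwise "kloutId".
        "ks_uid": int(event.get("ks_uid", event.get("kloutId", 0))),
        "ip": event.get("ip", ""),
        # Every property is kept as parallel key/value arrays plus the raw
        # text, so a new property type needs no schema change.
        "json_keys": list(event.keys()),
        "json_values": [str(v) for v in event.values()],
        "json_text": raw,
        # Assumption: dt in yyyymmdd form, hr as a two-digit hour.
        "dt": f"{local:%Y%m%d}",
        "hr": f"{local:%H}",
    }


if __name__ == "__main__":
    sample = ('{"project":"plusK","event":"spend","session_id":"0",'
              '"ip":"50.68.47.158","kloutId":"123456","cookie_id":"123456",'
              '"ref":"http://klout.com/","type":"add_topic","time":"1338366015"}')
    row = to_event_log_row(sample)
    print(hdfs_path(int(row["tstamp"])))  # /logs/events_tracking/2012-05-30/0100
    print(row["project"], row["event"], row["ks_uid"], row["dt"], row["hr"])
```

Keeping the properties as json_keys/json_values arrays plus json_text mirrors the flexible schema on slide 9: a project can add new property types without altering the table, and HiveQL can still pull individual values out with get_json_object, as in the slide 21 query.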

Editor's Notes

  1. Analysis Services does not ship with a HiveQL cartridge. In Tabular mode this is possible, because HiveQL and SQL have many similarities (provided the queries are not too complex).
  2. Business Intelligence Development Studio and SQL Server Data Tools are unable to generate an appropriate DSV for ODBC data sources through MSDASQL. These tools use the .NET Framework Data Provider for OLE DB (System.Data.OleDb), which does not support the OLE DB Provider for ODBC.
  3. Copy this from Notepad for the demo:
        CREATE TABLE mobile_ios_details_20120612 AS
        SELECT get_json_object(json_text, '$.sid')    AS sid,
               get_json_object(json_text, '$.inc')    AS inc,
               get_json_object(json_text, '$.status') AS status,
               event
        FROM bi.event_log
        WHERE project = 'mobile-ios'
          AND dt = 20120612
          AND get_json_object(json_text, '$.v') <> '1.5'
          AND (event = 'api_error' OR event = 'api_timeout')
        ORDER BY sid;
  4. I don't know who she is, and I don't follow her on Twitter, Facebook, etc. If I dig in, I realize that she's with Geek.tv, so it is something I watch, but still…
  5. 1) Don't throw data away; leverage Hadoop (track users and events for A/B testing).
     2) BI tools aggregate data, but we need to reach back to the detail to answer deeper questions (HTTP codes).
     3) Hadoop != interactive queries (combined proprietary data with detail).
     4) Use open source, but don't reinvent the wheel (BI tools are mature, valuable & complementary).
     Leverage the best tool for the function or job.
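The OpenQuery pattern in slides 14, 25 and 31 nests a HiveQL string inside a T-SQL string literal, so every single quote in the HiveQL has to be doubled (hence ''gp'' and ''empty''). As a minimal sketch, assuming you generate these pass-through queries from Python rather than hand-writing them, the hypothetical helpers below do the doubling and also emit the slide 31 empty-string padding pattern. The helper names, the marker parameter and the identifier check are illustrative assumptions, not part of the original deck.

```python
import re
from typing import Sequence

# Assumption: linked server, table and column names are plain identifiers.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _ident(name: str) -> str:
    """Reject names that would need quoting, so they can be pasted into SQL verbatim."""
    if not _IDENT.match(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def openquery(linked_server: str, hiveql: str, outer_select: str = "*") -> str:
    """Hypothetical helper: wrap HiveQL in a T-SQL OpenQuery pass-through query.

    The HiveQL becomes a T-SQL string literal, so each single quote is doubled.
    """
    escaped = hiveql.replace("'", "''")
    return f"SELECT {outer_select} FROM OpenQuery({_ident(linked_server)}, '{escaped}')"


def padded_select(linked_server: str, table: str, columns: Sequence[str],
                  marker: str = "empty") -> str:
    """Hypothetical helper for the slide 31 pattern: Hive replaces NULL and
    zero-length strings with a marker; SQL Server turns the marker back into NULL."""
    cols = [_ident(c) for c in columns]
    marker = _ident(marker)
    hive_cols = ", ".join(
        f"CASE WHEN LENGTH({c}) = 0 THEN '{marker}' "
        f"ELSE COALESCE({c}, '{marker}') END AS {c}" for c in cols)
    outer_cols = ", ".join(
        f"{c} = CASE WHEN {c} = '{marker}' THEN NULL ELSE {c} END" for c in cols)
    return openquery(linked_server, f"SELECT {hive_cols} FROM {_ident(table)}", outer_cols)


if __name__ == "__main__":
    # SELECT * FROM OpenQuery(hiveprodbi, 'select * from bi_maxwell.actor_action where network_abbr = ''gp''')
    print(openquery("hiveprodbi",
                    "select * from bi_maxwell.actor_action where network_abbr = 'gp'"))
    print(padded_select("HiveDW", "HiveSampleTable", ["state", "country"]))
```

Generating the statement in one place keeps the quote doubling consistent. Because OpenQuery only accepts a literal query string, any value spliced into the HiveQL still has to be trusted or validated before it reaches the linked server.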