NoSQL to Augment Hadoop Big Data Platforms
Jeffrey Breen
Director, Think Big Academy
October 2014
CONFIDENTIAL | 2 
Outline 
• Introduction 
• Hadoop and NoSQL: What? Where? Why? When? 
• Document-Oriented NoSQL and Hadoop 
• Example: Add Statefulness 
• Example: Analytics Store 
• Example: Secondary Index 
− Caution: contains code 
• MongoDB Connector for Hadoop 
Leading Provider of Big Data Solutions & Support
Delivering Business Value Through Big Data
• Exclusive Focus on Big Data Tools, Technologies, and Techniques
• Onshore Team-Based Engineering and Data Science Methodology
• Prebuilt, Proven Components to Accelerate Delivery & Lower Risk
Agile Methodology
Experiment-Driven Short Sprints with Quick Release Cycles
We Accelerate Your Time to Value
• Breaking Down Business and IT Barriers
• Discrete Projects with Beginning and End
• Early Releases to Validate ROI and Ensure Long-Term Success
DATA ENGINEERS
DATA SCIENTISTS
BUSINESS GOALS
Innovation and Value
Jeffrey Breen 
Director, Think Big Academy 
Principal Consultant and Hands-on Architect 
IT guy, Data guy, Open Source guy 
Pilot and Airplane Geek 
Twitter: @JeffreyBreen 
jeffrey.breen@thinkbiganalytics.com 
Hadoop and NoSQL 
• Not “either-or” 
− When together? Where? For what? 
• Hadoop 
− Not a database 
− Low cost storage with fault tolerance 
− Batch-oriented analytics (MapReduce, Hive, Pig) 
− Not good for random access and/or updates 
• NoSQL 
− Real databases with CRUD 
− Optimized for fast, random access 
− Many shapes and sizes (key-value, tabular, graph, document-oriented)
Reference Architecture
Document-Oriented NoSQL with Hadoop 
• Advantages 
− Simple but flexible data model 
− Field-level indexing for fast querying 
− Easy and open APIs and data exchange formats 
• Examples 
1. Add Statefulness. Preserve state between jobs and other stateless 
operations. 
2. Analytics Store. Provide a high-performance destination for calculations and metrics.
3. Secondary Indexing. Add low-latency querying and access for high-latency 
data stores like HDFS. 
Example: Add Statefulness
• Overview
- Sometimes you just need a fast and safe place to store data between jobs, applications, and iterations
• Scenarios
- Data extraction jobs
- Ingestion processing status
- Broadcasting “last best” parameters in machine learning, genetic algorithms, and other model fitting
{
  "process": "db-extractor",
  "system": "database1",
  "tables": {
    "table1": { "columns": ["ts"], "values": ["2014-03-25 03:15:23"] },
    "table2": { "columns": ["client_id"], "values": ["43110221"] }
  }
}
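A minimal Python sketch of how an extraction job might read and update a state document like the one above between runs. The in-memory `STATE_STORE` dict is a hypothetical stand-in for a document store collection such as MongoDB, and `load_state`, `save_state`, `resume_point`, and `record_progress` are illustrative names, not part of any library.

```python
import copy
import json

# Hypothetical stand-in for a document store collection (e.g. a MongoDB
# collection), keyed by (process, system). Not a real client API.
STATE_STORE: dict = {}


def load_state(process: str, system: str) -> dict:
    """Return a copy of the saved state document, or an empty one on the first run."""
    doc = STATE_STORE.get((process, system))
    if doc is None:
        return {"process": process, "system": system, "tables": {}}
    return copy.deepcopy(doc)


def save_state(doc: dict) -> None:
    """Replace the stored document with `doc` (a whole-document upsert)."""
    STATE_STORE[(doc["process"], doc["system"])] = copy.deepcopy(doc)


def resume_point(doc: dict, table: str) -> dict:
    """Return {column: last extracted value} for `table`; {} means extract everything."""
    entry = doc["tables"].get(table)
    if entry is None:
        return {}
    return dict(zip(entry["columns"], entry["values"]))


def record_progress(doc: dict, table: str, column: str, value: str) -> None:
    """Store the last value extracted for `column`, keeping both arrays aligned."""
    entry = doc["tables"].setdefault(table, {"columns": [], "values": []})
    if column in entry["columns"]:
        entry["values"][entry["columns"].index(column)] = value
    else:
        entry["columns"].append(column)
        entry["values"].append(value)


# First run: nothing is saved yet, so both tables would be extracted in full.
state = load_state("db-extractor", "database1")
record_progress(state, "table1", "ts", "2014-03-25 03:15:23")
record_progress(state, "table2", "client_id", "43110221")
save_state(state)

# A later run (or a different application) resumes from the saved values.
print(resume_point(load_state("db-extractor", "database1"), "table1"))
# {'ts': '2014-03-25 03:15:23'}
print(json.dumps(load_state("db-extractor", "database1"), indent=2))
```

Reading the whole document at the start of a run and writing it back at the end keeps the job itself stateless; a real deployment would still need a policy for two runs of the same job overlapping.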
Example: Analytics Store
• Great place to store aggregates and other calculated metrics
• Can be populated from batch or streaming analytics
• Great for serving live dashboards and reporting
{
  "metric": "session-length",
  "visitor": "{2CC8C651-A9F4-4CB4-8639-7688FCD21D59}",
  "visit-start": "2014-03-25 03:15:23",
  "data": {
    "value": 245.3,
    "units": "seconds"
  }
}
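A minimal batch-side sketch, in Python, of producing session-length documents shaped like the one above from raw page views. The 30-minute inactivity timeout, the (visitor, timestamp) input shape, and the function names are assumptions for illustration; the resulting dicts would then be written to the analytics store.

```python
from datetime import datetime, timedelta
from itertools import groupby

TS_FORMAT = "%Y-%m-%d %H:%M:%S"
# Assumed inactivity timeout that ends a visit; tune for your own site.
SESSION_GAP = timedelta(minutes=30)


def _metric_doc(visitor: str, start: datetime, end: datetime) -> dict:
    """Build one metric document in the same shape as the example above."""
    return {
        "metric": "session-length",
        "visitor": visitor,
        "visit-start": start.strftime(TS_FORMAT),
        "data": {"value": (end - start).total_seconds(), "units": "seconds"},
    }


def session_length_docs(page_views):
    """page_views: iterable of (visitor_id, 'YYYY-MM-DD HH:MM:SS') tuples."""
    parsed = sorted((v, datetime.strptime(ts, TS_FORMAT)) for v, ts in page_views)
    docs = []
    for visitor, views_for_visitor in groupby(parsed, key=lambda pv: pv[0]):
        times = [t for _, t in views_for_visitor]
        start = prev = times[0]
        for t in times[1:]:
            if t - prev > SESSION_GAP:  # long gap: close the current visit
                docs.append(_metric_doc(visitor, start, prev))
                start = t
            prev = t
        docs.append(_metric_doc(visitor, start, prev))  # close the last visit
    return docs


views = [
    ("{2CC8C651-A9F4-4CB4-8639-7688FCD21D59}", "2014-03-25 03:15:23"),
    ("{2CC8C651-A9F4-4CB4-8639-7688FCD21D59}", "2014-03-25 03:19:28"),
    ("{2CC8C651-A9F4-4CB4-8639-7688FCD21D59}", "2014-03-25 05:00:00"),
]
for doc in session_length_docs(views):
    print(doc["visit-start"], doc["data"]["value"])
```

With this toy input the first visit lasts 245.0 seconds and the lone 05:00:00 page view becomes a zero-length visit, a known limitation of measuring sessions from page-view timestamps alone.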
Example: Secondary Indexing
• HDFS is optimized for scans; seeks are very expensive
• As in relational databases, secondary indexes can be created on specific elements
• Hive even has indexing built in, but keeps the results on HDFS (still not optimized for seeks)
• Solution: Use a separate NoSQL database for secondary indexes
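To make the idea concrete, here is a small Python sketch of a secondary index over a tab-separated file: one sequential scan records each key's byte offsets (conceptually similar to the file/offsets pairs an index entry holds), and a lookup then seeks straight to the matching records. An in-memory `io.BytesIO` stands in for an HDFS file; the column positions, the bucket name, and the sample rows are made up.

```python
import io
from collections import defaultdict


def build_location_index(bucketname, stream, key_columns):
    """Scan a tab-separated binary stream once, mapping key -> file + byte offsets.

    key_columns are the (hypothetical) positions of the indexed fields.
    """
    index = defaultdict(lambda: {"bucketname": bucketname, "offsets": []})
    offset = 0
    for line in stream:  # binary lines keep their b"\n", so len() is exact
        fields = line.rstrip(b"\n").decode("utf-8").split("\t")
        key = tuple(fields[i] for i in key_columns)
        index[key]["offsets"].append(offset)
        offset += len(line)
    return dict(index)


def read_record(stream, offset):
    """Seek directly to one record instead of rescanning the whole file."""
    stream.seek(offset)
    return stream.readline().rstrip(b"\n").decode("utf-8")


data = io.BytesIO(
    b"taunton\tma\tusa\tpage-1\n"
    b"los gatos\tca\tusa\tpage-2\n"
    b"taunton\tma\tusa\tpage-3\n"
)
index = build_location_index("day=10/000000_0", data, key_columns=(0, 1, 2))
hit = index[("taunton", "ma", "usa")]
print(hit["offsets"])  # [0, 46]
print([read_record(data, off) for off in hit["offsets"]])
```

The scan is paid once, at index-build time; the point of moving the resulting dict into a NoSQL store is that the lookup side becomes a cheap keyed query followed by a few seeks.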
Sample Clickstream Data
• Sample Omniture clickstream files are available from Hortonworks
− 420,000+ page views over 15 days
− https://s3.amazonaws.com/hw-sandbox/tutorial8/RefineDemoData.zip
• Example records combine web page and visitor information, including geocoding:
1331434018 2012-03-10 18:46:58 2850813067829261564 4611687161967479390 FAS-2.8-AS3 N 0 24.63.166.252 1 0 10 http://www.acme.com/SH5568487/VD55169229 {2CC8C651-A9F4-4CB4-8639-7688FCD21D59} U en-US 313 598 1259 Y Y Y 1 2 304 comcast.net 10/2/2012 20:50:37 6 300 45 36 Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; WOW64; GTB7.3; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.30618; .NET CLR 3.5.30729; .NET4.0C) 71 0 2 53 0 taunton usa 521 ma 0 0 0 0 0 ABC 0 120 ABC 0
1331434006 2012-03-10 18:46:46 2850864012585216412 6917530841728651042 FAS-2.8-AS3 N 0 24.6.122.234 1 0 10 http://www.acme.com/SH55126545/VD55177927 {52B4FFFE-606A-1C2B-77E7-F62057879CC8} U en-us 574 0 0 U U Y 0 0 304 comcast.net 10/2/2012 18:17:59 6 480 45 2 Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/533.21.1 (KHTML, like Gecko) Version/5.0.5 Safari/533.21.1 71 0 37 2 0 los gatos usa 807 ca 0 0 0 0 0 KGO 0 120 KGO
Time-Partitioned Data
• Time is a very common dimension on which to organize data
• Great for processing incoming data and for filtering any time-based queries…
• …but can complicate other access patterns
Hive partitions correspond to directories on HDFS:
/apps/hive/warehouse/omniture_daily/year=2012/month=3/day=1/000000_0
/apps/hive/warehouse/omniture_daily/year=2012/month=3/day=2/000000_0
/apps/hive/warehouse/omniture_daily/year=2012/month=3/day=3/000000_0
/apps/hive/warehouse/omniture_daily/year=2012/month=3/day=4/000000_0
/apps/hive/warehouse/omniture_daily/year=2012/month=3/day=5/000000_0
/apps/hive/warehouse/omniture_daily/year=2012/month=3/day=6/000000_0
/apps/hive/warehouse/omniture_daily/year=2012/month=3/day=7/000000_0
/apps/hive/warehouse/omniture_daily/year=2012/month=3/day=8/000000_0
/apps/hive/warehouse/omniture_daily/year=2012/month=3/day=9/000000_0
/apps/hive/warehouse/omniture_daily/year=2012/month=3/day=10/000000_0
[…]
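A short Python sketch of why time partitioning helps some queries and not others. It maps a record's local timestamp string to its daily partition directory, using the warehouse path and the non-zero-padded `year=/month=/day=` layout shown above, and lists every directory in a date range. The 15-day range in the usage lines is hypothetical.

```python
from datetime import date, datetime, timedelta

WAREHOUSE = "/apps/hive/warehouse/omniture_daily"


def partition_dir(day: date) -> str:
    # Values are not zero-padded, matching the paths above (month=3, day=1).
    return f"{WAREHOUSE}/year={day.year}/month={day.month}/day={day.day}"


def partition_for(ts: str) -> str:
    """Directory holding a record with a 'YYYY-MM-DD HH:MM:SS' timestamp."""
    return partition_dir(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date())


def partitions_between(first: date, last: date) -> list:
    """Every daily partition directory from first to last, inclusive."""
    return [partition_dir(first + timedelta(days=n))
            for n in range((last - first).days + 1)]


print(partition_for("2012-03-10 18:46:58"))
# A query filtered to one day reads one directory. A query on city alone
# has no partition predicate, so every directory must be read.
all_days = partitions_between(date(2012, 3, 1), date(2012, 3, 15))
print(len(all_days))
```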
Welcome to the Long Tail
Top 10 ≃ Bottom 2000
Distribution of geographic locations detected in clickstream data (chart not reproduced):
> sum(subset(df, rank <= 10)$count)
[1] 36986
> sum(subset(df, rank > max(df$rank) - 2000)$count)
[1] 33971
• In this sample clickstream data set, the top 10 cities account for more traffic than the bottom 2,000 combined
• Optimizations are usually designed for the most common cases
- “Biggest bang for the buck” due to size, frequency, etc.
- What are the chances that the optimizations you pick to handle the most common cases work well for the long tail?
- What if a new business opportunity depends on the long tail?
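For readers not using R, the same head-versus-tail comparison can be sketched in Python. It assumes a mapping from city to page-view count (a stand-in for the slide's `df` with `rank` and `count` columns) and ranks by sorting; ties are broken arbitrarily, which may differ from how `rank` was computed in the original analysis. The counts below are toy data, not the clickstream figures.

```python
def head_vs_tail(counts: dict, top_n: int = 10, bottom_n: int = 2000):
    """Return (total of the top_n largest counts, total of the bottom_n smallest)."""
    ranked = sorted(counts.values(), reverse=True)
    top = sum(ranked[:top_n])
    # Clamp so a short list does not wrap around with a negative index.
    bottom = sum(ranked[max(len(ranked) - bottom_n, 0):])
    return top, bottom


# Toy data only.
city_counts = {"taunton": 50, "los gatos": 30, "a": 5, "b": 4, "c": 3, "d": 2, "e": 1}
print(head_vs_tail(city_counts, top_n=2, bottom_n=4))  # (80, 10)
```

When `top_n + bottom_n` exceeds the number of cities, the two groups overlap, so the comparison is only meaningful on a distribution with a genuinely long tail.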
Secondary Indexing in Hive
• Hive has built-in facilities to index data:
create index location on table omniture_daily(city, state, country)
as 'COMPACT' with deferred rebuild;
alter index location on omniture_daily rebuild;
• The index stores pointers to the location of each matching record (HDFS file path and byte offsets)
• However, the resulting index is partitioned the same way as the underlying table, so a lookup by location alone still has to read every partition of the index
Exporting Hive Data as JSON
• Hive can easily read/write JSON data via a SerDe:
− https://github.com/sheetaldolas/Hive-JSON-Serde/tree/master
add jar json-serde-1.1.9.2-Hive13-jar-with-dependencies.jar;
create table json_export (
  city string,
  country string,
  state string,
  bucketname string,
  offsets array<bigint>,
  year int,
  month int,
  day int
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;
insert into table json_export select * from default__omniture_daily_location__;
• Column parsing is determined by Hive SerDe classes (ROW FORMAT SERDE)
• File reading and writing are handled by Hadoop’s InputFormat and OutputFormat (STORED AS)
Sample Index Entry
Hive indices contain the physical location of the original data, including byte offsets:
{
  "city": "taunton",
  "state": "ma",
  "country": "usa",
  "bucketname": "hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/omniture_daily/year=2012/month=3/day=10/000000_0",
  "offsets": [ 4748045, 3522685 ],
  "year": 2012,
  "day": 10,
  "month": 3
}
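Because the index is partitioned like the table, one location can match several entries (one per partition file), so a client usually turns the hits into a read plan before fetching anything. A hypothetical Python sketch: group entries by `bucketname`, drop duplicate offsets, and sort them so each file is read front to back. The `read_plan` name is illustrative.

```python
import json
from collections import defaultdict

# The sample index entry above, as it would arrive from the index store.
entry = json.loads("""
{
  "city": "taunton", "state": "ma", "country": "usa",
  "bucketname": "hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/omniture_daily/year=2012/month=3/day=10/000000_0",
  "offsets": [4748045, 3522685],
  "year": 2012, "day": 10, "month": 3
}
""")


def read_plan(index_docs):
    """Group index hits by HDFS file; return each file's unique offsets, ascending."""
    plan = defaultdict(set)
    for doc in index_docs:
        plan[doc["bucketname"]].update(doc["offsets"])
    return {path: sorted(offsets) for path, offsets in plan.items()}


for path, offsets in read_plan([entry]).items():
    print(path, offsets)
```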
Exporting Index Data to Mongo
• Since our Hive index data is now stored on HDFS in JSON format, it’s very easy to load into Mongo directly.
• Don’t do this in production, but that’s what makes simple examples so much fun:
$ hadoop fs -text /apps/hive/warehouse/json_export/000000_0 |
mongoimport --host localhost --db clickstream --collection locidx
connected to: localhost
Sat Sep 27 10:30:22.325 100 16/second
Sat Sep 27 10:30:24.448 check 9 12262
Sat Sep 27 10:30:24.449 imported 12262 objects
Querying the Index in Mongo
$ mongo localhost
MongoDB shell version: 2.4.6
connecting to: localhost
> use clickstream;
switched to db clickstream
> db.locidx.find( {'state':'ma', 'city':'taunton'} );
{ "_id" : ObjectId("5426f42e6a6b0b1939528f80"),
"bucketname" : "hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/omniture_daily/year=2012/month=3/day=10/000000_0",
"offsets" : [ 4748045, 3522685 ], "month" : 3, "state" : "ma",
"year" : 2012, "day" : 10, "country" : "usa", "city" : "taunton" }
• bucketname: the specific file on HDFS containing the records of interest
• offsets: the byte offsets within that file where the records of interest start
Using the index data to retrieve the original data
$ curl -L 'http://sandbox.hortonworks.com:50070/webhdfs/v1/apps/hive/warehouse/omniture_daily/year=2012/month=3/day=10/000000_0?op=OPEN&offset=3522685&length=615'; echo
1331431385 2012-03-10 18:03:05 2850813067829261564 4611687161967479390 FAS-2.8-AS3 N 0 24.63.166.252 1 0 10 http://www.acme.com/SH5568487/VD55169229 {2CC8C651-A9F4-4CB4-8639-7688FCD21D59} en-US 313 598 1259 Y Y Y 1 2 304 comcast.net 10/2/2012 20:50:37 6 300 45 36 Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; WOW64; GTB7.3; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.30618; .NET CLR 3.5.30729; .NET4.0C) 71 0 2 20 0 taunton usa 521 ma 0 0 0 0 ABC 0 120 ABC
$ curl -L 'http://sandbox.hortonworks.com:50070/webhdfs/v1/apps/hive/warehouse/omniture_daily/year=2012/month=3/day=10/000000_0?op=OPEN&offset=4748045&length=615'; echo
1331434018 2012-03-10 18:46:58 2850813067829261564 4611687161967479390 FAS-2.8-AS3 N 0 24.63.166.252 1 0 10 http://www.acme.com/SH5568487/VD55169229 {2CC8C651-A9F4-4CB4-8639-7688FCD21D59} en-US 313 598 1259 Y Y Y 1 2 304 comcast.net 10/2/2012 20:50:37 6 300 45 36 Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; WOW64; GTB7.3; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.30618; .NET CLR 3.5.30729; .NET4.0C) 71 0 2 53 0 taunton usa 521 ma 0 0 0 0 ABC 0 120 ABC
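The two curl calls above follow a mechanical pattern, sketched below in Python: take the path from an index entry's `bucketname`, prefix the NameNode's HTTP address, and request `op=OPEN` with the byte `offset` and a `length`. The host, port, and 615-byte length are assumptions carried over from the example (50070 was the Hadoop 2.x default HTTP port). If a record is longer than `length` the response is truncated; if it is shorter, the response runs into the next record, which is why `first_record` cuts at the first newline. The actual HTTP fetch, which must follow the redirect to a DataNode just as `curl -L` does, is left out.

```python
from urllib.parse import urlencode, urlsplit

# Assumed NameNode HTTP address, taken from the example above.
NAMENODE_HTTP = "http://sandbox.hortonworks.com:50070"
# Assumed upper bound on record size, as in the curl examples.
MAX_RECORD_BYTES = 615


def webhdfs_open_url(hdfs_uri: str, offset: int, length: int = MAX_RECORD_BYTES) -> str:
    """Turn an index entry's hdfs:// bucketname into a WebHDFS OPEN request URL."""
    path = urlsplit(hdfs_uri).path
    query = urlencode({"op": "OPEN", "offset": offset, "length": length})
    return f"{NAMENODE_HTTP}/webhdfs/v1{path}?{query}"


def first_record(payload: bytes) -> str:
    """Keep only the record starting at the requested offset (up to the first newline)."""
    return payload.split(b"\n", 1)[0].decode("utf-8")


bucket = ("hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/"
          "omniture_daily/year=2012/month=3/day=10/000000_0")
for offset in (3522685, 4748045):
    print(webhdfs_open_url(bucket, offset))
```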
So what’s the right way to do it?
Check out the MongoDB Connector for Hadoop
• Available at https://github.com/mongodb/mongo-hadoop
• Contains a “storage engine” to connect Hive directly to MongoDB for live querying
• Provides a Hive SerDe for direct access to static BSON files (e.g., backup files)
• Lets Hadoop Streaming jobs (Python, Perl, R, etc.) access Mongo data
• And more
Work with the Leading Innovator in Big Data
DATA SCIENTISTS
DATA ARCHITECTS
DATA SOLUTIONS
Think Big. Start Smart. Scale Fast.

MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Recently uploaded

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Recently uploaded (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

NoSQL to Augment Hadoop Big Data Platforms

  • 1. NoSQL to Augment Hadoop Big Data Platforms
    Jeffrey Breen, Director, Think Big Academy, October 2014
  • 2. Outline
    • Introduction
    • Hadoop and NoSQL: What? Where? Why? When?
    • Document-Oriented NoSQL and Hadoop
    • Example: Add Statefulness
    • Example: Analytics Store
    • Example: Secondary Index
      − Caution: contains code
    • MongoDB Connector for Hadoop
  • 3. Leading Provider of Big Data Solutions & Support
    Delivering Business Value Through Big Data
    • Exclusive Focus on Big Data Tools, Technologies, and Techniques
    • Onshore Team-Based Engineering and Data Science Methodology
    • Prebuilt, Proven Components to Accelerate Delivery & Lower Risk
  • 4. Agile Methodology
    Experiment-Driven Short Sprints with Quick Release Cycles
    We Accelerate Your Time to Value
    • Breaking down business and IT barriers
    • Discrete projects with a beginning and end
    • Early releases to validate ROI and ensure long-term success
    (Diagram: Data Engineers, Data Scientists, Business Goals; Innovation and Value)
  • 5. Jeffrey Breen
    Director, Think Big Academy
    Principal Consultant and Hands-on Architect
    IT guy, Data guy, Open Source guy
    Pilot and Airplane Geek
    Twitter: @JeffreyBreen
    jeffrey.breen@thinkbiganalytics.com
  • 6. Hadoop and NoSQL
    • Not “either-or”
      − When together? Where? For what?
    • Hadoop
      − Not a database
      − Low-cost storage with fault tolerance
      − Batch-oriented analytics (MapReduce, Hive, Pig)
      − Not good for random access and/or updates
    • NoSQL
      − Real databases with CRUD
      − Optimized for fast, random access
      − Many shapes and sizes (key-value, tabular, graph, document-oriented)
  • 7. Reference Architecture (diagram)
  • 8. Document-Oriented NoSQL with Hadoop
    • Advantages
      − Simple but flexible data model
      − Field-level indexing for fast querying
      − Easy and open APIs and data exchange formats
    • Examples
      1. Add Statefulness. Preserve state between jobs and other stateless operations.
      2. Analytics Store. Provide a high-performance destination for calculations and metrics.
      3. Secondary Indexing. Add low-latency querying and access for high-latency data stores like HDFS.
  • 9. Example: Add Statefulness
    • Overview
      − Sometimes you just need a fast and safe place to store data between jobs, applications, and iterations
    • Scenarios
      − Data extraction jobs
      − Ingestion processing status
      − Broadcasting "last best" parameters in machine learning, genetic algorithms, and other model fitting
    Sample state document:
      {
        "process": "db-extractor",
        "system": "database1",
        "tables": {
          "table1": { "columns": ["ts"], "values": ["2014-03-25 03:15:23"] },
          "table2": { "columns": ["client_id"], "values": ["43110221"] }
        }
      }
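To make the statefulness pattern concrete, here is a minimal Python sketch of how an incremental extraction job might use a state document shaped like the one above. It rests on two assumptions: each table has a single watermark column, and the database driver uses DB-API style `?` placeholders. The function names `incremental_query` and `advance_state` are hypothetical. Persisting the document to the NoSQL store is left out.

```python
import copy


def incremental_query(state, table):
    """Build a parameterized query that pulls only rows newer than the stored watermark.

    Hypothetical helper: assumes one watermark column per table and a
    DB-API driver that uses '?' placeholders.
    """
    info = state["tables"][table]
    if len(info["columns"]) != 1:
        raise ValueError("this sketch handles a single watermark column only")
    column, last_value = info["columns"][0], info["values"][0]
    return f"SELECT * FROM {table} WHERE {column} > ?", [last_value]


def advance_state(state, table, rows):
    """Return a copy of the state document with the watermark moved to the max value seen.

    Values are compared in their native type; ISO-8601 timestamp strings
    such as "2014-03-25 03:15:23" sort correctly as plain strings.
    """
    new_state = copy.deepcopy(state)
    info = new_state["tables"][table]
    column = info["columns"][0]
    if rows:
        info["values"] = [max(row[column] for row in rows)]
    return new_state


state = {
    "process": "db-extractor",
    "system": "database1",
    "tables": {"table1": {"columns": ["ts"], "values": ["2014-03-25 03:15:23"]}},
}
sql, params = incremental_query(state, "table1")
extracted = [{"ts": "2014-03-25 04:00:00"}, {"ts": "2014-03-25 03:30:00"}]  # made-up rows
new_state = advance_state(state, "table1", extracted)
```

`advance_state` returns a new document rather than mutating the old one. A job can therefore save the new watermark only after the extracted data has been committed. If a run fails, it simply re-reads from the previous watermark.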
  • 10. Example: Analytics Store
    • Great place to store aggregates and other calculated metrics
    • Can be populated from batch or streaming analytics
    • Great for serving live dashboards and reporting
    Sample metric document:
      {
        "metric": "session-length",
        "visitor": "{2CC8C651-A9F4-4CB4-8639-7688FCD21D59}",
        "visit-start": "2014-03-25 03:15:23",
        "data": { "value": 245.3, "units": "seconds" }
      }
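As one way a batch job could produce documents like the sample above, this Python sketch groups page views into sessions and emits one `session-length` metric per session. The 30-minute inactivity timeout is an assumption; the slide does not define a session. The timestamps in the usage lines are made up. Each resulting document could then be written to the analytics store, for example with PyMongo's `insert_many`.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"
SESSION_GAP = timedelta(minutes=30)  # assumed inactivity timeout; not specified on the slide


def session_length_metrics(events):
    """Group (visitor, timestamp) page views into sessions and emit one metric document per session.

    A new session starts whenever a visitor is idle for more than SESSION_GAP.
    Session length is last view minus first view, so single-view sessions are 0 seconds.
    """
    by_visitor = {}
    for visitor, ts in events:
        by_visitor.setdefault(visitor, []).append(datetime.strptime(ts, FMT))

    docs = []
    for visitor, times in sorted(by_visitor.items()):
        times.sort()
        start = prev = times[0]
        for t in times[1:] + [None]:  # None is a sentinel that closes the last session
            if t is None or t - prev > SESSION_GAP:
                docs.append({
                    "metric": "session-length",
                    "visitor": visitor,
                    "visit-start": start.strftime(FMT),
                    "data": {"value": (prev - start).total_seconds(), "units": "seconds"},
                })
                start = t
            prev = t
    return docs


events = [  # hypothetical page views for the visitor shown on the slide
    ("{2CC8C651-A9F4-4CB4-8639-7688FCD21D59}", "2014-03-25 03:15:23"),
    ("{2CC8C651-A9F4-4CB4-8639-7688FCD21D59}", "2014-03-25 03:19:28"),
]
docs = session_length_metrics(events)
```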
  • 11. Example: Secondary Indexing
    • HDFS is optimized for scans; seeks are very expensive
    • As in relational databases, secondary indexes can be created on specific elements
    • Hive even has indexing built in, but keeps the results on HDFS (still not optimized for seeks)
    • Solution: Use a separate NoSQL database for secondary indexes
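The core idea of a secondary index fits in a few lines of Python. One sequential scan, which is what HDFS is good at, records the byte offset of every record under a chosen key. After that, a lookup becomes a single seek. In the slides the index lives in a NoSQL store rather than an in-memory dict. The three-column record layout in `location_key` is hypothetical and is not the Omniture schema.

```python
import io
from collections import defaultdict


def build_offset_index(f, key_fn):
    """One full scan of a binary file: map key_fn(line) -> byte offsets of the lines with that key."""
    index = defaultdict(list)
    offset = 0
    for line in f:
        index[key_fn(line)].append(offset)
        offset += len(line)  # counts bytes, because the file is read in binary mode
    return dict(index)


def fetch(f, offset):
    """Seek straight to one record instead of rescanning the file."""
    f.seek(offset)
    return f.readline()


def location_key(line):
    """Hypothetical layout: tab-separated id, city, state."""
    fields = line.rstrip(b"\n").split(b"\t")
    return fields[1], fields[2]


data = io.BytesIO(b"1\ttaunton\tma\n2\tlos gatos\tca\n3\ttaunton\tma\n")
index = build_offset_index(data, location_key)
records = [fetch(data, off) for off in index[(b"taunton", b"ma")]]
```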
  • 12. Sample Clickstream Data
    • Sample Omniture clickstream files are available from Hortonworks
      − 420,000+ page views over 15 days
      − https://s3.amazonaws.com/hw-sandbox/tutorial8/RefineDemoData.zip
    • Example records combine web page and visitor information, including geocoding:
      1331434018 2012-03-10 18:46:58 2850813067829261564 4611687161967479390 FAS-2.8-AS3 N 0 24.63.166.252 1 0 10 http://www.acme.com/SH5568487/VD55169229 {2CC8C651-A9F4-4CB4-8639-7688FCD21D59} U en-US 313 598 1259 Y Y Y 1 2 304 comcast.net 10/2/2012 20:50:37 6 300 45 36 Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; WOW64; GTB7.3; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.30618; .NET CLR 3.5.30729; .NET4.0C) 71 0 2 53 0 taunton usa 521 ma 0 0 0 0 0 ABC 0 120 ABC 0
      1331434006 2012-03-10 18:46:46 2850864012585216412 6917530841728651042 FAS-2.8-AS3 N 0 24.6.122.234 1 0 10 http://www.acme.com/SH55126545/VD55177927 {52B4FFFE-606A-1C2B-77E7-F62057879CC8} U en-us 574 0 0 U U Y 0 0 304 comcast.net 10/2/2012 18:17:59 6 480 45 2 Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/533.21.1 (KHTML, like Gecko) Version/5.0.5 Safari/533.21.1 71 0 37 2 0 los gatos usa 807 ca 0 0 0 0 0 KGO 0 120 KGO
  • 13. Time-Partitioned Data
    • Time is a very common dimension on which to organize data
    • Great for processing incoming data and for filtering any time-based queries…
    • …but can complicate other access patterns
    Hive partitions correspond to directories on HDFS:
      /apps/hive/warehouse/omniture_daily/year=2012/month=3/day=1/000000_0
      /apps/hive/warehouse/omniture_daily/year=2012/month=3/day=2/000000_0
      /apps/hive/warehouse/omniture_daily/year=2012/month=3/day=3/000000_0
      /apps/hive/warehouse/omniture_daily/year=2012/month=3/day=4/000000_0
      /apps/hive/warehouse/omniture_daily/year=2012/month=3/day=5/000000_0
      /apps/hive/warehouse/omniture_daily/year=2012/month=3/day=6/000000_0
      /apps/hive/warehouse/omniture_daily/year=2012/month=3/day=7/000000_0
      /apps/hive/warehouse/omniture_daily/year=2012/month=3/day=8/000000_0
      /apps/hive/warehouse/omniture_daily/year=2012/month=3/day=9/000000_0
      /apps/hive/warehouse/omniture_daily/year=2012/month=3/day=10/000000_0
      […]
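The sketch below shows how the directory layout above maps timestamps to partitions. It also shows why a date-range query can skip most of the data. It is a Python illustration of partition pruning, not Hive's own code. `TABLE_ROOT` is taken from the listing, and the function names are hypothetical.

```python
from datetime import date, datetime, timedelta

TABLE_ROOT = "/apps/hive/warehouse/omniture_daily"


def partition_dir(day):
    """Daily partition directory for a date; month and day are not zero-padded, matching the listing above."""
    return f"{TABLE_ROOT}/year={day.year}/month={day.month}/day={day.day}"


def partition_for_timestamp(ts):
    """Which directory a 'YYYY-MM-DD HH:MM:SS' click lands in."""
    return partition_dir(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date())


def partitions_for_range(start, end):
    """Directories a date-range query has to read (inclusive); every other partition is pruned."""
    dirs, day = [], start
    while day <= end:
        dirs.append(partition_dir(day))
        day += timedelta(days=1)
    return dirs


march_first_half = partitions_for_range(date(2012, 3, 1), date(2012, 3, 15))
```

A query filtered only by location gets no such pruning. With 15 days of sample data it has to open all 15 directories, which is the access-pattern problem described on this slide.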
  • 14. Welcome to the Long Tail: Top 10 ≃ Bottom 2000
    Distribution of geographic locations detected in clickstream data:
      > sum(subset(df, rank <= 10)$count)
      [1] 36986
      > sum(subset(df, rank > max(df$rank) - 2000)$count)
      [1] 33971
    • In this sample clickstream data set, the top 10 cities account for more traffic than the bottom 2,000 combined
    • Optimizations are usually designed for the most common cases
      − "Biggest bang for the buck" due to size, frequency, etc.
      − What are the chances that the optimizations you pick to handle the most common cases work well for the long tail?
      − What if a new business opportunity depends on the long tail?
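For readers who do not use R, here is the same head-versus-tail comparison as a Python sketch. It assumes `df$rank` is a 1..N ranking by descending count. Under that assumption, running it on the real per-city counts should reproduce the 36986 vs. 33971 comparison. The counts in the usage line are invented.

```python
def head_vs_tail(counts, top_n=10, bottom_n=2000):
    """Total traffic of the top_n and bottom_n locations, mirroring the two R expressions above.

    counts: page-view count per location, in any order. Ranks are assigned
    by sorting descending, so rank 1 is the busiest location.
    """
    ranked = sorted(counts, reverse=True)
    top = sum(ranked[:top_n])
    bottom = sum(ranked[max(len(ranked) - bottom_n, 0):])
    return top, bottom


top, tail = head_vs_tail([100, 50, 10] + [1] * 30, top_n=2, bottom_n=30)  # hypothetical counts
```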
  • 15. Secondary Indexing in Hive
    • Hive has built-in facilities to index data:
        create index location on table omniture_daily(city, state, country)
          as 'COMPACT' with deferred rebuild;
        alter index location on omniture_daily rebuild;
    • The index stores pointers to the location of each matching record (path, file, and byte offset)
    • However, the resulting index is partitioned the same way as the underlying table
  • 16. Exporting Hive Data as JSON
    • Hive can easily read/write JSON data via a SerDe:
      − https://github.com/sheetaldolas/Hive-JSON-Serde/tree/master
        add jar json-serde-1.1.9.2-Hive13-jar-with-dependencies.jar;
        create table json_export (
          city string, country string, state string,
          bucketname string, offsets array<bigint>,
          year int, month int, day int
        )
        ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
        STORED AS TEXTFILE;
        insert into table json_export
          select * from default__omniture_daily_location__;
    Callouts: column parsing is determined by Hive SerDe classes; file storage is handled by Hadoop's InputFormat and OutputFormat.
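This rough Python equivalent of the insert step shows the shape of the files the SerDe-backed table writes: newline-delimited JSON, one object per row. That is also the format mongoimport reads by default. It illustrates the file format only and does not replace the Hive job. The column list copies the `json_export` definition. The sample row reuses the index entry shown on slide 17.

```python
import json

# Column order of the json_export table defined above.
COLUMNS = ["city", "country", "state", "bucketname", "offsets", "year", "month", "day"]


def to_json_lines(rows):
    """Serialize index rows (tuples in COLUMNS order) as newline-delimited JSON, one object per row."""
    return "".join(json.dumps(dict(zip(COLUMNS, row))) + "\n" for row in rows)


row = (
    "taunton", "usa", "ma",
    "hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/omniture_daily/year=2012/month=3/day=10/000000_0",
    [4748045, 3522685], 2012, 3, 10,
)
text = to_json_lines([row])
```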
  • 17. Sample Index Entry
    Hive indices contain the physical location of the original data, including byte offsets:
      {
        "city": "taunton",
        "state": "ma",
        "country": "usa",
        "bucketname": "hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/omniture_daily/year=2012/month=3/day=10/000000_0",
        "offsets": [ 4748045, 3522685 ],
        "year": 2012,
        "day": 10,
        "month": 3
      }
  • 18. Exporting Index Data to Mongo
    • Since our Hive index data is now stored on HDFS in JSON format, it's very easy to load into Mongo directly.
    • Don't do this in production, but that's what makes simple examples so much fun:
        $ hadoop fs -text /apps/hive/warehouse/json_export/000000_0 | mongoimport --host localhost --db clickstream --collection locidx
        connected to: localhost
        Sat Sep 27 10:30:22.325 100 16/second
        Sat Sep 27 10:30:24.448 check 9 12262
        Sat Sep 27 10:30:24.449 imported 12262 objects
  • 19. Querying the Index in Mongo
        $ mongo localhost
        MongoDB shell version: 2.4.6
        connecting to: localhost
        > use clickstream;
        switched to db clickstream
        > db.locidx.find( {'state':'ma', 'city':'taunton'} );
        { "_id" : ObjectId("5426f42e6a6b0b1939528f80"), "bucketname" : "hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/omniture_daily/year=2012/month=3/day=10/000000_0", "offsets" : [ 4748045, 3522685 ], "month" : 3, "state" : "ma", "year" : 2012, "day" : 10, "country" : "usa", "city" : "taunton" }
    Callouts: "bucketname" is the specific file on HDFS containing the records of interest; "offsets" are the byte offsets within that file containing the records of interest.
  • 20. Using the Index Data to Retrieve the Original Data
        $ curl -L 'http://sandbox.hortonworks.com:50070/webhdfs/v1/apps/hive/warehouse/omniture_daily/year=2012/month=3/day=10/000000_0?op=OPEN&offset=3522685&length=615'; echo
        1331431385 2012-03-10 18:03:05 2850813067829261564 4611687161967479390 FAS-2.8-AS3 N 0 24.63.166.252 1 0 10 http://www.acme.com/SH5568487/VD55169229 {2CC8C651-A9F4-4CB4-8639-7688FCD21D59} en-US 313 598 1259 Y Y Y 1 2 304 comcast.net 10/2/2012 20:50:37 6 300 45 36 Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; WOW64; GTB7.3; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.30618; .NET CLR 3.5.30729; .NET4.0C) 71 0 2 20 0 taunton usa 521 ma 0 0 0 0 ABC 0 120 ABC
        $ curl -L 'http://sandbox.hortonworks.com:50070/webhdfs/v1/apps/hive/warehouse/omniture_daily/year=2012/month=3/day=10/000000_0?op=OPEN&offset=4748045&length=615'; echo
        1331434018 2012-03-10 18:46:58 2850813067829261564 4611687161967479390 FAS-2.8-AS3 N 0 24.63.166.252 1 0 10 http://www.acme.com/SH5568487/VD55169229 {2CC8C651-A9F4-4CB4-8639-7688FCD21D59} en-US 313 598 1259 Y Y Y 1 2 304 comcast.net 10/2/2012 20:50:37 6 300 45 36 Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; WOW64; GTB7.3; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.30618; .NET CLR 3.5.30729; .NET4.0C) 71 0 2 53 0 taunton usa 521 ma 0 0 0 0 ABC 0 120 ABC
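The curl commands above can be generated from the index document returned by the Mongo query. Below is a small Python sketch of that step, using WebHDFS's `OPEN` operation with its `offset` and `length` parameters. The endpoint and the fixed 615-byte length are copied from the slide, and the function name is hypothetical. Fetching the URLs is left out. WebHDFS answers `OPEN` with a redirect to a DataNode, which is why the slide uses `curl -L`.

```python
from urllib.parse import urlencode, urlsplit

# WebHDFS endpoint of the sandbox NameNode used on the slide.
WEBHDFS_ROOT = "http://sandbox.hortonworks.com:50070/webhdfs/v1"


def webhdfs_open_urls(index_doc, length=615):
    """Build one WebHDFS OPEN URL per byte offset listed in an index document.

    length=615 is the fixed read size used in the curl commands above; real
    records vary in length, so a robust reader would read a bit more and cut
    the result at the first newline.
    """
    path = urlsplit(index_doc["bucketname"]).path  # drop the hdfs://host:port prefix
    return [
        f"{WEBHDFS_ROOT}{path}?" + urlencode({"op": "OPEN", "offset": off, "length": length})
        for off in sorted(index_doc["offsets"])  # ascending offsets read the file front to back
    ]


doc = {
    "bucketname": "hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/omniture_daily/year=2012/month=3/day=10/000000_0",
    "offsets": [4748045, 3522685],
}
urls = webhdfs_open_urls(doc)
```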
  • 21. So what's the right way to do it? Check out the MongoDB Connector for Hadoop
    • Available at https://github.com/mongodb/mongo-hadoop
    • Contains a "storage engine" to connect Hive directly to MongoDB for live querying
    • Provides a Hive SerDe for direct access to static BSON files (i.e., backup files)
    • Allows Hadoop Streaming jobs (Python, Perl, R, etc.) to access Mongo files
    • And more
  • 22. Work with the Leading Innovator in Big Data
    Data Scientists | Data Architects | Data Solutions
    Think Big. Start Smart. Scale Fast.

Editor's Notes

  1. Think Big is a leading provider of big data solutions and analytic applications. We achieve this by working in lock-step with business leaders to align their goals with big data strategy and planning services, which become the roadmap for the data science and data engineering services we provide to implement big data projects.