AsterixDB and the Big Data Landscape
Chen Li
Information Systems Group
CS Department
UC Irvine
Big Data / Web Warehousing
So what went on – and why?
What’s going on right now?
What’s going on
Notes:
• Storage manager
per node
• Upper layers
orchestrate them
• One way in/out:
via the SQL door
Big Data in the Database World
• Enterprises needed to store and query historical
business data (data warehouses)
– 1980’s: Parallel database systems based on “shared-nothing” architectures (Gamma/GRACE, Teradata)
– 2000’s: Netezza, Aster Data, DATAllegro, Greenplum, Vertica, ParAccel (“Big $” acquisitions!)
• OLTP is another category (a source of Big Data)
– 1980’s: Tandem’s NonStop SQL system
Big Data in the Systems World
• Late 1990’s brought a need to index and query
the rapidly exploding content of the Web
– DB technology tried but failed (e.g., Inktomi)
– Google, Yahoo! et al needed to do something
• Google responded by laying a new foundation
– Google File System (GFS)
• OS-level byte stream files spanning 1000’s of machines
• Three-way replication for fault-tolerance (availability)
– MapReduce (MR) programming model
• User functions: Map and Reduce (and optionally Combine)
• “Parallel programming for dummies” – MR runtime does the
heavy lifting via partitioned parallelism
(MapReduce: Word Count Example)
[Figure: input splits (distributed) feed the mappers; mapper outputs pass through the shuffle phase (based on keys) to become reducer inputs; reducers write the outputs (distributed). Partitioned parallelism!]
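The word count example can be sketched in a few lines of single-process Python. This is only an illustration of the map, shuffle, and reduce steps, not Hadoop or Google code; the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) and the byte-sum partitioner are assumptions made for the sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(split: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]],
            num_reducers: int) -> List[Dict[str, List[int]]]:
    """Shuffle: route each key to one reducer partition and group its values."""
    partitions: List[Dict[str, List[int]]] = [
        defaultdict(list) for _ in range(num_reducers)
    ]
    for word, count in pairs:
        # Python's built-in hash() for strings varies between runs, so a
        # simple byte sum keeps the partitioning reproducible.
        target = sum(word.encode("utf-8")) % num_reducers
        partitions[target][word].append(count)
    return partitions


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: sum all the counts collected for one word."""
    return word, sum(counts)


def word_count(splits: List[str], num_reducers: int = 2) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input splits."""
    mapped = (pair for split in splits for pair in map_words(split))
    result: Dict[str, int] = {}
    for partition in shuffle(mapped, num_reducers):
        # Each partition could be handled by a different reducer machine.
        for word, counts in partition.items():
            key, total = reduce_counts(word, counts)
            result[key] = total
    return result


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog the end"]))
```

Because all pairs for a given word land in the same partition, each reducer can work independently, which is the partitioned parallelism the slide refers to.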
Soon a Star Was Born…
• Yahoo!, Facebook, and friends read the papers
– HDFS and Hadoop MapReduce now in wide use for indexing,
clickstream analysis, log analysis, …
• Higher-level languages subsequently developed
– Pig (Yahoo!), Hive (Facebook), Jaql (IBM)
• Key-value (“NoSQL”) stores are another category
– Used to power scalable social sites, online games, …
– BigTableHBase, DynamoCassandra, MongoDB, …
5
Notes:
•Giant byte sequence
files at the bottom
•Map, sort, shuffle,
reduce layer in middle
•Possible storage layer
in middle as well
•Now at the top: HLL’s
Apache Pig (PigLatin)
• Scripting language inspired by the relational algebra
– Compiles down to a series of Hadoop MR jobs
– Relational operators include LOAD, FOREACH, FILTER,
GROUP, COGROUP, JOIN, ORDER BY, LIMIT, ...
Apache Hive (HiveQL)
• Query language inspired by an old favorite: SQL
– Compiles down to a series of Hadoop MR jobs
– Supports various HDFS file formats (text, columnar, ...)
– Numerous contenders appearing that take a non-MR-based runtime approach (duh!) – these include Impala, Stinger, Spark SQL, ...
Other Up-and-Coming Platforms (I)
• Spark for in-memory cluster computing – for doing repetitive data analyses, iterative machine learning tasks, ...
[Figure: two panels. One-time processing: Input is loaded once into distributed memory and then serves query 1, query 2, query 3, . . . Iterative processing: Input feeds iter. 1, iter. 2, . . .]
(Especially gaining traction for scaling Machine Learning)
Other Up-and-Coming Platforms (II)
• Bulk Synchronous Parallel (BSP) platforms, e.g., Pregel, Giraph, GraphLab, ..., for Big Graph analytics
(“Big” is the platform’s concern)
“Think Like a Vertex”
– Receive messages
– Update state
– Send messages
• Quite a few BSP-based platforms available
– Pregel (Google)
– Giraph (Facebook, LinkedIn, Twitter, Yahoo!, ...)
– Hama (Sogou, Korea Telecom, ...)
– Distributed GraphLab (CMU, Washington)
– GraphX (Berkeley)
– Pregelix (UCI)
– ...
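The "think like a vertex" loop can be illustrated with a small single-process sketch. It is not the API of Pregel, Giraph, or any other platform listed above; the function name `bsp_min_label` and the choice of algorithm (labelling each vertex with the smallest vertex id in its connected component) are assumptions picked to keep the example short.

```python
from typing import Dict, List


def bsp_min_label(graph: Dict[int, List[int]]) -> Dict[int, int]:
    """Label every vertex with the smallest vertex id in its component.

    `graph` maps each vertex id to its neighbours (undirected edges are
    listed in both directions). Work proceeds in supersteps separated by
    a barrier, as in the BSP model.
    """
    state = {v: v for v in graph}  # initial vertex state: own id

    # Superstep 0: every vertex sends its label to its neighbours.
    inbox: Dict[int, List[int]] = {v: [] for v in graph}
    for v, neighbours in graph.items():
        for n in neighbours:
            inbox[n].append(state[v])

    # Keep running supersteps until no messages are in flight.
    while any(inbox.values()):
        outbox: Dict[int, List[int]] = {v: [] for v in graph}
        for v, messages in inbox.items():  # conceptually parallel per vertex
            if not messages:
                continue                   # no messages: vertex stays idle
            best = min(messages)           # 1. receive messages
            if best < state[v]:
                state[v] = best            # 2. update state
                for n in graph[v]:
                    outbox[n].append(best)  # 3. send messages
        inbox = outbox                     # barrier: deliver next superstep
    return state


if __name__ == "__main__":
    g = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
    print(bsp_min_label(g))
```

Each vertex only sees its own state and its incoming messages, so partitioning the vertices across machines is left to the platform, which matches the note that “Big” is the platform’s concern.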
No Shortage of “NoSQL”
Big Data Analysis Platforms...
[Figure: platform stacks compared layer by layer. Layers: Query/Scripting Language, High-Level API, Compiler/Optimizer, Low-Level API, Execution Engine, Resource Management, Data Store. Systems shown: SQL, Dataflow Processor, Relational Row/Column Storage, SCOPE, DryadLINQ, Dryad, TidyFS, Quincy, Cosmos, AQL, Algebricks, Hyracks, LSM Storage, PigLatin, Pig, Jaql, Cascading, Tez, Hadoop MapReduce, YARN, HBase, HDFS, Sawzall, Dremel, FlumeJava, Google MapReduce, Bigtable, GFS, Omega, Spark, RDDs, Mesos, Java/Scala, Meteor, Sopremo, PACT, Nephele.]
Also: Today’s Big Data Tangle
AsterixDB: “One Size Fits a Bunch”
Semistructured
Data Management
Parallel
Database Systems
World of
Hadoop & Friends
BDMS Desiderata:
• Flexible data model
• Efficient runtime
• Full query capability
• Cost proportional to
task at hand (!)
• Designed for
continuous data
ingestion
• Support today’s “Big
Data data types”
ASTERIX Data Model (ADM)
Highlights include:
• JSON++ based data model
• Rich type support (spatial, temporal, …)
• Records, lists, bags
• Open vs. closed types
create dataverse TinySocial;
use dataverse TinySocial;
create type EmploymentType as open {
    organization-name: string,
    start-date: date,
    end-date: date?
}
create type MugshotUserType as {
    id: int32,
    alias: string,
    name: string,
    user-since: datetime,
    address: {
        street: string,
        city: string,
        state: string,
        zip: string,
        country: string
    },
    friend-ids: {{ int32 }},
    employment: [EmploymentType]
}
create dataset MugshotUsers(MugshotUserType)
    primary key id;
• An open type can declare as little as the key field; instances may still carry additional fields:
create type MugshotUserType as {
    id: int32
}
create type MugshotMessageType as closed {
    message-id: int32,
    author-id: int32,
    timestamp: datetime,
    in-response-to: int32?,
    sender-location: point?,
    tags: {{ string }},
    message: string
}
create dataset MugshotMessages(MugshotMessageType)
    primary key message-id;
Ex: MugshotUsers Data
{ "id":1, "alias":"Margarita", "name":"MargaritaStoddard", "address":{
"street":"234 Thomas Ave", "city":"San Hugo", "zip":"98765",
"state":"CA", "country":"USA" },
"user-since":datetime("2012-08-20T10:10:00"),
"friend-ids":{{ 2, 3, 6, 10 }}, "employment":[{
"organization-name":"Codetechno", "start-date":date("2006-08-06") }] }
{ "id":2, "alias":"Isbel", "name":"IsbelDull", "address":{
"street":"345 James Ave", "city":"San Hugo", "zip":"98765",
"state":"CA", "country":"USA" },
"user-since":datetime("2011-01-22T10:10:00"),
"friend-ids":{{ 1, 4 }}, "employment":[{
"organization-name":"Hexviafind", "start-date":date("2010-04-27") }] }
{ "id":3, "alias":"Emory", "name":"EmoryUnk", "address":{
"street":"456 Jose Ave", "city":"San Hugo", "zip":"98765",
"state":"CA", "country":"USA" },
"user-since":datetime("2012-07-10T10:10:00"),
"friend-ids":{{ 1, 5, 8, 9 }}, "employment":[{
"organization-name":"geomedia",
"start-date":date("2010-06-17"), "end-date":date("2010-01-26") }] }
...
Other DDL Features
create index msUserSinceIdx on MugshotUsers(user-since);
create index msTimestampIdx on MugshotMessages(timestamp);
create index msAuthorIdx on MugshotMessages(author-id) type btree;
create index msSenderLocIndex on MugshotMessages(sender-location) type rtree;
create index msMessageIdx on MugshotMessages(message) type keyword;
create type AccessLogType as closed
    { ip: string, time: string, user: string, verb: string, path: string, stat: int32, size: int32 };
create external dataset AccessLog(AccessLogType) using localfs
    (("path"="{hostname}://{path}"), ("format"="delimited-text"), ("delimiter"="|"));
create feed socket_feed using socket_adaptor
    (("sockets"="{address}:{port}"), ("addressType"="IP"),
    ("type-name"="MugshotMessageType"), ("format"="adm"));
connect feed socket_feed to dataset MugshotMessages;
External data highlights:
• Common HDFS file formats + indexing
• Feed adaptors for sources like Twitter
ASTERIX Query Language (AQL)
• Ex: List the user name and messages sent by those users who joined the Mugshot social network in a certain time window:
from $user in dataset MugshotUsers
where $user.user-since >= datetime('2010-07-22T00:00:00')
  and $user.user-since <= datetime('2012-07-29T23:59:59')
select {
    "uname" : $user.name,
    "messages" :
        from $message in dataset MugshotMessages
        where $message.author-id = $user.id
        select $message.message
};
AQL (cont.)
• Ex: Identify active users and group/count them by country:
with $end := current-datetime( )
with $start := $end - duration("P30D")
from $user in dataset MugshotUsers
where some $logrecord in dataset AccessLog
    satisfies $user.alias = $logrecord.user
      and datetime($logrecord.time) >= $start
      and datetime($logrecord.time) <= $end
group by $country := $user.address.country with $user
select {
    "country" : $country,
    "active users" : count($user)
}
AQL highlights:
• Lots of other features (see website!)
• Spatial predicates and aggregation
• Set-similarity (fuzzy) matching
• And plans for more…
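To make the semantics of the active-users query concrete, here is a rough Python equivalent over in-memory lists. It is a sketch under assumptions, not how AsterixDB evaluates the query: the function name `active_users_by_country` is hypothetical, records are plain dictionaries, and log times are assumed to be ISO 8601 strings.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, List


def active_users_by_country(users: List[dict],
                            access_log: List[dict],
                            now: datetime,
                            window: timedelta = timedelta(days=30)) -> List[Dict]:
    """Count, per country, the users with at least one log record in the window."""
    start = now - window

    # "some $logrecord ... satisfies": collect every alias that has at
    # least one access-log record inside [start, now].
    active_aliases = {
        rec["user"]
        for rec in access_log
        if start <= datetime.fromisoformat(rec["time"]) <= now
    }

    # "group by $country ... count($user)": each matching user counts once,
    # no matter how many log records they produced.
    counts = Counter(
        user["address"]["country"]
        for user in users
        if user["alias"] in active_aliases
    )
    return [{"country": c, "active users": n} for c, n in sorted(counts.items())]


if __name__ == "__main__":
    users = [
        {"alias": "Margarita", "address": {"country": "USA"}},
        {"alias": "Isbel", "address": {"country": "USA"}},
    ]
    log = [{"user": "Margarita", "time": "2013-06-01T08:00:00"}]
    print(active_users_by_country(users, log, now=datetime(2013, 6, 6)))
```

The existential `some ... satisfies` clause behaves like a semi-join: a user with a hundred recent log records is still counted once.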
Fuzzy Queries in AQL
• Ex: Find Tweets with similar content:
for $tweet1 in dataset('TweetMessages')
for $tweet2 in dataset('TweetMessages')
where $tweet1.tweetid != $tweet2.tweetid
  and $tweet1.message-text ~= $tweet2.message-text
return {
    "tweet1-text": $tweet1.message-text,
    "tweet2-text": $tweet2.message-text
}
• Or: Find Tweets about similar topics:
for $tweet1 in dataset('TweetMessages')
for $tweet2 in dataset('TweetMessages')
where $tweet1.tweetid != $tweet2.tweetid
  and $tweet1.referred-topics ~= $tweet2.referred-topics
return {
    "tweet1-text": $tweet1.message-text,
    "tweet2-text": $tweet2.message-text
}
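One way to picture what a set-similarity match such as `~=` computes is a Jaccard comparison over token sets. The sketch below assumes word tokens, Jaccard similarity, and a threshold of 0.5; those choices, and the helper names `word_tokens`, `jaccard`, and `similar_pairs`, are illustrative assumptions rather than AsterixDB's implementation, which would also avoid the naive all-pairs comparison.

```python
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple


def word_tokens(text: str) -> Set[str]:
    """Split text into a set of lower-cased word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similar_pairs(tweets: List[Dict], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return id pairs whose message texts are at least `threshold` similar.

    Unlike the AQL self-join, which yields both (a, b) and (b, a), each
    unordered pair is reported once here.
    """
    tokens = {t["tweetid"]: word_tokens(t["message-text"]) for t in tweets}
    return [
        (id1, id2)
        for id1, id2 in combinations(sorted(tokens), 2)
        if jaccard(tokens[id1], tokens[id2]) >= threshold
    ]


if __name__ == "__main__":
    sample = [
        {"tweetid": 1, "message-text": "big data is fun"},
        {"tweetid": 2, "message-text": "big data is hard"},
        {"tweetid": 3, "message-text": "cats sleep all day"},
    ]
    print(similar_pairs(sample))
```

The second query on the slide applies the same idea to `referred-topics`, where the sets are already given and no tokenization is needed.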
Updates (and Transactions)
• Key-value store-like transaction semantics
• Insert/delete ops with indexing
• Concurrency control (locking)
• Crash recovery
• Backup/restore
• Ex: Add a new user to Mugshot.com:
insert into dataset MugshotUsers
( {
    "id":11, "alias":"John", "name":"JohnDoe",
    "address":{
        "street":"789 Jane St", "city":"San Harry",
        "zip":"98767", "state":"CA", "country":"USA"
    },
    "user-since":datetime("2010-08-15T08:10:00"),
    "friend-ids":{{ 5, 9, 11 }},
    "employment":[{
        "organization-name":"Kongreen",
        "start-date":date("2012-06-05")
    }] } );
AsterixDB System Overview
ASTERIX Software Stack
[Figure: query languages AQL, HiveQL, and XQuery enter through AsterixDB, Hivesterix, and Apache VXQuery, which share the Algebricks Algebra Layer; an M/R Layer and Pregelix sit beside it; everything runs on the Hyracks Data-Parallel Platform as Hyracks jobs, Hadoop M/R jobs, or Pregel jobs.]
Native Storage Management
[Figure: components shown are Memory (Buffer Cache, In-Memory Components, Working Memory), Datasets Manager, IO Scheduler over Disk 1 … Disk n, and the Transaction Sub-System (Transaction Manager, Recovery Manager, Lock Manager, Log Manager).]
LSM-Based Storage + Indexing
Memory
Disk
Sequential writes to disk
Periodically merge disk trees
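The two ideas on this slide, sequential writes and periodic merges, can be shown with a toy in-memory model. This is a teaching sketch only: the class name `TinyLSM`, the `memory_budget` and `max_disk_components` parameters, and the merge-everything policy are assumptions, and the "disk" components are plain Python lists rather than files or B-trees.

```python
from typing import Dict, List, Optional, Tuple


class TinyLSM:
    """Toy LSM index: a mutable in-memory component plus immutable sorted runs."""

    def __init__(self, memory_budget: int = 2, max_disk_components: int = 2) -> None:
        self.memory_budget = memory_budget
        self.max_disk_components = max_disk_components
        self.memory: Dict[str, str] = {}
        self.disk: List[List[Tuple[str, str]]] = []  # newest component first

    def put(self, key: str, value: str) -> None:
        """Writes only touch memory; a full memory component is flushed."""
        self.memory[key] = value
        if len(self.memory) >= self.memory_budget:
            self._flush()

    def _flush(self) -> None:
        """Write the memory component out as one sorted, sequential run."""
        self.disk.insert(0, sorted(self.memory.items()))
        self.memory = {}
        if len(self.disk) > self.max_disk_components:
            self._merge()

    def _merge(self) -> None:
        """Merge all disk components into one, letting newer values win."""
        merged: Dict[str, str] = {}
        for run in reversed(self.disk):  # oldest first, so newer overwrite
            merged.update(run)
        self.disk = [sorted(merged.items())]

    def get(self, key: str) -> Optional[str]:
        """Search memory first, then disk components from newest to oldest."""
        if key in self.memory:
            return self.memory[key]
        for run in self.disk:
            for k, v in run:
                if k == key:
                    return v
        return None


if __name__ == "__main__":
    lsm = TinyLSM()
    for k, v in [("a", "1"), ("b", "2"), ("a", "3"), ("c", "4")]:
        lsm.put(k, v)
    print(lsm.get("a"), len(lsm.disk))
```

Updates never modify a flushed component in place, which is what keeps disk writes sequential; the cost is that reads may have to consult several components until a merge reduces their number.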
LSM-Based Filters
Intuition: Do NOT touch unneeded records
Idea: Utilize LSM partitioning to prune disk components
[Figure: the in-memory component holds T16, T17; three disk components hold T12–T15 with filter [ T12, T15 ], T7–T11 with filter [ T7, T11 ], and T1–T6 with filter [ T1, T6 ] (oldest component).]
Q: Get all tweets > T14
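The pruning idea can be sketched with the component ranges from the figure. The code below is an assumption-laden illustration: `Component`, `time_range`, and `scan_greater_than` are hypothetical names, timestamps T1..T17 are modelled as the integers 1..17, and the filter is recomputed on demand, whereas a real system would presumably store it when the component is written.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Component:
    """One LSM component with the timestamps of the records it holds."""
    name: str
    records: List[int]

    @property
    def time_range(self) -> Tuple[int, int]:
        """The filter: minimum and maximum timestamp in this component."""
        return min(self.records), max(self.records)


def scan_greater_than(components: List[Component],
                      lower: int) -> Tuple[List[int], List[str]]:
    """Return matching timestamps and the names of components actually read."""
    hits: List[int] = []
    touched: List[str] = []
    for component in components:
        _, newest = component.time_range
        if newest <= lower:
            continue  # filter proves nothing here can match: skip the component
        touched.append(component.name)
        hits.extend(t for t in component.records if t > lower)
    return sorted(hits), touched


if __name__ == "__main__":
    components = [
        Component("memory", [16, 17]),
        Component("disk-1", [12, 13, 14, 15]),
        Component("disk-2", [7, 8, 9, 10, 11]),
        Component("disk-3", [1, 2, 3, 4, 5, 6]),  # oldest component
    ]
    print(scan_greater_than(components, lower=14))
```

For the query on the slide (all tweets after T14), only the in-memory component and the newest disk component are read; the two older components are skipped on the strength of their filters alone.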
Some Example Use Cases
• Recent/projected use case areas include
– Behavioral science (at UCI)
– Social data analytics
– Cell phone event analytics
– Education (MOOC analytics)
– Power usage monitoring
– Public health (joint effort with UCLA)
– Cluster management log analytics
Behavioral Science (HCI)
• First study to use logging and biosensors to measure
stress and ICT use of college students in their real
world environment (Gloria Mark, UCI Informatics)
– Focus: Multitasking and stress among “Millennials”
• Multiple data channels
– Computer logging
– Heart rate monitors
– Daily surveys
– General survey
– Exit interview
Learnings for AsterixDB:
• Nature of their analyses
• Extended binning support
• Data format(s) in and out
• Bugs and pain points
Social Data Analysis
(Based on 2 pilots)
Learnings for AsterixDB:
• Nature of their analyses
• Real vs. synthetic data
• Parallelism (grouping)
• Avoiding materialization
• Bugs and pain points
The underlying AQL query is:
use dataverse twitter;
for $t in dataset TweetMessagesShifted
let $region := create-rectangle(create-point(…, …),
                                create-point(…, …))
let $keyword := "mind-blowing"
where spatial-intersect($t.sender-location, $region)
  and $t.send-time > datetime("2012-01-02T00:00:00Z")
  and $t.send-time < datetime("2012-12-31T23:59:59Z")
  and contains($t.message-text, $keyword)
group by $c := spatial-cell($t.sender-location,
                            create-point(…), 3.0, 3.0) with $t
return { "cell" : $c, "count": count($t) }
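The grouping step of this query, binning points into fixed-size grid cells and counting per cell, can be sketched as follows. Only the grouping is covered; the region, time, and keyword filters are left out. The helpers `cell_of` and `count_by_cell` are hypothetical, the grid origin defaults to (0, 0) because the slide elides it, and a cell is identified by its lower-left corner rather than by a rectangle value.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

Point = Tuple[float, float]


def cell_of(x: float, y: float, origin: Point,
            width: float, height: float) -> Point:
    """Return the lower-left corner of the grid cell that contains (x, y)."""
    col = math.floor((x - origin[0]) / width)
    row = math.floor((y - origin[1]) / height)
    return origin[0] + col * width, origin[1] + row * height


def count_by_cell(points: Iterable[Point],
                  origin: Point = (0.0, 0.0),
                  width: float = 3.0,
                  height: float = 3.0) -> Counter:
    """Group points by grid cell and count the points in each cell."""
    return Counter(cell_of(x, y, origin, width, height) for x, y in points)


if __name__ == "__main__":
    locations = [(1.0, 1.0), (2.0, 2.5), (4.0, 1.0), (-1.0, -1.0)]
    for cell, count in sorted(count_by_cell(locations).items()):
        print({"cell": cell, "count": count})
```

Because each point maps to its cell independently, the counting can be partitioned by cell, which fits the "Parallelism (grouping)" learning listed on the slide.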
Current Status
• 4-year initial NSF project (250+ KLOC @ UCI/UCR)
• AsterixDB BDMS is here! (Shared on June 6th, 2013)
– Semistructured “NoSQL” style data model
– Declarative (parallel) queries, inserts, deletes, …
– LSM-based storage/indexes (primary & secondary)
– Internal and external datasets both supported
– Rich set of data types (including text, time, location)
– Fuzzy and spatial query processing
– NoSQL-like transactions (for inserts/deletes)
– Data feeds and external indexes in next release
• Performance competitive (at least!) with a popular
parallel RDBMS, MongoDB, and Hive (see papers)
• Now in Apache incubation mode!
For More Info
AsterixDB project page: http://asterixdb.ics.uci.edu
Open source code base:
• ASTERIX: http://code.google.com/p/asterixdb/
• Hyracks: http://code.google.com/p/hyracks
• (Pregelix: http://hyracks.org/projects/pregelix/)
Clustering:k-means, expect-maximization and gaussian mixture model
jins0618
 
Transfer Learning: An overview
Transfer Learning: An overviewTransfer Learning: An overview
Transfer Learning: An overview
jins0618
 

More from jins0618 (12)

Latent Interest and Topic Mining on User-item Bipartite Networks
Latent Interest and Topic Mining on User-item Bipartite NetworksLatent Interest and Topic Mining on User-item Bipartite Networks
Latent Interest and Topic Mining on User-item Bipartite Networks
 
Web Service QoS Prediction Approach in Mobile Internet Environments
Web Service QoS Prediction Approach in Mobile Internet EnvironmentsWeb Service QoS Prediction Approach in Mobile Internet Environments
Web Service QoS Prediction Approach in Mobile Internet Environments
 
吕潇 星环科技大数据技术探索与应用实践
吕潇 星环科技大数据技术探索与应用实践吕潇 星环科技大数据技术探索与应用实践
吕潇 星环科技大数据技术探索与应用实践
 
李战怀 大数据环境下数据存储与管理的研究
李战怀 大数据环境下数据存储与管理的研究李战怀 大数据环境下数据存储与管理的研究
李战怀 大数据环境下数据存储与管理的研究
 
Ling liu part 01:big graph processing
Ling liu part 01:big graph processingLing liu part 01:big graph processing
Ling liu part 01:big graph processing
 
2015 07-tuto3-mining hin
2015 07-tuto3-mining hin2015 07-tuto3-mining hin
2015 07-tuto3-mining hin
 
2015 07-tuto0-courseoutline
2015 07-tuto0-courseoutline2015 07-tuto0-courseoutline
2015 07-tuto0-courseoutline
 
Chengqi zhang graph processing and mining in the era of big data
Chengqi zhang graph processing and mining in the era of big dataChengqi zhang graph processing and mining in the era of big data
Chengqi zhang graph processing and mining in the era of big data
 
LITM
LITMLITM
LITM
 
Some links of recommender system
Some links of recommender systemSome links of recommender system
Some links of recommender system
 
Clustering:k-means, expect-maximization and gaussian mixture model
Clustering:k-means, expect-maximization and gaussian mixture modelClustering:k-means, expect-maximization and gaussian mixture model
Clustering:k-means, expect-maximization and gaussian mixture model
 
Transfer Learning: An overview
Transfer Learning: An overviewTransfer Learning: An overview
Transfer Learning: An overview
 

Recently uploaded

My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
74nqk8xf
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
zsjl4mimo
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
74nqk8xf
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 

Recently uploaded (20)

My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 

Chen Li, AsterixDB: An Open-Source Platform for Big Data Processing

  • 1. AsterixDB and the Big Data Landscape. Chen Li, Information Systems Group, CS Department, UC Irvine
  • 2. Big Data / Web Warehousing. So what went on – and why? What’s going on right now? What’s going on
  • 3. Big Data in the Database World
      • Enterprises needed to store and query historical business data (data warehouses)
        – 1980’s: Parallel database systems based on “shared-nothing” architectures (Gamma/GRACE, Teradata)
        – 2000’s: Netezza, Aster Data, DATAllegro, Greenplum, Vertica, ParAccel (“Big $” acquisitions!)
      • OLTP is another category (a source of Big Data)
        – 1980’s: Tandem’s NonStop SQL system
      Notes:
      • Storage manager per node
      • Upper layers orchestrate them
      • One way in/out: via the SQL door
  • 4. Big Data in the Systems World
      • Late 1990’s brought a need to index and query the rapidly exploding content of the Web
        – DB technology tried but failed (e.g., Inktomi)
        – Google, Yahoo! et al needed to do something
      • Google responded by laying a new foundation
        – Google File System (GFS)
          • OS-level byte stream files spanning 1000’s of machines
          • Three-way replication for fault-tolerance (availability)
        – MapReduce (MR) programming model
          • User functions: Map and Reduce (and optionally Combine)
          • “Parallel programming for dummies” – MR runtime does the heavy lifting via partitioned parallelism
  • 5. (MapReduce: Word Count Example)
      [Figure: input splits (distributed) → mapper outputs → SHUFFLE PHASE (based on keys) → reducer inputs → reducer outputs (distributed). Partitioned parallelism!]
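The word count pipeline on this slide can be sketched in a few lines of Python. This is a single-process illustration of the map, shuffle, and reduce stages, not Hadoop or Google MapReduce code; the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) and the sample input are made up for the example.

```python
from collections import defaultdict
from itertools import chain


def map_words(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped_pairs, num_reducers):
    """Shuffle: route each pair to a reducer by key, grouping values per key."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in mapped_pairs:
        partitions[hash(key) % num_reducers][key].append(value)
    return partitions


def reduce_counts(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(splits, num_reducers=2):
    # In a real cluster each split and each partition would be handled by a
    # different machine; here the stages simply run one after another.
    mapped = chain.from_iterable(map_words(s) for s in splits)
    result = {}
    for partition in shuffle(mapped, num_reducers):
        for key, values in partition.items():
            word, total = reduce_counts(key, values)
            result[word] = total
    return result


counts = word_count(["the quick brown fox", "the lazy dog", "the fox"])
print(counts["the"], counts["fox"])
```

Because every occurrence of a key lands in the same partition, each reducer can finish its words without talking to the others, which is the partitioned parallelism the slide points at.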
  • 6. Soon a Star Was Born…
      • Yahoo!, Facebook, and friends read the papers
        – HDFS and Hadoop MapReduce now in wide use for indexing, clickstream analysis, log analysis, …
      • Higher-level languages subsequently developed
        – Pig (Yahoo!), Hive (Facebook), Jaql (IBM)
      • Key-value (“NoSQL”) stores are another category
        – Used to power scalable social sites, online games, …
        – BigTable → HBase, Dynamo → Cassandra, MongoDB, …
      Notes:
      • Giant byte sequence files at the bottom
      • Map, sort, shuffle, reduce layer in middle
      • Possible storage layer in middle as well
      • Now at the top: HLL’s
  • 7. Apache Pig (PigLatin)
      • Scripting language inspired by the relational algebra
        – Compiles down to a series of Hadoop MR jobs
        – Relational operators include LOAD, FOREACH, FILTER, GROUP, COGROUP, JOIN, ORDER BY, LIMIT, ...
  • 8. Apache Hive (HiveQL)
      • Query language inspired by an old favorite: SQL
        – Compiles down to a series of Hadoop MR jobs
        – Supports various HDFS file formats (text, columnar, ...)
        – Numerous contenders appearing that take a non-MR-based runtime approach (duh!) – these include Impala, Stinger, Spark SQL, ...
  • 9. Other Up-and-Coming Platforms (I)
      • Spark for in-memory cluster computing
        – for doing repetitive data analyses, iterative machine learning tasks, ...
      (Especially gaining traction for scaling Machine Learning)
      [Figure labels: one-time processing (input, distributed memory, query 1, query 2, query 3, ...); iterative processing (input, iter. 1, iter. 2, ...)]
  • 10. Other Up-and-Coming Platforms (II)
      • Bulk Synchronous Parallel (BSP) platforms, e.g., Pregel, Giraph, GraphLab, ..., for Big Graph analytics
      • “Think Like a Vertex” (“Big” is the platform’s concern)
        – Receive messages
        – Update state
        – Send messages
      • Quite a few BSP-based platforms available
        – Pregel (Google)
        – Giraph (Facebook, LinkedIn, Twitter, Yahoo!, ...)
        – Hama (Sogou, Korea Telecom, ...)
        – Distributed GraphLab (CMU, Washington)
        – GraphX (Berkeley)
        – Pregelix (UCI)
        – ...
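The "receive messages, update state, send messages" loop can be illustrated with a small single-process sketch. It propagates the maximum vertex value through a graph, a common toy example for vertex-centric programming; the function name `run_bsp_max` and the sample graph are assumptions for illustration, and none of this is the actual Pregel, Giraph, or Pregelix API.

```python
def run_bsp_max(graph, values):
    """Propagate the largest vertex value to every vertex, superstep by superstep.

    graph:  dict mapping each vertex to a list of its neighbors
    values: dict mapping each vertex to its initial value
    """
    state = dict(values)

    # Superstep 0: every vertex sends its own value to all neighbors.
    inbox = {v: [] for v in graph}
    for v, neighbors in graph.items():
        for n in neighbors:
            inbox[n].append(state[v])

    # Later supersteps: run until no messages are in flight.
    while any(inbox.values()):
        outbox = {v: [] for v in graph}
        for v in graph:
            messages = inbox[v]                      # receive messages
            if messages and max(messages) > state[v]:
                state[v] = max(messages)             # update state
                for n in graph[v]:
                    outbox[n].append(state[v])       # send messages
            # Otherwise the vertex stays silent (it "votes to halt").
        inbox = outbox  # barrier: messages become visible in the next superstep
    return state


# Hypothetical four-vertex chain: a - b - c - d
chain_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
final = run_bsp_max(chain_graph, {"a": 3, "b": 6, "c": 2, "d": 1})
print(final)
```

The vertex logic only ever looks at one vertex and its inbox, so a platform is free to spread vertices across machines; that is the sense in which "big" is the platform's concern rather than the programmer's.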
  • 11. No Shortage of “NoSQL” Big Data Analysis Platforms...
      [Figure: comparison of platform stacks by layer]
      • Layers: Query/Scripting Language, High-Level API, Compiler/Optimizer, Low-Level API, Execution Engine, Resource Management, Data Store
      • Components shown: SQL, Dataflow Processor, Relational Row/Column Storage, SCOPE, Dryad, DryadLINQ, TidyFS, Quincy, Cosmos, AQL, Algebricks, Hyracks, LSM Storage, PigLatin, Pig, Jaql, Cascading, Tez, MapReduce, Hadoop MapReduce, HBase, HDFS, YARN, Google MapReduce, Sawzall, FlumeJava, Dremel, Bigtable, GFS, Omega, Spark, Spark RDDs, Mesos, Java/Scala, Meteor, Sopremo, PACT, Nephele
  • 13. AsterixDB: “One Size Fits a Bunch”
      [Figure labels: Semistructured Data Management; Parallel Database Systems; World of Hadoop & Friends]
      BDMS Desiderata:
      • Flexible data model
      • Efficient runtime
      • Full query capability
      • Cost proportional to task at hand (!)
      • Designed for continuous data ingestion
      • Support today’s “Big Data data types”
  • 14. ASTERIX Data Model (ADM)
      create dataverse TinySocial;
      use dataverse TinySocial;
      create type EmploymentType as open {
        organization-name: string,
        start-date: date,
        end-date: date?
      }
      create type MugshotUserType as {
        id: int32,
        alias: string,
        name: string,
        user-since: datetime,
        address: {
          street: string,
          city: string,
          state: string,
          zip: string,
          country: string
        },
        friend-ids: {{ int32 }},
        employment: [EmploymentType]
      }
      create dataset MugshotUsers(MugshotUserType) primary key id;
      Highlights include:
      • JSON++ based data model
      • Rich type support (spatial, temporal, …)
      • Records, lists, bags
      • Open vs. closed types
  • 15. ASTERIX Data Model (ADM)
      Same DDL and highlights as slide 14, shown next to a second, minimal definition of the user type:
      create dataverse TinySocial;
      use dataverse TinySocial;
      create type MugshotUserType as {
        id: int32
      }
  • 16. ASTERIX Data Model (ADM)
      Same DDL and highlights as slides 14 and 15, plus a message type and its dataset:
      create type MugshotMessageType as closed {
        message-id: int32,
        author-id: int32,
        timestamp: datetime,
        in-response-to: int32?,
        sender-location: point?,
        tags: {{ string }},
        message: string
      }
      create dataset MugshotMessages(MugshotMessageType) primary key message-id;
  • 17. Ex: MugshotUsers Data
      {
        "id":1, "alias":"Margarita", "name":"MargaritaStoddard",
        "address":{ "street":"234 Thomas Ave", "city":"San Hugo", "zip":"98765", "state":"CA", "country":"USA" },
        "user-since":datetime("2012-08-20T10:10:00"),
        "friend-ids":{{ 2, 3, 6, 10 }},
        "employment":[{ "organization-name":"Codetechno", "start-date":date("2006-08-06") }]
      }
      {
        "id":2, "alias":"Isbel", "name":"IsbelDull",
        "address":{ "street":"345 James Ave", "city":"San Hugo", "zip":"98765", "state":"CA", "country":"USA" },
        "user-since":datetime("2011-01-22T10:10:00"),
        "friend-ids":{{ 1, 4 }},
        "employment":[{ "organization-name":"Hexviafind", "start-date":date("2010-04-27") }]
      }
      {
        "id":3, "alias":"Emory", "name":"EmoryUnk",
        "address":{ "street":"456 Jose Ave", "city":"San Hugo", "zip":"98765", "state":"CA", "country":"USA" },
        "user-since":datetime("2012-07-10T10:10:00"),
        "friend-ids":{{ 1, 5, 8, 9 }},
        "employment":[{ "organization-name":"geomedia", "start-date":date("2010-06-17"), "end-date":date("2010-01-26") }]
      }
      ...
  • 18. Other DDL Features
      create index msUserSinceIdx on MugshotUsers(user-since);
      create index msTimestampIdx on MugshotMessages(timestamp);
      create index msAuthorIdx on MugshotMessages(author-id) type btree;
      create index msSenderLocIndex on MugshotMessages(sender-location) type rtree;
      create index msMessageIdx on MugshotMessages(message) type keyword;
      create type AccessLogType as closed {
        ip: string, time: string, user: string, verb: string, path: string, stat: int32, size: int32
      };
      create external dataset AccessLog(AccessLogType) using localfs
        (("path"="{hostname}://{path}"), ("format"="delimited-text"), ("delimiter"="|"));
      create feed socket_feed using socket_adaptor
        (("sockets"="{address}:{port}"), ("addressType"="IP"), ("type-name"="MugshotMessageType"), ("format"="adm"));
      connect feed socket_feed to dataset MugshotMessages;
      External data highlights:
      • Common HDFS file formats + indexing
      • Feed adaptors for sources like Twitter
  • 19. ASTERIX Query Language (AQL)
      • Ex: List the user name and messages sent by those users who joined the Mugshot social network in a certain time window:
        from $user in dataset MugshotUsers
        where $user.user-since >= datetime('2010-07-22T00:00:00')
          and $user.user-since <= datetime('2012-07-29T23:59:59')
        select {
          "uname" : $user.name,
          "messages" :
            from $message in dataset MugshotMessages
            where $message.author-id = $user.id
            select $message.message
        };
  • 20. AQL (cont.)
      • Ex: Identify active users and group/count them by country:
        with $end := current-datetime()
        with $start := $end - duration("P30D")
        from $user in dataset MugshotUsers
        where some $logrecord in dataset AccessLog satisfies
              $user.alias = $logrecord.user
          and datetime($logrecord.time) >= $start
          and datetime($logrecord.time) <= $end
        group by $country := $user.address.country with $user
        select { "country" : $country, "active users" : count($user) }
      AQL highlights:
      • Lots of other features (see website!)
      • Spatial predicates and aggregation
      • Set-similarity (fuzzy) matching
      • And plans for more…
  • 21. Fuzzy Queries in AQL
      • Ex: Find Tweets with similar content:
        for $tweet1 in dataset('TweetMessages')
        for $tweet2 in dataset('TweetMessages')
        where $tweet1.tweetid != $tweet2.tweetid
          and $tweet1.message-text ~= $tweet2.message-text
        return { "tweet1-text": $tweet1.message-text, "tweet2-text": $tweet2.message-text }
      • Or: Find Tweets about similar topics:
        for $tweet1 in dataset('TweetMessages')
        for $tweet2 in dataset('TweetMessages')
        where $tweet1.tweetid != $tweet2.tweetid
          and $tweet1.referred-topics ~= $tweet2.referred-topics
        return { "tweet1-text": $tweet1.message-text, "tweet2-text": $tweet2.message-text }
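To make the idea of a fuzzy (set-similarity) match concrete, here is a minimal Python sketch of a similarity self-join. It assumes Jaccard similarity over whitespace tokens and a threshold of 0.5; the slide does not say which similarity function or threshold `~=` uses, so both are illustrative assumptions, as are the helper names and the sample tweets. Unlike the AQL above, the sketch reports each unordered pair once.

```python
def tokenize(text):
    """Split a message into a set of lowercase word tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similar_pairs(tweets, threshold=0.5):
    """Return (tweetid, tweetid) pairs whose message texts are similar enough."""
    pairs = []
    for i, t1 in enumerate(tweets):
        for t2 in tweets[i + 1:]:
            score = jaccard(tokenize(t1["message-text"]),
                            tokenize(t2["message-text"]))
            if score >= threshold:
                pairs.append((t1["tweetid"], t2["tweetid"]))
    return pairs


# Hypothetical sample data using the field names from the slide.
sample_tweets = [
    {"tweetid": 1, "message-text": "big data is fun"},
    {"tweetid": 2, "message-text": "big data is hard"},
    {"tweetid": 3, "message-text": "hello world"},
]
matches = similar_pairs(sample_tweets)
print(matches)
```

The nested loop compares every pair, which is fine for a sketch; a system running this at scale would rely on an index or a parallel similarity join rather than an all-pairs scan.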
  • 22. Updates (and Transactions)
      • Key-value store-like transaction semantics
      • Insert/delete ops with indexing
      • Concurrency control (locking)
      • Crash recovery
      • Backup/restore
      • Ex: Add a new user to Mugshot.com:
        insert into dataset MugshotUsers (
          {
            "id":11, "alias":"John", "name":"JohnDoe",
            "address":{ "street":"789 Jane St", "city":"San Harry", "zip":"98767", "state":"CA", "country":"USA" },
            "user-since":datetime("2010-08-15T08:10:00"),
            "friend-ids":{{ 5, 9, 11 }},
            "employment":[{ "organization-name":"Kongreen", "start-date":date("2012-06-05") }]
          }
        );
  • 24. ASTERIX Software Stack
      [Figure labels: AsterixDB (AQL), Hivesterix (HiveQL), Apache VXQuery (XQuery), Hadoop M/R Job, Pregel Job, Pregelix, Algebricks Algebra Layer, M/R Layer, Hyracks Job, Hyracks Data-Parallel Platform]
  • 25. Native Storage Management
      [Figure labels: Datasets Manager; Memory (Buffer Cache, In-Memory Components, Working Memory); IO Scheduler; Disk 1 ... Disk n; Transaction Sub-System (Transaction Manager, Recovery Manager, Lock Manager, Log Manager)]
  • 26. LSM-Based Storage + Indexing
      • Sequential writes to disk
      • Periodically merge disk trees
      [Figure labels: Memory, Disk]
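The two points on this slide, sequential writes and periodic merges, can be illustrated with a toy log-structured merge (LSM) store. This is a teaching sketch, not AsterixDB's storage code: the class name `TinyLSM`, the `memory_limit` parameter, and the use of Python lists as "disk components" are all assumptions made for the example.

```python
import bisect


class TinyLSM:
    """Toy LSM store: writes go to memory and are flushed as sorted runs."""

    def __init__(self, memory_limit=2):
        self.memory_limit = memory_limit
        self.memory = {}   # in-memory component (mutable)
        self.disk = []     # immutable sorted runs, newest first

    def put(self, key, value):
        self.memory[key] = value
        if len(self.memory) >= self.memory_limit:
            self.flush()

    def flush(self):
        """Write the in-memory component out as one sorted run (a sequential write)."""
        self.disk.insert(0, sorted(self.memory.items()))
        self.memory = {}

    def get(self, key):
        """Search memory first, then disk runs from newest to oldest."""
        if key in self.memory:
            return self.memory[key]
        for run in self.disk:
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return run[i][1]
        return None

    def merge(self):
        """Merge all disk runs into one, keeping the newest value per key."""
        merged = {}
        for run in reversed(self.disk):  # oldest first, so newer runs overwrite
            merged.update(run)
        self.disk = [sorted(merged.items())]


lsm = TinyLSM(memory_limit=2)
for k, v in [("a", 1), ("b", 2), ("a", 10), ("c", 3)]:
    lsm.put(k, v)
print(lsm.get("a"), len(lsm.disk))
lsm.merge()
print(lsm.get("a"), len(lsm.disk))
```

Updates never modify an existing run in place; the newer value for `"a"` simply shadows the older one until a merge discards it, which is what keeps the write path sequential.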
  • 27. LSM-Based Filters
      • Intuition: Do NOT touch unneeded records
      • Idea: Utilize LSM partitioning to prune disk components
      • Q: Get all tweets > T14
      [Figure: the memory component holds T16, T17; the disk components hold T12 to T15 with filter [ T12, T15 ], T7 to T11 with filter [ T7, T11 ], and T1 to T6 with filter [ T1, T6 ] (oldest component)]
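The pruning idea can be sketched with the same numbers as the slide. Each component carries a [min, max] filter over its timestamps, and a range query skips any component whose filter cannot overlap the range. The `Component` class and `tweets_after` function are hypothetical names for this illustration, and timestamps T1..T17 are modeled as the integers 1..17.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One LSM component plus a [min_ts, max_ts] filter over its records."""
    name: str
    timestamps: list
    min_ts: int = field(init=False)
    max_ts: int = field(init=False)

    def __post_init__(self):
        # The filter is computed once, when the component is created.
        self.min_ts = min(self.timestamps)
        self.max_ts = max(self.timestamps)


def tweets_after(components, lower_bound):
    """Answer 'timestamp > lower_bound', scanning only components that can match."""
    scanned, results = [], []
    for comp in components:
        if comp.max_ts <= lower_bound:
            continue  # pruned: no record in this component can qualify
        scanned.append(comp.name)
        results.extend(t for t in comp.timestamps if t > lower_bound)
    return sorted(results), scanned


components = [
    Component("memory", [16, 17]),
    Component("disk-1", [12, 13, 14, 15]),
    Component("disk-2", [7, 8, 9, 10, 11]),
    Component("disk-3", [1, 2, 3, 4, 5, 6]),  # oldest component
]
hits, touched = tweets_after(components, 14)
print(hits, touched)
```

For the slide's query (all tweets > T14) only the memory component and the newest disk component are read; the two older components are skipped on the strength of their filters alone.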
  • 28. Some Example Use Cases
      • Recent/projected use case areas include
        – Behavioral science (at UCI)
        – Social data analytics
        – Cell phone event analytics
        – Education (MOOC analytics)
        – Power usage monitoring
        – Public health (joint effort with UCLA)
        – Cluster management log analytics
  • 29. Behavioral Science (HCI)
      • First study to use logging and biosensors to measure stress and ICT use of college students in their real world environment (Gloria Mark, UCI Informatics)
        – Focus: Multitasking and stress among “Millennials”
      • Multiple data channels
        – Computer logging
        – Heart rate monitors
        – Daily surveys
        – General survey
        – Exit interview
      Learnings for AsterixDB:
      • Nature of their analyses
      • Extended binning support
      • Data format(s) in and out
      • Bugs and pain points
  • 30. Social Data Analysis (Based on 2 pilots)
      Learnings for AsterixDB:
      • Nature of their analyses
      • Real vs. synthetic data
      • Parallelism (grouping)
      • Avoiding materialization
      • Bugs and pain points
      The underlying AQL query is:
        use dataverse twitter;
        for $t in dataset TweetMessagesShifted
        let $region := create-rectangle(create-point(…, …), create-point(…, …))
        let $keyword := "mind-blowing"
        where spatial-intersect($t.sender-location, $region)
          and $t.send-time > datetime("2012-01-02T00:00:00Z")
          and $t.send-time < datetime("2012-12-31T23:59:59Z")
          and contains($t.message-text, $keyword)
        group by $c := spatial-cell($t.sender-location, create-point(…), 3.0, 3.0) with $t
        return { "cell" : $c, "count": count($t) }
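The `group by ... spatial-cell(...)` step buckets each tweet location into a fixed-size grid cell and counts tweets per cell. The Python sketch below shows that bucketing under stated assumptions: the grid origin is elided on the slide, so `(0.0, 0.0)` is a made-up default, cells are represented as `(x_min, y_min, x_max, y_max)` tuples, and the function names are illustrative rather than AsterixDB's implementation.

```python
import math
from collections import Counter


def spatial_cell(point, origin, x_size, y_size):
    """Return the grid cell (x_min, y_min, x_max, y_max) that contains point."""
    col = math.floor((point[0] - origin[0]) / x_size)
    row = math.floor((point[1] - origin[1]) / y_size)
    x_min = origin[0] + col * x_size
    y_min = origin[1] + row * y_size
    return (x_min, y_min, x_min + x_size, y_min + y_size)


def count_by_cell(points, origin=(0.0, 0.0), x_size=3.0, y_size=3.0):
    """Group points by grid cell and count how many fall into each cell."""
    return Counter(spatial_cell(p, origin, x_size, y_size) for p in points)


# Hypothetical sender locations.
locations = [(1.0, 1.0), (2.0, 2.5), (4.0, 1.0), (-1.0, -1.0)]
cell_counts = count_by_cell(locations)
for cell, n in sorted(cell_counts.items()):
    print(cell, n)
```

Because the cell is a pure function of the point, each record can be assigned to its group independently, which fits the "Parallelism (grouping)" learning listed on the slide.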
  • 31. Current Status
      • 4-year initial NSF project (250+ KLOC @ UCI/UCR)
      • AsterixDB BDMS is here! (Shared on June 6th, 2013)
        – Semistructured “NoSQL” style data model
        – Declarative (parallel) queries, inserts, deletes, …
        – LSM-based storage/indexes (primary & secondary)
        – Internal and external datasets both supported
        – Rich set of data types (including text, time, location)
        – Fuzzy and spatial query processing
        – NoSQL-like transactions (for inserts/deletes)
        – Data feeds and external indexes in next release
      • Performance competitive (at least!) with a popular parallel RDBMS, MongoDB, and Hive (see papers)
      • Now in Apache incubation mode!
  • 32. For More Info
      AsterixDB project page: http://asterixdb.ics.uci.edu
      Open source code base:
      • ASTERIX: http://code.google.com/p/asterixdb/
      • Hyracks: http://code.google.com/p/hyracks
      • (Pregelix: http://hyracks.org/projects/pregelix/)