More and more applications are leveraging the power of NoSQL as a primary means of data storage. This session, as presented at Teradata Partners Conference 2015, by Bryce Cottam, Principal Architect at Think Big, a Teradata company, covered how to successfully model application data on NoSQL storage engines for everyday application use. The presentation explores common design patterns, techniques and tips that will help developers leverage the horizontal scalability of NoSQL stores while embracing their inherent limitations. Topics include: Denormalization, Intelligent Keys (including avoiding hot-spotting), Counters, and Data Sharding.
The benefits of Hadoop for analytics make it a popular option for many companies looking to expand their analytics suite. However, adding Hadoop as an analytics platform to an existing environment based on more traditional data structures and methods poses several key challenges. Review these slides to understand the key challenges and strategies for expanding the analytics suite to use Hadoop, such as: architectural integration with existing platforms, skills and organizational readiness, and the importance of a vision and a clear path forward.
This presentation given by Think Big's senior data scientist Eliano Marques at Digital Natives conference in Berlin, Germany (November 2015), details how to go from experimentation to productionization for a predictive maintenance use case.
Industrial Analytics and Predictive Maintenance 2017 - 2022 (Rising Media Ltd.)
In this session we will present the results of two recent, international studies on the state of data analytics in industrial settings. You will get insights from an in-depth industry survey of 151 analytics professionals and decision-makers in industrial companies, providing a deep-dive into strategies, project types, cost structures and skill-demand in IoT-based analytics. In addition we will present a survey focusing on predictive analytics covering the market potential and expected development until 2022.
[Tutorial] building machine learning models for predictive maintenance applic... (PAPIs.io)
This talk introduces the landscape and challenges of predictive maintenance applications in the industry, illustrates how to formulate (data labeling and feature engineering) the problem with three machine learning models (regression, binary classification, multi-class classification) using a publicly available aircraft engine run-to-failure data set, and showcases how the models can be conveniently trained and compared with different algorithms in Azure ML.
Deep Learning Use Cases - Data Science Pop-up Seattle (Domino Data Lab)
Companies like Google, Microsoft, Amazon and Facebook are in fierce competition for teams that can build deep-learning applications. Because of deep learning's general usefulness in pattern recognition, those applications are surprisingly diverse, ranging from image recognition to machine translation. This talk will explore deep learning use cases for the major data types -- image, sound, text and time series -- as they're emerging in the private sector. Presented by Chris Nicholson, Co-Founder and CEO at Skymind.
Windows 8 Pure Imagination - 2012-11-24 - Getting your HTML5 game Windows 8 r... (Frédéric Harper)
You already created an HTML5 game, and you want to make it a Windows 8 game to get all the benefits of this new platform? The recipe is simple: take your HTML5 games, add some WinJS dressing, use Visual Studio to make these stick together, and get in the Windows Store oven to get a perfectly cooked app!
MongoDB .local Houston 2019: Jumpstart: From SQL to NoSQL -- Changing Your Mi... (MongoDB)
When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.
During this session we will cover the best practices for implementing a product catalog with MongoDB. We will cover how to model an item properly when it can have thousands of variations and thousands of properties of interest. You'll learn how to index properly and allow for faceted search with milliseconds response latency and how to implement per-store, per-sku pricing while still keeping a sane number of documents. We will also cover operational considerations, like how to bring the data closer to users to cut down the network latency.
After a brief introduction to the history of Database Management Systems, different types of NoSQL data stores are characterized, and theoretical background on sharding mechanisms, horizontal scaling and the CAP theorem is explained.
After a comparison of different NoSQL stores you will get to know the pros and cons of the different approaches, and you will learn how to decide on the best-fitting database for your project.
MongoDB World 2019: From SQL to NoSQL -- Changing Your Mindset (MongoDB)
When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.
CCM AlchemyAPI and Real-time Aggregation (Victor Anjos)
An exploratory look into KairosDB (OpenTSDB) connected to Cassandra (CCM) and using AlchemyAPI for entity, topic and sentiment extraction.
Sprinkled in is a bit of Data Modeling, Truth Tables, Primary Keys, Partition Keys and Cluster Keys.
All written in Python!
Freeing Yourself from an RDBMS Architecture (David Hoerster)
Explore how we can begin to move functionality from a typical RDBMS application to one that uses tools and frameworks like MongoDB, Solr and Redis. At the end, the architecture we've evolved looks similar to…
Jonas Ohlsson, Front End Developer at State
Isomorphic React in real life
React's ability to run on both the client and server is one of its biggest advantages, but there is little information on how this can be done in real applications. In this talk I will show you how we set up isomorphic React at State, and how you can do the same for your project.
Slides for a talk given at Facebook's London offices for a London React meetup.
Blog post providing some more detail:
http://jonassebastianohlsson.com/blog/2015/03/24/isomorphic-react-in-real-life/
10. Where We Came From
A typical relational schema of four tables:
• User: id, email, name, profile_image_url, access_level, created_date
• Bid: id, user_id, auction_id, amount, timestamp
• Auction: id, title, image_url, current_price, high_bidder, end_time
• Payment: id, auction_id, timestamp, card_type, confirmation_number
11. Data Models
public class User {
private long id;
private String email;
private String name;
private String profileImageUrl;
// AccessLevel is an enum
private AccessLevel accessLevel;
private Date createdDate;
private List<Auction> auctions;
private List<Bid> bids;
...
}
public class Auction {
private long id;
private String title;
private String imageUrl;
private BigDecimal currentPrice;
private User highBidder;
private Date endTime;
private List<Bid> bids;
private Payment payment;
...
}
public class Bid {
private long id;
private User user;
private Auction auction;
private BigDecimal amount;
private Date timestamp;
...
}
public class Payment {
private long id;
private Auction auction;
private Date timestamp;
// Visa, MasterCard, AmEx etc.
private String cardType;
private String confirmationNumber;
...
}
12. Support Queries
Get all Bids for a given Auction:
select a.*, b.*
from auction a
join bid b
on a.id = b.auction_id
where a.id = 12345
order by b.timestamp desc
• Either manual SQL or ORM-generated SQL will wind up joining a few tables to get the desired results
• Joins are not supported by most NoSQL solutions
13. Support Queries
Count all Bids for a User:
select count(*) from bid where user_id = 554422
Get average final price of all Auctions:
select avg(current_price) from auction
Get the User with the most Bids:
select u.name, s.bid_count as bids
from (select user_id, count(*) as bid_count
      from bid group by user_id) as s
join user u on u.id = s.user_id
order by s.bid_count desc limit 1
• Aggregates in NoSQL are usually not supported
• If they are supported, they often have performance or memory issues
14. Adapt to your Data Store
• Most web app developers think in terms of tables, columns, queries
• Many times the schema is simply mirrored in the application-layer model objects
• (Not a bad thing, but hard to change)
• The most successful/scalable applications embrace the features and limitations of their chosen datastore
[Diagram: Schema → DAO → Application; the patterns defined at the DAO layer affect application behavior for data interaction, keeping storage details separate from the model's access patterns]
15. Encouraging Scalable Access Patterns
Common:
public class BidDao {
// Common API structure, loads all in memory
// Also requires that the full User object is available
public List<Bid> getBids(User user) {…}
...
}
Alternative:
public class BidDao {
// Paging is a good option to avoid memory issues
public List<Bid> getBids(String userId, int offset, int limit) {…}
// Streaming APIs encourage streaming processing
public Iterator<Bid> getBids(String userId) {…}
...
}
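As an illustration of the difference, here is a minimal, runnable in-memory sketch of the alternative shapes (the PagingBidDao name and list-of-bid-ids storage are assumptions for the example, not the talk's actual implementation):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical in-memory stand-in for a NoSQL-backed BidDao, showing the
// paging and streaming shapes side by side.
public class PagingBidDao {
    private final List<String> allBids; // bid ids for one user

    public PagingBidDao(List<String> allBids) {
        this.allBids = allBids;
    }

    // Paging: the caller bounds memory by asking for one page at a time.
    public List<String> getBids(int offset, int limit) {
        int from = Math.min(offset, allBids.size());
        int to = Math.min(offset + limit, allBids.size());
        return new ArrayList<>(allBids.subList(from, to));
    }

    // Streaming: the caller processes one record at a time.
    public Iterator<String> getBids() {
        return allBids.iterator();
    }
}
```

Either shape lets the DAO avoid materializing an unbounded result set in the application's heap.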
16. Encouraging Scalable Access Patterns
[Diagram comparing the memory each DAO pattern requires: the common load-all approach must hold the entire result set in memory; streaming needs only a small buffer; paging holds one page at a time, with earlier pages garbage-collected]
17. Adapt to your Data Store
[Diagram: Application → SQL-NoSQL Adapter → multiple DAOs → NoSQL Store]
Danger! If you mask your true datastore semantics, you risk your scalability.
• DataNucleus is a good option if used with discipline
• Provides JDO/JPA support
18. Top level concepts to embrace
• Denormalization
• Intelligent Key Design
• Counters
• Sharding
20. Identify Conceptually Immutable Fields
public class User {
private long id;
private String email;
private String name;
private String profileImageUrl;
// AccessLevel is an enum
private AccessLevel accessLevel;
private Date createdDate;
private List<Auction> auctions;
private List<Bid> bids;
...
}
public class Auction {
private long id;
private String title;
private String imageUrl;
private BigDecimal currentPrice;
private User highBidder;
private Date endTime;
private List<Bid> bids;
private Payment payment;
...
}
public class UserReference {
private long id;
private String name;
private String profileImageUrl;
...
}
public class AuctionReference {
private long id;
private String title;
private String imageUrl;
...
}
21. Modified Data Structures
public class User {
// Changed ids to Strings
// (more on that soon)
private String id;
private String email;
private String name;
private String profileImageUrl;
private AccessLevel accessLevel;
private Date createdDate;
private List<Auction> auctions;
private List<Bid> bids;
...
}
public class Auction {
private String id;
private String title;
private String imageUrl;
private BigDecimal currentPrice;
private UserReference highBidder;
private Date endTime;
private List<Bid> bids;
private Payment payment;
...
}
public class Bid {
private String id;
private UserReference user;
private AuctionReference auction;
private BigDecimal amount;
private Date timestamp;
...
}
public class Payment {
private String id;
private AuctionReference auction;
private Date timestamp;
// Visa, MasterCard, AmEx etc.
private String cardType;
private String confirmationNumber;
...
}
22. Modified Data Models
public class Bid {
// the @Embedded annotation (both JDO and JPA)
// indicates that this is not an FK relationship:
@Embedded
private UserReference user;
@Embedded
private AuctionReference auction;
...
}
Under the hood in the data store, each Bid row carries denormalized user fields (id, user_id, user_name, user_profile_image, amount, timestamp, auction_title, …):
…/d288-4af3-8821-27a37269ec0c {amount:"14.00", user_id:"abc123", user_name:"Ralph Cifaretto", user_profile_image:"http://…", …}
…/d288-4af3-8821-27a37283af10 {amount:"240.00", user_id:"abc123", user_name:"Ralph Cifaretto", user_profile_image:"http://…", …}
• JDO/JPA configuration is certainly not required
• We’re making a copy of the conceptually immutable properties of the user
• When we read a Bid record now, we don’t need to go fetch the User record
• Nor do we need a join
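A minimal sketch of that write-time copy (the Map-based record and the toRecord helper are hypothetical; the field names mirror the slide):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: copying a user's conceptually immutable fields into each Bid
// record at write time, so reads need no join and no second fetch.
public class DenormalizedBid {
    public static Map<String, String> toRecord(String bidId, String amount,
            String userId, String userName, String userProfileImage) {
        Map<String, String> record = new HashMap<>();
        record.put("id", bidId);
        record.put("amount", amount);
        // Denormalized copies of the UserReference fields:
        record.put("user_id", userId);
        record.put("user_name", userName);
        record.put("user_profile_image", userProfileImage);
        return record;
    }
}
```

Reading the Bid now yields everything needed to render the bidder's name and avatar in one get.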
23. Manual Marshaling
public class BidDao {
public Bid read(String id) {
// This is an HBase-like API, but the idea is the same for almost all
// NoSQL datastore native APIs:
Result result = openConnection().get("bid", id);
Bid bid = new Bid();
bid.setId(result.getValue("id"));
...
String userId = result.getValue("user_id");
String userName = result.getValue("user_name");
String profileUrl = result.getValue("user_profile_image");
UserReference user = new UserReference(userId, userName, profileUrl);
bid.setUser(user);
...
return bid;
}
...
}
// To access user information:
UserReference user = bid.getUser();
String userName = user.getName();
24. We support access patterns without joins
Bid: id, user_id, user_name, user_profile_image, amount, timestamp, auction_id, auction_title, auction_image_url
[Diagram: a list of Bids rendered from these denormalized fields; clicking an Auction image or name goes to the details page for that Auction]
25. Data is duplicated many (many) times
Bid
id amount user_id user_name user_profile_image auction_id auction_title . . .
124 14.00 5432 Gustavo ‘Gus’ Fring http://nj.boss.com… 555111222 Barrel Methylamine . . .
125 13.00 1234 Walter White http://dead.users… 555111222 Barrel Methylamine . . .
126 12.00 2223 Hank Schrader http://dea.bro.com… 555111222 Barrel Methylamine . . .
127 11.00 1234 Walter White http://dead.users… 555111222 Barrel Methylamine . . .
128 10.00 1112 Jesse Pinkman http://facebook.com… 555111222 Barrel Methylamine . . .
129 9.00 2223 Hank Schrader http://dea.bro.com… 555111222 Barrel Methylamine . . .
130 8.00 1234 Walter White http://dead.users… 555111222 Barrel Methylamine . . .
131 7.00 1112 Jesse Pinkman http://facebook.com… 555111222 Barrel Methylamine . . .
132 6.00 1234 Walter White http://dead.users… 555111222 Barrel Methylamine . . .
User
id name profile_image email created_date . . .
5432 Gustavo ‘Gus’ Fring http://nj.boss.com… tony@breakingbad.com 2008-01-01 . . .
1234 Walter White http://chem.users… walter@breakingbad.com 2008-02-02 . . .
2223 Hank Schrader http://dea.bro.com… hank@breakingbad.com 2009-01-12 . . .
1112 Jesse Pinkman http://facebook.com… jessie@breakingbad.com 2008-11-16 . . .
26. What about updates?
[Diagram, timeline of a name change: the Name Change Request reaches an Edge Node, which immediately sends a response to the user, then issues an async request to Backend Node(s) to change all Bid records related to this user; workers modify the affected records in the NoSQL store, possibly taking minutes]
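The respond-first, repair-later flow can be sketched with an executor (all class and field names here are illustrative; the talk does not prescribe an API):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: the edge node acknowledges a name change immediately, while a
// background worker rewrites the denormalized copies in each Bid record.
public class NameChangeHandler {
    private final ExecutorService workers = Executors.newFixedThreadPool(2);
    public final Map<String, String> bidUserNames = new ConcurrentHashMap<>();

    public String changeName(String userId, String newName, List<String> bidIds) {
        workers.submit(() -> {                    // async: may lag by minutes
            for (String bidId : bidIds) {
                bidUserNames.put(bidId, newName); // rewrite denormalized copy
            }
        });
        return "OK";                              // respond to the user now
    }

    public void drain() {
        workers.shutdown();
        try {
            workers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Until the worker finishes, some Bids still show the old name; that window is the "change latency" users are getting used to.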
27. Denormalization Observations
• We don't always need ACID compliance
• Strict FK enforcement is not always required
• MySQL's MyISAM storage works fine for many situations
• Users are getting used to change latency
• There is a trade-off between horizontal scalability in your app and the patterns we've been trained to rely on
31. Ascending Timestamp
Bid/2014-10-26T09:00:00.000 {…}
Bid/2014-10-26T09:00:12.975 {…}
Bid/2014-10-26T09:00:14.221 {…}
Bid/2014-10-26T09:00:18.005 {…}
Bid/2014-10-26T09:00:35.572 {…}
Bid/2014-10-26T09:00:40.003 {…}
Bid/2014-10-26T09:00:41.123 {…}
Bid/2014-10-26T09:00:41.124 {…}
Bid/2014-10-26T09:00:41.150 {…}
Bid/2014-10-26T09:00:41.218 {…}
yyyy-MM-ddTHH:mm:ss.SSS is a pretty standard timestamp format, and lexical order matches chronological order (keys sort from older to newer)
• Great for time-series data
• Timeline tracking (viewing data in the order it was processed, etc.)
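The lexical-equals-chronological property is easy to verify; a small sketch using java.time (the AscendingKeys class is illustrative, not from the talk):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Keys built from yyyy-MM-dd'T'HH:mm:ss.SSS sort lexically in chronological
// order, so plain string comparison orders events in time.
public class AscendingKeys {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS");

    public static String key(LocalDateTime t) {
        return "Bid/" + FMT.format(t);
    }
}
```

Fixed-width fields are what make this work: every component is zero-padded, so "09" sorts before "10".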
34. Descending Timestamp
Bid/9223370622642200431 {…}
Bid/9223370622642200478 {…}
Bid/9223370622642200512 {…}
Bid/9223370622642203021 {…}
Bid/9223370622642203897 {…}
Bid/9223370622642204112 {…}
Bid/9223370622642204559 {…}
Bid/9223370622642207054 {…}
Bid/9223370622642215431 {…}
Bid/9223370622642235500 {…}
public class User {
// This will yield some ridiculous value like: 9223370622642200431
// Number of milliseconds in a year: 31,536,000,000
// This computation will reach 0 in the year 292,471,163
long descendingTimestamp = Long.MAX_VALUE - System.currentTimeMillis();
}
(keys now sort from newer to older)
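A runnable sketch of the same computation (the zero-padding via %019d is an assumption added so that plain string comparison matches numeric order):

```java
// Descending keys: Long.MAX_VALUE - millis shrinks as time advances, so
// newer records sort first. Zero-padding to 19 digits (the width of
// Long.MAX_VALUE) keeps string order identical to numeric order.
public class DescendingKeys {
    public static String key(long epochMillis) {
        return String.format("Bid/%019d", Long.MAX_VALUE - epochMillis);
    }
}
```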
35. Descending Timestamp
Bid/9223370622642200431 {… auction_id:"12345" …}
Bid/9223370622642200478 {… auction_id:"54321" …}
Bid/9223370622642200512 {… auction_id:"12345" …}
Bid/9223370622642203021 {… auction_id:"22222" …}
Bid/9223370622642203897 {… auction_id:"22233" …}
Bid/9223370622642204112 {… auction_id:"12345" …}
Bid/9223370622642204559 {… auction_id:"22233" …}
Bid/9223370622642207054 {… auction_id:"54321" …}
Bid/9223370622642215431 {… auction_id:"54321" …}
Bid/9223370622642235500 {… auction_id:"12345" …}
Start with "Bid/" and stop after 5 rows: the first 5 rows read are the 5 most recent bids.
• Known as a "range scan"
• Very easy to start with some prefix and read for N records
• Complexity stays constant for the top 5 bids no matter how many bids are in the system
36. Descending Timestamp
Auction/11222/Bid/9223370622642203021 {… auction_id:"11222" …}
Auction/12233/Bid/9223370622642203897 {… auction_id:"12233" …}
Auction/12233/Bid/9223370622642204559 {… auction_id:"12233" …}
Auction/12345/Bid/9223370622642200431 {… auction_id:"12345" …}
Auction/12345/Bid/9223370622642200512 {… auction_id:"12345" …}
Auction/12345/Bid/9223370622642204112 {… auction_id:"12345" …}
Auction/12345/Bid/9223370622642235500 {… auction_id:"12345" …}
Auction/54321/Bid/9223370622642200478 {… auction_id:"54321" …}
Auction/54321/Bid/9223370622642207054 {… auction_id:"54321" …}
Auction/54321/Bid/9223370622642215431 {… auction_id:"54321" …}
Each key combines "Auction/12345" with "Bid/9223370622642200431". Start with "Auction/12345" and stop after 4 rows (or until row "Auction/12346"): those are the 4 most recent bids for that auction.
• Now, all Bids for each Auction are located right next to each other
• This matches our most used access pattern
• We now have information about related data just from the key
• Key-only queries can be used to help speed up apps
• Why 4 Bids instead of 5? My example only had 4 records
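The compound-key range scan can be simulated with a sorted map; this sketch uses TreeMap as a stand-in for the store's lexically ordered keyspace (the RangeScan.scan signature is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Sketch of a prefix range scan over a lexically ordered keyspace.
// All Bids for an Auction share the "Auction/<id>/Bid/" prefix, so a
// scan starting at that prefix returns them contiguously, newest first
// when the bid component is a descending timestamp.
public class RangeScan {
    public static List<String> scan(TreeMap<String, String> store,
                                    String prefix, int limit) {
        List<String> keys = new ArrayList<>();
        for (String key : store.tailMap(prefix).keySet()) {
            if (!key.startsWith(prefix) || keys.size() == limit) break;
            keys.add(key);
        }
        return keys;
    }
}
```

The scan touches only the rows it returns, which is why its cost is independent of the total number of bids in the system.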
37. Linking Related Data With Intelligent Keys
Auction table:
11222 {…}
12233 {…}
12345 {…}
54321 {…}
Bid table:
Auction/11222/... {…}
Auction/12233/... {…}
Auction/12233/... {…}
Auction/12345/... {…}
Auction/12345/... {…}
Auction/12345/... {…}
Auction/12345/... {…}
Auction/54321/... {…}
Auction/54321/... {…}
Auction/54321/... {…}
To serve http://myapp.com/api/auctions/12345:
datastore.get("12345");
datastore.rangeScan("Auction/12345/", 5);
Both reads can be done in parallel.
38. Linking Related Data With Intelligent Keys
AuctionData table:
Auction/11222/Bid/987321... {…}
Auction/12233/Bid/987534... {…}
Auction/12233/Bid/987635... {…}
Auction/12345 {…, ..., ...}
Auction/12345/Bid/977534... {…}
Auction/12345/Bid/987501... {…}
Auction/12345/Bid/987687... {…}
Auction/12345/Bid/988012... {…}
Auction/54321 {…, ..., ...}
Auction/54321/... {…}
Auction/54321/... {…}
To serve http://myapp.com/api/auctions/12345, a single scan now fetches the auction and its bids:
datastore.rangeScan("Auction/12345", 6);
Data of completely different schemas / types can be written to the same table, co-located on disk.
40. Counters
public void placeBid(String userId, String auctionId) {
// Many NoSQL stores support a native counter via some increment-and-get
// After the counter has been incremented, we don't need to worry about contention
long bidCount = datastore.incrementAndGet(auctionId + "_counter");
BigDecimal amount = BID_INCREMENT.multiply(BigDecimal.valueOf(bidCount));
long descendingTimestamp = Long.MAX_VALUE - System.currentTimeMillis();
String bidId = "Auction/" + auctionId + "/Bid/" + descendingTimestamp + "/" + amount;
// Increment some helper counters...
datastore.incrementAndGet("global_bidCounter");
datastore.incrementAndGet(auctionId + "_bidCounter");
datastore.incrementAndGet(userId + "_bidCounter");
// ... other logic like creating the Bid object ...
bidDao.write(bidId, bid);
}
// Some datastores may have a first-order Counter object:
Counter bidCounter = datastore.getCounter(auctionId + "_counter");
long bidCount = bidCounter.incrementAndGet();
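The pricing logic above can be exercised locally with AtomicLong standing in for the store's native counter (the BidCounters class and the BID_INCREMENT value are assumptions for the sketch):

```java
import java.math.BigDecimal;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: atomic increment-and-get assigns each bid a unique sequence
// number (and therefore an amount) without read-modify-write contention.
public class BidCounters {
    private static final BigDecimal BID_INCREMENT = new BigDecimal("1.00");
    private final ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    public long incrementAndGet(String name) {
        return counters.computeIfAbsent(name, k -> new AtomicLong()).incrementAndGet();
    }

    public BigDecimal nextBidAmount(String auctionId) {
        long bidCount = incrementAndGet(auctionId + "_counter");
        return BID_INCREMENT.multiply(BigDecimal.valueOf(bidCount));
    }
}
```

Two concurrent bidders can never receive the same count, so they can never compute the same amount.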
44. Data Model Sharding
public class Auction {
private String id;
private String title;
private String imageUrl;
private String description;
private BigDecimal currentPrice;
private User highBidder;
private Date endTime;
...
}
public class AuctionState {
private String id;
private BigDecimal currentPrice;
private User highBidder;
private Date endTime;
...
}
• Separate frequently changing data from static data
• Allows caching of static data
• Makes reads/writes of changing data faster
• Separate out values that are expensive to serialize but infrequently read
46. More Parallel Reads
AuctionData table:
Auction/11222/Bid/987321... {…}
Auction/12233/Bid/987534... {…}
Auction/12233/Bid/987635... {…}
Auction/12345 {…, ..., ...}
Auction/12345/AuctionState {…}
Auction/12345/Bid/977534... {…}
Auction/12345/Bid/987501... {…}
Auction/54321 {…, ..., ...}
Auction/54321/... {…}
To serve http://myapp.com/api/auctions/12345:
datastore.get("Auction/12345/AuctionState");
datastore.get("Auction/12345"); // static record: check the Memcache cache first
Again, records can be in the same table.
47. Sharding a 64 bit Integer
long count = datastore.incrementAndGet("global_bidCounter");
Decompose global_bidCounter (value 176) into shards: 52 + 84 + 40 = 176
To increment, pick any one part and increment it; the sum becomes 177 either way:
52 + 84 + 41 = 177
53 + 84 + 40 = 177
52 + 85 + 40 = 177
• Decompose the counter
• Pick any part of the count and increment it
48. Implementing a Sharded Counter
public class ShardedCounter {
private String name;
private int shards;
public void increment() {
// Pick a random shard and increment only that one:
int index = random(shards);
datastore.incrementAndGet(name + "-" + index);
}
public long get() {
long count = 0;
// All the shards of the counter are located next to each other:
Result scan = datastore.rangeScan(name + "-", shards);
while (scan.hasNext()) {
Counter next = scan.next();
count += next.get();
}
return count;
}
}
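The same decomposition runs locally against an AtomicLongArray; a real store would instead key each shard as name + "-" + index (the ShardedCounterDemo class is an illustrative stand-in):

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

// Sharded counter sketch: writers pick a random shard to increment,
// spreading write contention across shards; readers sum every shard
// to recover the total.
public class ShardedCounterDemo {
    private final AtomicLongArray shards;

    public ShardedCounterDemo(int shardCount) {
        this.shards = new AtomicLongArray(shardCount);
    }

    public void increment() {
        shards.incrementAndGet(ThreadLocalRandom.current().nextInt(shards.length()));
    }

    public long get() {
        long count = 0;
        for (int i = 0; i < shards.length(); i++) {
            count += shards.get(i);
        }
        return count;
    }
}
```

The trade-off is the same as on the slide: writes stop contending on a single hot row, while reads become a small scan instead of a single get.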
49. We Love Feedback
Questions/Comments
Email: bryce.cottam@thinkbiganalytics.com
Rate This Session
with the PARTNERS Mobile App
Remember To Share Your Virtual Passes
Follow Teradata 2015 PARTNERS
www.teradata-partners.com/social