History of NoSQL and Azure DocumentDB feature set - Soner Altin
A short history of database systems, from DBMS and RDBMS to NoSQL solutions, with an introduction to the SQL query support of Azure DocumentDB and to integrating DocumentDB with a simple Java application from the Maven repository.
2. @kahve
• Soner ALTIN
• BizDev @T2
• soner.in
• Strong interest in Led Zeppelin
• soneraltin@me.com / soner.altin@t2.com.tr
3. HISTORY OF DBMS AND RDBMS
Database management systems first appeared on the scene in the 1960s, as computers began to grow in power and speed. By the mid-1960s, several commercial applications on the market were capable of producing "navigational" databases. These navigational databases maintained records that could only be processed sequentially, which required a lot of computer resources and time.
Relational database management systems were first suggested by Edgar Codd in the 1970s. Because navigational databases could not be "searched", Edgar Codd proposed another model for constructing a database: the relational model, which allowed users to "search" it for data. It integrated the navigational model with a tabular and hierarchical model.
4. A relational database is a digital database whose organization is based on the relational model of data.
5. RDBMS: 40 YEARS!
1. A simple way of representing data / business models
2. An easy-to-use language to retrieve and query that data (SQL)
3. Bulletproof data integrity and security built right into the database, without having to rely on application rules and logic.
6. ACCESS AND STORAGE
▸ It is generally easier to access data stored in a relational database, because the data follows a mathematical model for categorization. Also, once we open a relational database, each and every element of it becomes accessible, which is not always the case with other databases (the data elements may need to be accessed individually).
▸ Relational databases are harder to construct, but they are better structured and more secure. They follow the ACID (atomicity, consistency, isolation and durability) model when storing data. The relational database system will also impose regulations and conditions that prevent you from manipulating data in a way that destabilizes the integrity of the system.
8. 3V - VOLUME, VARIETY, VELOCITY
▸ Five years ago, Amazon found that every 100 ms of latency cost them 1% of sales. Google discovered that a half-second increase in search latency dropped traffic by 20%.
▸ The volume of data that must be handled today is skyrocketing. Facebook houses 1.5 PB (petabytes) of uploaded photos. Google processes 20 PB of data each day. Every 60 seconds, over 204 million emails are exchanged, 3,600 photos are shared on Instagram and 2 million search queries are processed by Google. RDBMSs struggle in the face of such huge data volumes, and RDBMS solutions capable of handling them are extremely expensive.
▸ Big Data also demands collection of an extremely wide variety of data types, but RDBMSs have inflexible schemas. The problem is that Big Data primarily comprises semi-structured data, such as social media sentiment analysis and text mining data, while RDBMSs are more suitable for structured data, such as weblog, sensor and financial data.
▸ In addition, Big Data is accumulated at a very high velocity. Since RDBMSs are designed for steady data retention rather than rapid growth, using them for Big Data is prohibitively expensive.
9. TODAY
▸ Developers are working with applications that create massive volumes of new, rapidly changing data types: structured, semi-structured, unstructured and polymorphic data.
▸ Long gone is the twelve-to-eighteen-month waterfall development cycle. Now small teams work in agile sprints, iterating quickly and pushing code every week or two, some even multiple times a day.
▸ Applications that once served a finite audience are now delivered as services that must be always-on, accessible from many different devices and scaled globally to millions of users.
▸ Organizations are now turning to scale-out architectures using open source software, commodity servers and cloud computing, instead of large monolithic servers and storage infrastructure.
10.
Structured                    | Unstructured               | Semi-structured
Pre-defined                   | God knows                  | Pre-defined
Relational                    | Non-relational             | So so
Constant                      | Flexible                   | Easy to change
RDBMS                         | HDFS                       | *
CRM, Travel, Phone numbers    | Web, Video, Music, Photo   | Tagging, Comments
5%                            | 15%                        | 80%
No need to scale horizontally | Fully scalable             | Fully scalable
11. /*
* Copyright 2007 Yusuke Yamamoto
*/
/**
* A data interface representing one single status of a user.
*
* @author Yusuke Yamamoto - yusuke at mac.com
*/
public interface Status extends Comparable<Status>, TwitterResponse,
EntitySupport, java.io.Serializable {
Date getCreatedAt();
long getId();
String getText();
String getSource();
boolean isTruncated();
long getInReplyToStatusId();
long getInReplyToUserId();
String getInReplyToScreenName();
GeoLocation getGeoLocation();
Place getPlace();
boolean isFavorited();
boolean isRetweeted();
int getFavoriteCount();
User getUser();
boolean isRetweet();
Status getRetweetedStatus();
long[] getContributors();
int getRetweetCount();
boolean isRetweetedByMe();
long getCurrentUserRetweetId();
boolean isPossiblySensitive();
String getLang();
Scopes getScopes();
String[] getWithheldInCountries();
long getQuotedStatusId();
Status getQuotedStatus();
}
/*
* Copyright 2007 Yusuke Yamamoto
*/
/**
* A data interface representing Basic user information element
*
* @author Yusuke Yamamoto - yusuke at mac.com
*/
public interface User extends Comparable<User>, TwitterResponse, java.io.Serializable {
long getId();
String getName();
String getScreenName();
String getLocation();
String getDescription();
boolean isContributorsEnabled();
String getProfileImageURL();
String getBiggerProfileImageURL();
String getMiniProfileImageURL();
String getOriginalProfileImageURL();
String getProfileImageURLHttps();
String getBiggerProfileImageURLHttps();
String getMiniProfileImageURLHttps();
String getOriginalProfileImageURLHttps();
boolean isDefaultProfileImage();
String getURL();
boolean isProtected();
int getFollowersCount();
Status getStatus();
String getProfileBackgroundColor();
String getProfileTextColor();
String getProfileLinkColor();
String getProfileSidebarFillColor();
String getProfileSidebarBorderColor();
boolean isProfileUseBackgroundImage();
boolean isDefaultProfile();
boolean isShowAllInlineMedia();
int getFriendsCount();
Date getCreatedAt();
int getFavouritesCount();
int getUtcOffset();
String getTimeZone();
String getProfileBackgroundImageURL();
String getProfileBackgroundImageUrlHttps();
String getProfileBannerURL();
String getProfileBannerRetinaURL();
String getProfileBannerIPadURL();
String getProfileBannerIPadRetinaURL();
String getProfileBannerMobileURL();
String getProfileBannerMobileRetinaURL();
boolean isProfileBackgroundTiled();
String getLang();
int getStatusesCount();
boolean isGeoEnabled();
boolean isVerified();
boolean isTranslator();
int getListedCount();
boolean isFollowRequestSent();
URLEntity[] getDescriptionURLEntities();
URLEntity getURLEntity();
String[] getWithheldInCountries();
}
12. /*
* Copyright 2007 Yusuke Yamamoto
*/
/**
* A data interface representing one single URL entity.
* @author Mocel - mocel at guma.jp
*/
public interface URLEntity extends TweetEntity, java.io.Serializable {
String getText();
String getURL();
String getExpandedURL();
String getDisplayURL();
int getStart();
int getEnd();
}
/**
* @author Yusuke Yamamoto - yusuke at mac.com
*/
public interface Place extends TwitterResponse, Comparable<Place>,
java.io.Serializable {
String getName();
String getStreetAddress();
String getCountryCode();
String getId();
String getCountry();
String getPlaceType();
String getURL();
String getFullName();
String getBoundingBoxType();
GeoLocation[][] getBoundingBoxCoordinates();
String getGeometryType();
GeoLocation[][] getGeometryCoordinates();
Place[] getContainedWithIn();
}
https://dev.twitter.com/rest/reference/get/statuses/retweets_of_me
14. NON-RELATIONAL
Provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.
15. NOSQL
MONGODB
▸ NoSQL document-based database.
▸ Designed to build today's applications.
▸ Fast to build.
▸ Quick to adapt.
▸ Easy to scale.
▸ Lessons learned from 40 years of RDBMS.
16. REQUIREMENTS
▸ over 425 million unique users
▸ store 20 TB of JSON document data
▸ available globally to serve all markets
▸ store for 40+ app/device combinations
▸ under 15 ms writes and single-digit ms reads
19. ECONOMICS
The goal of a business, of course, is to make money, and that's accomplished by providing more for less. NoSQL databases drastically reduce the need for insanely big machines. Typically, they use clusters of cheap commodity servers to manage exploding data and transaction volumes. The cost per gigabyte or transaction per second for NoSQL can be considerably lower than the cost for RDBMSs, thereby dramatically reducing the cost of data processing and storage. Another key area of savings is manpower: by lowering administrative costs, one can free up developers to code new features that will generate more revenue.
21. SCHEMALESS - DATA UPDATE
The documents stored in the database can have varying sets of fields, with different types for each field. One could have the following objects in a single collection:
{ name : "Joe", x : 3.3, y : [1,2,3] }
{ name : "Kate", x : "abc" }
{ q : 456 }
Of course, when using the database for real problems, the data does have a fairly consistent structure. Something like the following would be more common:
{ name : "Joe", age : 30, interests : "football" }
{ name : "Kate", age : 25 }
One of the great benefits of dynamic objects is that schema migrations become very easy. With a traditional RDBMS, releases of code might contain data migration scripts. Further, each release should have a reverse migration script in case a rollback is necessary. ALTER TABLE operations can be very slow and result in scheduled downtime.
With a schemaless database, 90% of the time adjustments to the database become transparent and automatic. For example, if we wish to add GPA to the student objects, we add the attribute, resave, and all is well: if we look up an existing student and reference GPA, we just get back null. Further, if we roll back our code, the new GPA fields in the existing objects are unlikely to cause problems if our code was well written.
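A minimal mongo shell sketch of that GPA example (the students collection name and values are assumptions, not from the slides):

// An older document written before GPA existed.
db.students.insert({ name : "Joe", age : 30 })
// Newer code simply writes the extra field; no ALTER TABLE, no migration script.
db.students.insert({ name : "Kate", age : 25, gpa : 3.7 })
// Referencing GPA on the old document just comes back empty (undefined in the shell).
db.students.findOne({ name : "Joe" }).gpa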
22. NOSQL
data model | performance | scalability | flexibility | complexity
column     | high        | high        | moderate    | low
document   | high        | variable    | high        | low
key-value  | high        | high        | high        | none
graph      | variable    | variable    | high        | high
25. MONGODB
KEY FEATURES
▸ Open source
▸ Document database
▸ High performance
▸ Rich query language
▸ High availability
▸ Horizontal scalability
▸ Support for multiple storage engines
26. MONGODB
OPEN SOURCE
▸ Wikipedia: The software company 10gen began developing MongoDB in 2007 as a component of a planned platform-as-a-service product. In 2009, the company shifted to an open source development model, with the company offering commercial support and other services. In 2013, 10gen changed its name to MongoDB Inc.
27. MONGODB
DOCUMENT DATABASE
▸ A record in MongoDB is a document, which is a data structure composed of field and value pairs. MongoDB documents are similar to JSON objects. The values of fields may include other documents, arrays, and arrays of documents.
▸ Documents (i.e. objects) correspond to native data types in many programming languages.
▸ Embedded documents and arrays reduce the need for expensive joins.
▸ Dynamic schema supports fluent polymorphism.
{
  name: "Soner",
  age: 31.7,
  company: "T2",
  country: "Turkey",
  city: "Istanbul",
  pets: [{name: "one", alive: false}, {name: "two", age: 3, alive: true}, {alive: false}]
}
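As a small illustration of why embedding removes the join (a sketch; the people collection name is assumed), the pets array above can be queried in place with dot notation:

// Matches documents where at least one embedded pet is alive.
db.people.find({ "pets.alive" : true })
// Dot notation reaches into the embedded array; no second table, no join.
db.people.find({ "pets.name" : "two" })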
28. MONGODB
HIGH PERFORMANCE
▸ MongoDB provides high performance data persistence. In particular:
▸ Support for embedded data models reduces I/O activity on the database system.
▸ Indexes support faster queries and can include keys from embedded documents and arrays.
http://info-mongodb-com.s3.amazonaws.com/High%2BPerformance%2BBenchmark%2BWhite%2BPaper_final.pdf
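For example (a sketch, reusing the assumed people collection from the previous slide), a key inside an embedded array can be indexed directly:

// Builds a multikey index over the embedded pets array.
db.people.createIndex({ "pets.name" : 1 })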
29. MONGODB
RICH QUERY LANGUAGE
▸ MongoDB supports a rich query language for read and write operations, as well as:
▸ data aggregation
▸ text search and geospatial queries.
▸ https://docs.mongodb.com/manual/crud/
▸ https://docs.mongodb.com/manual/core/aggregation-pipeline/
▸ https://docs.mongodb.com/manual/reference/operator/query/text/#op._S_text
▸ https://docs.mongodb.com/manual/tutorial/geospatial-tutorial/
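A short aggregation pipeline sketch against the t2mongo collection used later in the deck (assuming its documents carry age and country fields):

db.t2mongo.aggregate([
  { $match : { age : { $gt : 30 } } },                   // filter first, like WHERE
  { $group : { _id : "$country", n : { $sum : 1 } } },   // then group, like GROUP BY
  { $sort : { n : -1 } }                                 // order by count, descending
])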
30. MONGODB
HIGH AVAILABILITY
▸ MongoDB's replication facility, called a replica set, provides:
▸ automatic failover and
▸ data redundancy.
▸ A replica set is a group of MongoDB servers that maintain the same data set, providing redundancy and increasing data availability.
31. MONGODB
HORIZONTAL SCALABILITY
▸ MongoDB provides horizontal scalability as part of its core functionality:
▸ Sharding distributes data across a cluster of machines.
▸ Tag-aware sharding allows for directing data to specific shards, for example to take into consideration the geographic distribution of the shards.
32. MONGODB
STORAGE ENGINE
▸ MongoDB supports multiple storage engines, such as:
▸ the WiredTiger storage engine and
▸ the MMAPv1 storage engine.
▸ In addition, MongoDB provides a pluggable storage engine API that allows third parties to develop storage engines for MongoDB.
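The engine is chosen when the mongod process starts; a minimal sketch (the dbpath folder names are arbitrary examples):

mongod --dbpath data1 --storageEngine wiredTiger
mongod --dbpath data2 --storageEngine mmapv1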
33. MONGODB
WHERE SHOULD YOU USE MONGODB?
▸ Big Data
▸ Content Management and Delivery
▸ Mobile and Social Infrastructure
▸ User Data Management
▸ Data Hub
35. DBA
WHAT'S A DBA?
▸ Wikipedia: Database administrators (DBAs) use specialized software to store and organize data. The role may include capacity planning, installation, configuration, database design, migration, performance monitoring, security, troubleshooting, as well as backup and data recovery.
▸ Who's the DBA of T2?
▸ Assume that you are the technical lead of a startup with no money: who will be the DBA?
http://www.techrepublic.com/blog/the-enterprise-cloud/what-does-a-dba-do-all-day/
37. DBA
DATABASE / COLLECTION / DOCUMENT
▸ A database is a physical container for collections. Each database gets its own set of files on the file system. A single MongoDB server typically has multiple databases.
▸ A collection is a group of MongoDB documents. It is the equivalent of an RDBMS table. A collection exists within a single database. Collections do not enforce a schema; documents within a collection can have different fields. Typically, all documents in a collection serve a similar or related purpose.
▸ A document is a set of key-value pairs. Documents have a dynamic schema, which means that documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data.
38. DBA
DATA TYPES
▸ String: the most commonly used datatype. Strings in MongoDB must be valid UTF-8.
▸ Integer: stores a numerical value; can be 32-bit or 64-bit depending on your server.
▸ Boolean: stores a boolean (true/false) value.
▸ Double: stores floating point values.
▸ Min/Max keys: compare a value against the lowest and highest BSON elements.
▸ Arrays: store arrays, lists or multiple values under one key.
▸ Timestamp: an internal timestamp; handy for recording when a document has been modified or added.
▸ Object: used for embedded documents.
▸ Null: stores a null value.
▸ Symbol: used identically to a string, but generally reserved for languages that have a specific symbol type.
▸ Date: stores the current date or time in UNIX time format. You can specify your own date/time by creating a Date object and passing day, month and year into it.
▸ Object ID: stores the document's ID.
▸ Binary data: stores binary data.
▸ Code: stores JavaScript code in the document.
▸ Regular expression: stores a regular expression.
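Most of these types can be exercised from the shell in a single document; a hedged sketch (the types collection and all values are made up):

db.types.insert({
  str  : "text",                     // String
  i32  : NumberInt(42),              // 32-bit Integer
  i64  : NumberLong("9000000000"),   // 64-bit Integer
  dbl  : 3.14,                       // Double
  flag : true,                       // Boolean
  arr  : [1, "two", 3.0],            // Array
  obj  : { nested : "doc" },         // Object (embedded document)
  none : null,                       // Null
  when : new Date(),                 // Date
  oid  : new ObjectId(),             // Object ID
  re   : /mongo/i,                   // Regular expression
  bin  : BinData(0, "SGVsbG8=")      // Binary data (base64-encoded)
})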
39. DBA
OBJECTID
ObjectId(<hexadecimal>)
Returns a new ObjectId value. The 12-byte ObjectId value consists of:
‣ a 4-byte value representing the seconds since the Unix epoch,
‣ a 3-byte machine identifier,
‣ a 2-byte process id, and
‣ a 3-byte counter, starting with a random value.
{ "_id" : ObjectId("574d70b59f1cd9f2254ae00e"), "hello" : "papa" }
43. DBA
DATABASE AND COLLECTION COMMANDS
▸ show dbs: lists the available databases
▸ db.dropDatabase(): drops the current database
▸ db.createCollection("your_collection"): creates a collection named your_collection in the current database
▸ show collections: lists the collections in the current database
▸ db.your_collection.drop(): drops the collection
44. DBA
INSERT / SAVE
▸ db.your_collection.insert([your_documents])
▸ db.your_collection.save(your_document): if _id is present and already in the database, works as an update
▸ db.t2mongo.insert(<documents generated at http://www.json-generator.com/>)
▸ db.t2mongo.count()
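A short sketch of the insert-versus-save distinction (the documents are made up):

db.t2mongo.insert({ _id : 1, name : "first" })   // plain insert
db.t2mongo.save({ _id : 1, name : "second" })    // _id already in the database -> replaces the document
db.t2mongo.save({ name : "third" })              // no _id -> behaves like an insert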
45. DBA
QUERY
Operation           | Syntax                  | Example
Equality            | {<key>:<value>}         | db.mycol.find({"by":"tutorials point"}).pretty()
Less Than           | {<key>:{$lt:<value>}}   | db.mycol.find({"likes":{$lt:50}}).pretty()
Less Than Equals    | {<key>:{$lte:<value>}}  | db.mycol.find({"likes":{$lte:50}}).pretty()
Greater Than        | {<key>:{$gt:<value>}}   | db.mycol.find({"likes":{$gt:50}}).pretty()
Greater Than Equals | {<key>:{$gte:<value>}}  | db.mycol.find({"likes":{$gte:50}}).pretty()
Not Equals          | {<key>:{$ne:<value>}}   | db.mycol.find({"likes":{$ne:50}}).pretty()

db.t2mongo.find({age : { $gt : 30} })
db.t2mongo.find({age : { $gt : 30} }).pretty()
db.t2mongo.count({age : { $gt : 30} })
54. DBA
REPLICATION
▸ A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments.
▸ A cluster of N nodes
▸ Any node can become primary
▸ All write operations go to the primary
▸ Automatic failover
▸ Automatic recovery
▸ Consensus election of the primary
55. DBA
REPLICATION
▸ Create a folder for each mongod process
▸ mkdir a; mkdir b; mkdir c
▸ Run three mongod processes
▸ mongod --dbpath a --replSet myReplica
▸ mongod --dbpath b --replSet myReplica --port 27018
▸ mongod --dbpath c --replSet myReplica --port 27019
▸ Connect to any of them
▸ mongo
▸ Now that we are connected on mongo's default port 27017, let's initiate the replica set
▸ rs.initiate()
▸ Let's add the other servers: rs.add("hostname:port")
▸ rs.add("Ground-Control.local:27018"); rs.add("Ground-Control.local:27019")
56. DBA
REPLICATION
▸ You'll see PRIMARY in the mongo shell on port 27017
▸ Connect to the other ports as below and see SECONDARY in the shell
▸ mongo --port 27018
▸ mongo --port 27019
▸ Let's insert some documents on port 27017 and try to read them on the other ports
▸ Use rs.slaveOk() to allow reads on the secondaries
▸ Use rs.help() to see the replication commands
57. DBA
REPLICATION
▸ Run the command below on port 27017
▸ use adb; for (var i = 0; i < 50000; i++) { db.test.insert({_id : i, x : i}) }
▸ Run use adb; db.test.count() on the other ports and observe that replication works
▸ Shut down the PRIMARY, observe a new PRIMARY in the shell, and run rs.status() to see the new configuration
58. DBA
SHARDING CLUSTER
▸ Sharding is the process of storing data records across multiple machines, and it is MongoDB's approach to meeting the demands of data growth. As the size of the data increases, a single machine may not be sufficient to store the data, nor provide acceptable read and write throughput. Sharding solves the problem with horizontal scaling.
▸ In replication, all writes go to the master node
▸ Latency-sensitive queries still go to the master
▸ A single replica set is limited to 12 nodes
▸ Memory can't be large enough when the active dataset is big
▸ Local disk is not big enough
▸ Vertical scaling is too expensive
59. DBA
SHARDING CLUSTER
▸ Shards: store the data. They provide high availability and data consistency. In a production environment, each shard is a separate replica set.
▸ Config servers: store the cluster's metadata, which contains a mapping of the cluster's data set to the shards. The query router uses this metadata to target operations to specific shards. In a production environment, sharded clusters have exactly 3 config servers.
▸ Query routers: mongos instances that interface with client applications and direct operations to the appropriate shard. The query router processes and targets operations to shards and then returns results to the clients. A sharded cluster can contain more than one query router to divide the client request load; a client sends requests to one query router. Generally, a sharded cluster has many query routers.
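Before the shards can be introduced to the router on slide 62, each shard mongod, a config server and a mongos must be running. A minimal single-host sketch, under assumptions: the folder names, the config server port and the use of plain (non-replica-set) shards are made up here; only the shard ports 27118/27218/27318 come from the next slide.

mkdir s1 s2 s3 cfg
mongod --shardsvr --dbpath s1 --port 27118
mongod --shardsvr --dbpath s2 --port 27218
mongod --shardsvr --dbpath s3 --port 27318
mongod --configsvr --dbpath cfg --port 27019
mongos --configdb Ground-Control.local:27019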
62. DBA
SHARDING CLUSTER
▸ Introduce the shards to the query router
▸ In the mongo shell, run the command: sh.addShard("hostname:port")
▸ sh.addShard("Ground-Control.local:27118")
▸ sh.addShard("Ground-Control.local:27218")
▸ sh.addShard("Ground-Control.local:27318")
63. DBA
SHARDING COLLECTION
▸ Run the commands below on port 27017
▸ use catalog
▸ for (var i = 0; i < 1000000; i++) { db.movies.insert({name : 'name ' + i, type : 'type ' + i, gross : i, country : 'country ' + i, date : ISODate(), value : Math.random() * 1000000}) }
▸ sh.enableSharding("catalog")
▸ sh.status()
▸ sh.shardCollection("catalog.movies", {_id : 1}, true)