NoSQL: Now and Path Ahead
Shubham Kumar Srivastava
MakeMyTrip
Who am I
Abstract
• What and Why: NoSQL
• Fundamentals
• Use Case
• Challenges
• Path Ahead
What is NoSQL
A database that does not adhere to the traditional relational database
management system (RDBMS) structure.
Why NoSQL
• Scalability and Performance
• Cost
• Data Modeling
Why NoSQL : Motives and Drivers
Scalability and Performance
• Horizontal scaling (adding machines) works better than vertical scaling (a bigger machine).
• Hardware keeps getting cheaper while processing power increases.
• Less operational complexity than RDBMS solutions.
• Most solutions provide automatic sharding and similar features by default.
Why NoSQL : Motives and Drivers contd..
Cost
• Scale (as with NoSQL) on a traditional RDBMS comes at a hefty cost.
• Commodity hardware, software versions, upgrades, maintenance.
• This led organizations to look for alternatives and created
  the need for a cost-effective scale-out option.
Why NoSQL : Motives and Drivers contd..
Data Modeling
SQL has been built for:
• Concurrency, consistency, integrity
• Summations, aggregations, groupings
• The schema says: "What all can I answer?"
Why NoSQL : Motives and Drivers contd..
Data Modeling
• A plain key-value store is very powerful and fits most use cases for
  a NoSQL solution.
• Hierarchical or graph-like data modeling and processing.
• Values like maps of maps of maps.
• Document databases, which can even store arbitrarily complex objects.
• Document-based indexing data stores are a huge success.
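The "maps of maps of maps" idea can be sketched with nested sorted maps: row key, then column family, then column name, then value. This is only a minimal in-memory illustration; the class and method names (`NestedMapStore`, `put`, `get`) are hypothetical and do not come from any product's API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model: rowKey -> columnFamily -> columnName -> value.
class NestedMapStore {
    private final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    // Creates the intermediate maps on demand, then stores the value.
    void put(String rowKey, String family, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .put(column, value);
    }

    // Returns null when any level of the path is missing.
    String get(String rowKey, String family, String column) {
        Map<String, Map<String, String>> families = rows.get(rowKey);
        if (families == null) {
            return null;
        }
        Map<String, String> columns = families.get(family);
        return columns == null ? null : columns.get(column);
    }

    public static void main(String[] args) {
        NestedMapStore store = new NestedMapStore();
        store.put("user42", "prefs", "currency", "INR");
        store.put("user42", "prefs", "location", "Pune");
        System.out.println(store.get("user42", "prefs", "currency")); // prints "INR"
    }
}
```

No schema is declared up front: a new column is just a new map entry, which is the flexibility the slide is pointing at.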
Why NoSQL : Motives and Drivers contd..
At times software applications are not limited to these constraints. This led to
data models like:

Key/Value Store:
   Redis, MemcacheDB, Voldemort etc.

Wide Column Store / Column Families:
   Cassandra, Hadoop (HBase), Hypertable, Cloudera etc.

Document-Based Stores:
   Solr, Lucene, MongoDB, CouchDB, Terrastore etc.

Graph Data Store:
   Neo4j, GraphBase, FlockDB etc.
Why NoSQL : Motives and Drivers contd..
• The schema says: "What are the questions?"
• Data modeling is based on the set of queries.
• Exploit denormalization and duplication.
• Use aggregates.
• Manage joins with application code, aggregation, denormalization etc.
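Query-driven modeling with deliberate duplication can be sketched as follows: the same booking is written twice, once per query it must answer, so each read is a single key lookup and no join is needed. The class `DenormalizedBookings` and its methods are hypothetical names used only for illustration, and plain in-memory maps stand in for a real store.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one write fans out to two query-specific lookups.
class DenormalizedBookings {
    private final Map<String, List<String>> hotelsByUser = new HashMap<>();
    private final Map<String, List<String>> hotelsByCity = new HashMap<>();

    // The same fact is duplicated on write so that each read is one key lookup.
    void recordBooking(String userId, String cityCode, String hotelId) {
        hotelsByUser.computeIfAbsent(userId, k -> new ArrayList<>()).add(hotelId);
        hotelsByCity.computeIfAbsent(cityCode, k -> new ArrayList<>()).add(hotelId);
    }

    // Answers "which hotels did this user book?"
    List<String> bookedByUser(String userId) {
        return hotelsByUser.getOrDefault(userId, Collections.emptyList());
    }

    // Answers "which hotels were booked in this city?"
    List<String> bookedInCity(String cityCode) {
        return hotelsByCity.getOrDefault(cityCode, Collections.emptyList());
    }

    public static void main(String[] args) {
        DenormalizedBookings bookings = new DenormalizedBookings();
        bookings.recordBooking("user1", "GOI", "hotel100");
        bookings.recordBooking("user2", "GOI", "hotel200");
        System.out.println(bookings.bookedInCity("GOI")); // prints "[hotel100, hotel200]"
    }
}
```

The trade-off is more work and more storage on write in exchange for cheap, predictable reads.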
Some Fanda-mentals
CAP Theorem
At most two of the following three properties can be satisfied at the same time
in a shared/distributed system:
• Consistency
• Availability
• Tolerance to network partitions
CAP : Pictorially
Explanation
Use case: Scaling Web Apps

Critical facts:
• Network outages are common.
• Customer shopping carts, email search, social network
  queries can tolerate stale data.

How:
  Compromise on consistency in order to remain available, rather than
  disrupt user service during outages.
Explanation
• Rather than requiring consistency after every transaction, it
  is enough for the database to eventually be in a consistent
  state.
• Brewer's CAP theorem says you have no choice if you want
  to scale up.
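Eventual consistency can be sketched with two replicas that accept writes independently and reconcile later. The sketch below assumes last-write-wins merging on a caller-supplied logical timestamp, which is only one of several possible reconciliation strategies; the `Replica` class and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of eventual consistency with last-write-wins merging.
class Replica {
    // A value plus the logical timestamp of the write that produced it.
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Versioned> data = new HashMap<>();

    // Keeps the write only if it is newer than what this replica already holds.
    void write(String key, String value, long timestamp) {
        Versioned current = data.get(key);
        if (current == null || timestamp > current.timestamp) {
            data.put(key, new Versioned(value, timestamp));
        }
    }

    String read(String key) {
        Versioned v = data.get(key);
        return v == null ? null : v.value;
    }

    // Anti-entropy step: pull every entry from the other replica, keeping the newer write.
    void syncFrom(Replica other) {
        for (Map.Entry<String, Versioned> e : other.data.entrySet()) {
            write(e.getKey(), e.getValue().value, e.getValue().timestamp);
        }
    }

    public static void main(String[] args) {
        Replica a = new Replica();
        Replica b = new Replica();
        a.write("cart:42", "1 item", 1L);
        b.syncFrom(a);
        a.write("cart:42", "2 items", 2L);      // accepted by replica a only
        System.out.println(b.read("cart:42"));   // prints "1 item" (stale read)
        b.syncFrom(a);
        System.out.println(b.read("cart:42"));   // prints "2 items" (converged)
    }
}
```

Both replicas stay available for reads and writes throughout; the price is the window in which replica b serves the older cart.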
Explanation contd..
Sharp Contrast : High Speed Financial Application
• Highly transactional
• Consistent
• Automated
• Can't live with eventual consistency
ACID vs BASE
ACID
• Atomic: Everything in a transaction succeeds or the
  entire transaction is rolled back.
• Consistent: A transaction cannot leave the database in an
  inconsistent state.
• Isolated: Transactions cannot interfere with each other.
• Durable: Completed transactions persist, even when
  servers restart etc.
Some Fanda-mentals cont..
BASE
• Basically Available
• Soft state
• Eventual consistency
Consistent Hashing
A common way to balance load across n machines:

The machine chosen to cache object o will be:

hash(o) mod n
n: total number of machines
Consistent Hashing contd..
Adding a machine to the cache means
    hash(o) mod (n + 1)

Removing a machine from the cache means
    hash(o) mod (n - 1)

Result in either case: disaster. Most objects now hash to a different
machine, so machines are swamped with redistribution.
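The redistribution problem can be made concrete by counting how many keys change machine when a mod-n cache grows by one node. This is a minimal sketch: `String.hashCode()` stands in for whatever hash function a real cache would use, and the key names `obj-0`, `obj-1`, ... are made up. With uniformly spread hash values, about n/(n+1) of the keys move when going from n to n + 1 machines.

```java
// Sketch: count how many keys change machine when a mod-n cache grows by one node.
class ModuloPlacement {
    // Machine index for a key under the naive scheme hash(o) mod n.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int machineFor(String key, int n) {
        return Math.floorMod(key.hashCode(), n);
    }

    // Number of keys "obj-0".."obj-(count-1)" whose machine differs
    // between n machines and n + 1 machines.
    static int movedKeys(int count, int n) {
        int moved = 0;
        for (int i = 0; i < count; i++) {
            String key = "obj-" + i;
            if (machineFor(key, n) != machineFor(key, n + 1)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int count = 10_000;
        int moved = movedKeys(count, 4);
        System.out.println(moved + " of " + count + " keys moved going from 4 to 5 machines");
    }
}
```

A key stays put only when its hash gives the same remainder for both n and n + 1, which is why the large majority of cached objects end up on the wrong machine after a resize.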
Consistent Hashing contd..
• Commonly, a hash function (e.g. MD5) maps a value to a 128-bit key
  in the range 0 to 2^128 - 1 (or even a 32-bit key, as in the slides
  that follow).
Consistent Hashing contd..
         Both Key and Machine hashed with the same function
Consistent Hashing contd..
               Adding a Node
Consistent Hashing contd..
               Removing a Node
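The ring described on these slides can be sketched with a sorted map: machines and keys are hashed with the same function, and a key belongs to the first machine at or after its position, wrapping around at the end. This is a simplified illustration, not a production design: it assumes MD5 truncated to 64 bits as the ring position, uses one position per machine (no virtual nodes), and the names `HashRing`, `addNode`, `removeNode` and `nodeFor` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a hash ring: keys and machines share one hash function.
class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // First 8 bytes of the MD5 digest, folded into a long ring position.
    // Signed ordering is fine here; the ring only needs a total order.
    static long position(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xffL);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    void addNode(String node) {
        ring.put(position(node), node);
    }

    void removeNode(String node) {
        ring.remove(position(node));
    }

    // Owner is the first node at or after the key's position, wrapping to the start.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes on the ring");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(position(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing();
        ring.addNode("machine-A");
        ring.addNode("machine-B");
        ring.addNode("machine-C");
        System.out.println("obj-1 -> " + ring.nodeFor("obj-1"));
        ring.addNode("machine-D"); // only keys that now fall before machine-D move
        System.out.println("obj-1 -> " + ring.nodeFor("obj-1"));
    }
}
```

Adding a node only takes keys away from its successor on the ring, and removing it hands them back, so the rest of the cache is left untouched.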
Use Case and NoSQL Solution
Problem:
• Need to store bookings per day for all hotels.
• Queries are centered around cities and regions.

Hotel count: 1 million
Date range: now to the next 365 × 2 days
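The size of the problem follows directly from the two numbers above. A back-of-the-envelope sketch, assuming one stored value per hotel per day (the class name `BookingGridSize` is hypothetical):

```java
// Sketch: rough cell count for "bookings per hotel per day".
class BookingGridSize {
    // One cell per (hotel, day) pair.
    static long cells(long hotels, long days) {
        return hotels * days;
    }

    public static void main(String[] args) {
        long hotels = 1_000_000L; // hotel count from the problem statement
        long days = 365L * 2;     // date range from the problem statement
        System.out.println(cells(hotels, days) + " hotel-day cells"); // prints "730000000 hotel-day cells"
    }
}
```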
NoSQL: Path Ahead
• ACID equivalence (Neo4j, CouchDB etc.)
• Transaction support
• Atomicity
• MVCC
NoSQL: Path Ahead contd..
Possible Solution
• Work with a SQL database for creation, updates etc.
• Archive the data in NoSQL for query, analysis etc.
NoSQL: Path Ahead contd..
Enterprise Adoption and Challenges
• NoSQL looks good largely for unstructured data.
• SQL is the best choice for a broad range of traditional workloads.
NoSQL: Path Ahead contd..
Shout out loud
Hybrid
ACID + BASE
• They are not alternatives but supplements.
NoSQL: Path Ahead contd..
• Maturity
• Support
• Skill set and administration/operations
• Analytics and BI support
NoSQL: Path Ahead contd..
Q&A
References
• Seth Gilbert and Nancy Lynch, "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services", ACM SIGACT News, Volume 33 Issue 2 (2002), pp. 51-59.
• "Brewer's CAP Theorem", julianbrowne.com, retrieved 02-Mar-2010.
• "Brewers CAP theorem on distributed systems", royans.net.
• "CAP Twelve Years Later: How the 'Rules' Have Changed"; on-line resource.
• E. Brewer, "Towards Robust Distributed Systems," Proc. 19th Ann. ACM Symp. Principles of Distributed Computing (PODC 00), ACM, 2000, pp. 7-10; on-line resource.
• D. Abadi, "Problems with CAP, and Yahoo's Little Known NoSQL System," DBMS Musings, blog, 23 Apr. 2010; on-line resource.
• C. Hale, "You Can't Sacrifice Partition Tolerance," 7 Oct. 2010; on-line resource.
• Facebook: Scaling Out; on-line resource.
• Gemstone: The Hardest Problems in Data Management; on-line resource.
• The Log-Structured Merge-Tree (research paper).
• CodeProject: Consistent Hashing; on-line resource.
• HighlyScalable: NoSQL Data Modeling Techniques; on-line resource.
• eBay Tech Blog: Cassandra Data Modeling Best Practices; on-line resource.
• John D. Cook: ACID vs BASE; on-line resource.
• Merkle Trees.
• Phi Accrual Failure Detection (research paper).
Backup Slides
Better than the Original 1
Document Based DataStore
{
    _id : ObjectId("4e77bb3b8a3e000000004f7a"),
    when : Date("2011-09-19T02:10:11.3Z"),
    author : "alex",
    title : "No Free Lunch",
    text : "This is the text of the post. It could be very long.",
    tags : [ "business", "ramblings" ],
    votes : 5,
    voters : [ "jane", "joe", "spencer", "phyllis", "li" ],
    comments : [
        { who : "jane", when : Date("2011-09-19T04:00:10.112Z"),
         comment : "I agree." },
        { who : "meghan", when : Date("2011-09-20T14:36:06.958Z"),
         comment : "You must be joking. etc etc ..." }
    ]
}
User and Items
User and Items : Option 1
User and Items : Option 2
User and Items : Option 3
User and Items : Option 4
Cassandra CF
Cassandra SuperCF
Use Case 1
Ecommerce Site
• Problem: Record user preferences, e.g.
  location, IP, currency selected, source of traffic,
  multiple other dynamic values.
• Solution: In a CF-based structure, keep it simple.

UserId_Key:
 Pref1_Name:Value1, Pref2_Name:Value2,
 …, PrefN_Name:ValueN
Use Case 1
RowKey: 1350136093705_6501082438199894
=> (column=1350136093764, value=-3242432#911167901131523, timestamp=1350136093766000)
=> (column=1350283322499, value=GOI#200701231712126570, timestamp=1350283322502001)
=> (column=1350283566051, value=GOI#200703221605283033, timestamp=1350283566054001)
=> (column=1350749595676, value=GOI#200805261514037199, timestamp=1350749595677001)
=> (column=1350785230322, value=BOM#200701251747233158, timestamp=1350785230324001)

RowKey: 1354499614310_10861558002828044
=> (column=1354499614368, value=TRV#201104071059204768, timestamp=1354499614370000, ttl=1728000)
-------------------
RowKey: 1349760150553_6114662943774777
=> (column=1349760152066, value=BLR#200802111324575807, timestamp=1349760152068001)
-------------------
RowKey: 1349805109805_6167423558533191
=> (column=1349805111833, value=TRV#312254274337517, timestamp=1349805111835001)
-------------------
RowKey: 1354435656227_7908056941568359
=> (column=1354435656367, value=IDR#200701211254519381, timestamp=1354435656369000, ttl=1728000)
-------------------
RowKey: 1347648097261_15570089270962881
=> (column=1347648097304, value=DEL#201101192008115545, timestamp=1347648097307000)
Use Case 1
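Get

// Imports needed by this snippet (Hector client for Cassandra)
import java.io.IOException;
import java.nio.charset.CharacterCodingException;
import java.util.LinkedHashMap;
import java.util.Map;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.QueryResult;
import me.prettyprint.hector.api.query.SliceQuery;

// Reads the named preference columns from one user's row (row key = userId).
private Map<String, String> getPreferences(Keyspace keySpace, String userId,
        String... preferenceNames) throws IOException, CharacterCodingException {
    SliceQuery<String, String, String> rsq = HFactory.createSliceQuery(keySpace,
            StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
    rsq.setColumnFamily(USER_PREFERENCE);
    rsq.setKey(userId);
    rsq.setColumnNames(preferenceNames);

    QueryResult<ColumnSlice<String, String>> orows = rsq.execute();
    // LinkedHashMap keeps the columns in the order they were returned.
    Map<String, String> preferenceMap = new LinkedHashMap<String, String>();
    for (HColumn<String, String> column : orows.get().getColumns()) {
        preferenceMap.put(column.getName(), column.getValue());
    }
    return preferenceMap;
}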
Use Case 1
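Save

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

// keySpace, rowkey, colkey, colvalue and ttlUserPreferences are supplied by the caller.
Mutator<String> m = HFactory.createMutator(keySpace, StringSerializer.get());

// One preference becomes one column: name = preference name, value = preference value.
HColumn<String, String> userPreferences = HFactory.createColumn(colkey, colvalue,
        StringSerializer.get(), StringSerializer.get());

// Let the column expire after ttlUserPreferences seconds.
userPreferences.setTtl(ttlUserPreferences);

m.addInsertion(rowkey, USER_PREFERENCE, userPreferences);
m.execute();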
Use Case 2
Online Travel Site

Problem: Need to know different metrics for a
         city's hotels, e.g.:

         Hotels booked in the last X time
         Hotels last viewed in Y time
         Hotels left with Z inventory
Use Case 2
RowKey: 2d323436353731
=> (super_column=911167901297486,
    (column=6c6173747669657765646d657373616765, value=VIEWED#Last viewed 23 hour(s) ago.,
     timestamp=1354962852610000)
    (column=6c6173747669657765646d657373616762, value=Inventory#20,
     timestamp=1354962852610000)
    (column=6c6173747669657765646d657373616769, value=Bookings#8, timestamp=135496282610000))
-------------------
RowKey: 58524f
=> (super_column=200903041759196196,
    (column=6c617374626f6f6b65646d657373616765, value=Booked#Last booked 1 day(s) ago.,
     timestamp=1347781187842000)
    (column=6c6173747669657765646d657373616765, value=VIEWED#Last viewed 2 hours ago.,
     timestamp=1347707080147000))
=> (super_column=200903041848352230,
    (column=6c6173747669657765646d657373616765, value=VIEWED#Last viewed 1 day(s) ago.,
     timestamp=1347266107708000))
Use Case 2
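import java.util.HashMap;
import java.util.List;
import java.util.Map;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.beans.HSuperColumn;
import me.prettyprint.hector.api.beans.SuperSlice;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.QueryResult;
import me.prettyprint.hector.api.query.SuperSliceQuery;

// Fetch the super columns stored under the city's row key.
SuperSliceQuery<String, String, String, String> superQuery = HFactory.createSuperSliceQuery(getKeySpace(),
        StringSerializer.get(), StringSerializer.get(),
        StringSerializer.get(), StringSerializer.get());
superQuery.setColumnFamily(SUPER_SOCIAL_MESSAGE).setKey(cityCode);

QueryResult<SuperSlice<String, String, String>> result = superQuery.execute();
List<HSuperColumn<String, String, String>> superColumns = result.get().getSuperColumns();

if (superColumns != null) {
    for (HSuperColumn<String, String, String> superColumn : superColumns) {
        // Collect the sub-columns of this super column (message name -> message text).
        Map<String, String> messages = new HashMap<String, String>();
        List<HColumn<String, String>> columns = superColumn.getColumns();
        if (columns != null) {
            for (HColumn<String, String> column : columns) {
                messages.put(column.getName(), column.getValue());
            }
        }
        /* The equivalent doc; document and documents are defined elsewhere in the original code */
        document.addField(superColumn.getName(), messages);
        documents.add(document);
    }
}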
Pig Script : MR
<document>

  <pigscript start="-16" end="-43200" start1="-1441" end1="-10080" start2="0" end2="-15" start3="0" end3="-1440">

     <comment>Delete All Messages</comment>

      <query><![CDATA[rows0 = LOAD 'cassandra://LH/HotelMessage' USING com.mmt.solr.hotels.cassandra.CassandraStorage() as (key:chararray, cols:bag{T:tuple(name:chararray, value:chararray) } );]]></query>

       <query><![CDATA[cols0 = FOREACH rows0 GENERATE key as key,flatten($1) as (name:chararray, value:chararray);]]></query>

      <query><![CDATA[userhotel0 = FOREACH cols0 GENERATE key as key,com.mmt.solr.hotels.cassandra.ByteBufferToString($1) as name,com.mmt.solr.hotels.cassandra.ByteBufferToString($2) as value;]]></query>

      <query><![CDATA[uriCounts0 = FOREACH userhotel0 GENERATE key as citycode,com.mmt.solr.hotels.cassandra.ToBag(TOTUPLE(name,null));]]></query>




       <comment>Last Viewed start 15 minutes to 30 days ago</comment>

      <query><![CDATA[rows = LOAD 'cassandra://LH/LastViewedHotels?slice_start=#start&slice_end=#end&limit=1024&reversed=true' USING com.mmt.solr.hotels.cassandra.CassandraStorage() as (key:chararray, cols:bag{T:tuple(name:long,
        value:chararray) } );]]></query>

      <query><![CDATA[cols = FOREACH rows GENERATE key as key,flatten($1) as (name:long, value:chararray);]]></query>

  <query><![CDATA[userhotel = FOREACH cols GENERATE key as key,com.mmt.solr.hotels.cassandra.LongToHours($1) as name,com.mmt.solr.hotels.cassandra.ByteBufferToString($2) as value;]]></query>

  <query><![CDATA[userhotelByCity = FOREACH userhotel GENERATE key as key,flatten($1) as name,flatten(org.apache.pig.piggybank.evaluation.string.Split(value,'#',2)) as (citycode:chararray,hotelid:chararray);]]></query>

  <query><![CDATA[groupByhotels = GROUP userhotelByCity BY hotelid;]]></query>

  <query><![CDATA[uriCounts = FOREACH groupByhotels { D = LIMIT userhotelByCity 1;

                                   GENERATE flatten(D.citycode) as citycode,com.mmt.solr.hotels.cassandra.ToBag(

                                   TOTUPLE(group,com.mmt.solr.hotels.cassandra.StringAppend('VIEWED#Last viewed ',D.name,' ago.')));

                                  };]]></query>



      <comment>Last Booked 1 to 8 days ago</comment>

      <query><![CDATA[rows1 = LOAD 'cassandra://LH/BookedHotels?slice_start=#startA&slice_end=#endA&limit=1024&reversed=true' USING com.mmt.solr.hotels.cassandra.CassandraStorage() as (key:chararray, cols:bag{T:tuple(name:long,
         value:chararray) } );]]></query>

 <query><![CDATA[cols1 = FOREACH rows1 GENERATE key as key,flatten($1) as (name:long, value:chararray);]]></query>

 <query><![CDATA[userhotel1 = FOREACH cols1 GENERATE key as key,com.mmt.solr.hotels.cassandra.LongToHours($1) as name,com.mmt.solr.hotels.cassandra.ByteBufferToString($2) as value;]]></query>

 <query><![CDATA[userhotelByCity1 = FOREACH userhotel1 GENERATE key as key,flatten($1) as name,flatten(org.apache.pig.piggybank.evaluation.string.Split(value,'#',2)) as (citycode:chararray,hotelid:chararray);]]></query>

 <query><![CDATA[groupByhotels1 = GROUP userhotelByCity1 BY hotelid;]]></query>

 <query><![CDATA[uriCounts1 = FOREACH groupByhotels1 { D = LIMIT userhotelByCity1 1;



GENERATE flatten(D.citycode) as citycode,com.mmt.solr.hotels.cassandra.ToBag(

TOTUPLE(group,com.mmt.solr.hotels.cassandra.StringAppend('Booked#Last booked ',D.name,' ago.')));

};]]></query>
Criteria to Evaluate NoSQL Solutions

Internal partitioning

Automated flexible data distribution

Hot swappable nodes

Replication-style

Automated failover strategy

More Related Content

What's hot

NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
Felix Gessert
 
NoSQL-Database-Concepts
NoSQL-Database-ConceptsNoSQL-Database-Concepts
NoSQL-Database-Concepts
Bhaskar Gunda
 

What's hot (20)

NoSQL Databases: Why, what and when
NoSQL Databases: Why, what and whenNoSQL Databases: Why, what and when
NoSQL Databases: Why, what and when
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
 
Nonrelational Databases
Nonrelational DatabasesNonrelational Databases
Nonrelational Databases
 
[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...
[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...
[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...
 
NOSQL
NOSQLNOSQL
NOSQL
 
Quantitative Performance Evaluation of Cloud-Based MySQL (Relational) Vs. Mon...
Quantitative Performance Evaluation of Cloud-Based MySQL (Relational) Vs. Mon...Quantitative Performance Evaluation of Cloud-Based MySQL (Relational) Vs. Mon...
Quantitative Performance Evaluation of Cloud-Based MySQL (Relational) Vs. Mon...
 
Non relational databases-no sql
Non relational databases-no sqlNon relational databases-no sql
Non relational databases-no sql
 
Non Relational Databases
Non Relational DatabasesNon Relational Databases
Non Relational Databases
 
NoSQL databases and managing big data
NoSQL databases and managing big dataNoSQL databases and managing big data
NoSQL databases and managing big data
 
Performance analysis of MongoDB and HBase
Performance analysis of MongoDB and HBasePerformance analysis of MongoDB and HBase
Performance analysis of MongoDB and HBase
 
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8
 
Nosql seminar
Nosql seminarNosql seminar
Nosql seminar
 
Beyond Relational Databases
Beyond Relational DatabasesBeyond Relational Databases
Beyond Relational Databases
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
 
Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7
 
Introduction To MongoDB
Introduction To MongoDBIntroduction To MongoDB
Introduction To MongoDB
 
Webinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDBWebinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDB
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless Databases
 
NoSQL-Database-Concepts
NoSQL-Database-ConceptsNoSQL-Database-Concepts
NoSQL-Database-Concepts
 

Similar to Indic threads pune12-nosql now and path ahead

Databases benoitg 2009-03-10
Databases benoitg 2009-03-10Databases benoitg 2009-03-10
Databases benoitg 2009-03-10
benoitg
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151
xlight
 
NoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyNoSQL Solutions - a comparative study
NoSQL Solutions - a comparative study
Guillaume Lefranc
 
NoSQLDatabases
NoSQLDatabasesNoSQLDatabases
NoSQLDatabases
Adi Challa
 

Similar to Indic threads pune12-nosql now and path ahead (20)

Introduction to NoSQL Database
Introduction to NoSQL DatabaseIntroduction to NoSQL Database
Introduction to NoSQL Database
 
If NoSQL is your answer, you are probably asking the wrong question.
If NoSQL is your answer, you are probably asking the wrong question.If NoSQL is your answer, you are probably asking the wrong question.
If NoSQL is your answer, you are probably asking the wrong question.
 
Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101
 
NoSql Databases
NoSql DatabasesNoSql Databases
NoSql Databases
 
Databases benoitg 2009-03-10
Databases benoitg 2009-03-10Databases benoitg 2009-03-10
Databases benoitg 2009-03-10
 
MinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with CassandraMinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with Cassandra
 
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQLCompressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
 
SPL_ALL_EN.pptx
SPL_ALL_EN.pptxSPL_ALL_EN.pptx
SPL_ALL_EN.pptx
 
NoSQL Basics - A Quick Tour
NoSQL Basics - A Quick TourNoSQL Basics - A Quick Tour
NoSQL Basics - A Quick Tour
 
At the core you will have KUSTO
At the core you will have KUSTOAt the core you will have KUSTO
At the core you will have KUSTO
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151
 
Modern databases and its challenges (SQL ,NoSQL, NewSQL)
Modern databases and its challenges (SQL ,NoSQL, NewSQL)Modern databases and its challenges (SQL ,NoSQL, NewSQL)
Modern databases and its challenges (SQL ,NoSQL, NewSQL)
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
NoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyNoSQL Solutions - a comparative study
NoSQL Solutions - a comparative study
 
Enterprise NoSQL: Silver Bullet or Poison Pill
Enterprise NoSQL: Silver Bullet or Poison PillEnterprise NoSQL: Silver Bullet or Poison Pill
Enterprise NoSQL: Silver Bullet or Poison Pill
 
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
 
SQL to NoSQL: Top 6 Questions
SQL to NoSQL: Top 6 QuestionsSQL to NoSQL: Top 6 Questions
SQL to NoSQL: Top 6 Questions
 
Avoiding big data antipatterns
Avoiding big data antipatternsAvoiding big data antipatterns
Avoiding big data antipatterns
 
NoSQLDatabases
NoSQLDatabasesNoSQLDatabases
NoSQLDatabases
 
مقدمة عن NoSQL بالعربي
مقدمة عن NoSQL بالعربيمقدمة عن NoSQL بالعربي
مقدمة عن NoSQL بالعربي
 

More from IndicThreads

Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache Spark
IndicThreads
 
Continuous Integration (CI) and Continuous Delivery (CD) using Jenkins & Docker
 Continuous Integration (CI) and Continuous Delivery (CD) using Jenkins & Docker Continuous Integration (CI) and Continuous Delivery (CD) using Jenkins & Docker
Continuous Integration (CI) and Continuous Delivery (CD) using Jenkins & Docker
IndicThreads
 
Unraveling OpenStack Clouds
 Unraveling OpenStack Clouds Unraveling OpenStack Clouds
Unraveling OpenStack Clouds
IndicThreads
 

More from IndicThreads (20)

Http2 is here! And why the web needs it
Http2 is here! And why the web needs itHttp2 is here! And why the web needs it
Http2 is here! And why the web needs it
 
Understanding Bitcoin (Blockchain) and its Potential for Disruptive Applications
Understanding Bitcoin (Blockchain) and its Potential for Disruptive ApplicationsUnderstanding Bitcoin (Blockchain) and its Potential for Disruptive Applications
Understanding Bitcoin (Blockchain) and its Potential for Disruptive Applications
 
Go Programming Language - Learning The Go Lang way
Go Programming Language - Learning The Go Lang wayGo Programming Language - Learning The Go Lang way
Go Programming Language - Learning The Go Lang way
 
Building Resilient Microservices
Building Resilient Microservices Building Resilient Microservices
Building Resilient Microservices
 
App using golang indicthreads
App using golang  indicthreadsApp using golang  indicthreads
App using golang indicthreads
 
Building on quicksand microservices indicthreads
Building on quicksand microservices  indicthreadsBuilding on quicksand microservices  indicthreads
Building on quicksand microservices indicthreads
 
How to Think in RxJava Before Reacting
How to Think in RxJava Before ReactingHow to Think in RxJava Before Reacting
How to Think in RxJava Before Reacting
 
Iot secure connected devices indicthreads
Iot secure connected devices indicthreadsIot secure connected devices indicthreads
Iot secure connected devices indicthreads
 
Real world IoT for enterprises
Real world IoT for enterprisesReal world IoT for enterprises
Real world IoT for enterprises
 
IoT testing and quality assurance indicthreads
IoT testing and quality assurance indicthreadsIoT testing and quality assurance indicthreads
IoT testing and quality assurance indicthreads
 
Functional Programming Past Present Future
Functional Programming Past Present FutureFunctional Programming Past Present Future
Functional Programming Past Present Future
 
Harnessing the Power of Java 8 Streams
Harnessing the Power of Java 8 Streams Harnessing the Power of Java 8 Streams
Harnessing the Power of Java 8 Streams
 
Building & scaling a live streaming mobile platform - Gr8 road to fame
Building & scaling a live streaming mobile platform - Gr8 road to fameBuilding & scaling a live streaming mobile platform - Gr8 road to fame
Building & scaling a live streaming mobile platform - Gr8 road to fame
 
Internet of things architecture perspective - IndicThreads Conference
Internet of things architecture perspective - IndicThreads ConferenceInternet of things architecture perspective - IndicThreads Conference
Internet of things architecture perspective - IndicThreads Conference
 
Cars and Computers: Building a Java Carputer
 Cars and Computers: Building a Java Carputer Cars and Computers: Building a Java Carputer
Cars and Computers: Building a Java Carputer
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache Spark
 
Continuous Integration (CI) and Continuous Delivery (CD) using Jenkins & Docker
 Continuous Integration (CI) and Continuous Delivery (CD) using Jenkins & Docker Continuous Integration (CI) and Continuous Delivery (CD) using Jenkins & Docker
Continuous Integration (CI) and Continuous Delivery (CD) using Jenkins & Docker
 
Speed up your build pipeline for faster feedback
Speed up your build pipeline for faster feedbackSpeed up your build pipeline for faster feedback
Speed up your build pipeline for faster feedback
 
Unraveling OpenStack Clouds
 Unraveling OpenStack Clouds Unraveling OpenStack Clouds
Unraveling OpenStack Clouds
 
Digital Transformation of the Enterprise. What IT leaders need to know!
Digital Transformation of the Enterprise. What IT  leaders need to know!Digital Transformation of the Enterprise. What IT  leaders need to know!
Digital Transformation of the Enterprise. What IT leaders need to know!
 

Recently uploaded

Recently uploaded (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
HTML Injection Attacks: Impact and Mitigation Strategies

Indic threads pune12-nosql now and path ahead

  • 1. NoSQL: Now and Path Ahead Shubham Kumar Srivastava MakeMyTrip
  • 3. Abstract
      - What and Why : NoSql
      - Fundamentals
      - Use Case
      - Challenges
      - Path Ahead
  • 4. What is NoSql
      A database that does not adhere to the traditional relational database management system (RDBMS) structure.
  • 5. Why NoSql
      - Scalability and Performance
      - Cost
      - Data Modeling
  • 6. Why NoSql : Motives and Drivers
      Scalability and Performance
      - Horizontal scalability is better than vertical
      - Hardware is getting cheaper and processing power is increasing
      - Less operational complexity than RDBMS solutions
      - Most solutions provide automatic sharding and similar features by default
  • 7. Why NoSql : Motives and Drivers contd..
  • 8. Why NoSql : Motives and Drivers contd..
  • 9. Why NoSql : Motives and Drivers contd..
      Cost
      - Scaling an RDBMS (as NoSql scales) comes at a hefty cost
      - Commodity hardware, software versions, upgrades, maintenance
      - This led organizations to look for alternatives and created the need for a cost-effective scale-out option
  • 10. Why NoSql : Motives and Drivers contd..
      Data Modeling
      SQL has been for:
      - Concurrency, consistency, integrity
      - Summations, aggregations, groupings
      - Schema says: What do I answer?
  • 11. Why NoSql : Motives and Drivers contd..
      Data Modeling
      - A plain key-value store is very powerful and fits most use cases for a NoSQL solution
      - Hierarchical or graph-like data modelling and processing
      - Values like maps of maps of maps
      - Document databases, which can even store arbitrarily complex objects
      - Document-based indexing data stores are a huge success
  • 12. Why NoSql : Motives and Drivers contd..
      At times software applications are not limited to these constraints. This led to data models like:
      - Key/Value Store: Redis, MemcacheDB, Voldemort etc.
      - Wide Column Store / Column Families: Cassandra, Hadoop (HBase), Hypertable, Cloudera etc.
      - Document Based Stores: Solr/Lucene, MongoDB, CouchDB, TerraStore etc.
      - Graph Data Store: Neo4j, GraphBase, FlockDB etc.
  • 13. Why NoSql : Motives and Drivers contd..
  • 14. Why NoSql : Motives and Drivers contd..
      - Schema says: What are the questions?
      - Data modeling is based on the set of queries
      - Exploit denormalization and duplication
      - Use aggregates
      - Manage joins with app + aggregation + denormalization etc.
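A minimal sketch of query-driven modeling, assuming a hotel-booking workload like the one used later in the deck. The class and method names (`BookingStore`, `recordBooking`, and so on) are hypothetical and the maps stand in for column families; the point is only that each query gets its own pre-built structure, with the per-city aggregate duplicated at write time instead of joined at read time.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for two denormalized "column families".
final class BookingStore {
    // Query 1: bookings for hotel H on day D (row per hotel, column per day).
    private final Map<String, Map<String, Integer>> byHotel = new HashMap<>();
    // Query 2: bookings in city C on day D (duplicated aggregate, no join needed).
    private final Map<String, Integer> byCityAndDay = new HashMap<>();

    // Every write updates both structures; this is the cost of denormalization.
    void recordBooking(String city, String hotelId, String day) {
        byHotel.computeIfAbsent(hotelId, k -> new HashMap<>()).merge(day, 1, Integer::sum);
        byCityAndDay.merge(city + "#" + day, 1, Integer::sum);
    }

    int bookingsForHotel(String hotelId, String day) {
        return byHotel.getOrDefault(hotelId, Collections.<String, Integer>emptyMap())
                .getOrDefault(day, 0);
    }

    int bookingsForCity(String city, String day) {
        return byCityAndDay.getOrDefault(city + "#" + day, 0);
    }
}
```

Reads become single lookups, while the application takes over the job of keeping the duplicated data in step.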
  • 15. Some Fundamentals
      CAP Theorem
      At most two of the three properties can be satisfied in a shared/distributed system:
      - Consistency
      - Availability
      - Tolerance to network partitions
  • 17. Explanation
      Use case: Scaling Web Apps
      Critical facts:
      - Network outages are common
      - Customer shopping carts, email search, social network queries can tolerate stale data
      How: Compromise on consistency in order to remain available, rather than disrupt user service during outages.
  • 18. Explanation
      - Rather than requiring consistency after every transaction, it is enough for the database to eventually be in a consistent state.
      - Brewer’s CAP theorem says you have no choice if you want to scale up.
  • 19. Explanation contd..
      Sharp contrast: High Speed Financial Application
      - Highly transactional
      - Consistent
      - Automated
      - Can’t live with eventual consistency
  • 20. ACID vs BASE
      ACID
      - Atomic: Everything in a transaction succeeds or the entire transaction is rolled back.
      - Consistent: A transaction cannot leave the database in an inconsistent state.
      - Isolated: Transactions cannot interfere with each other.
      - Durable: Completed transactions persist, even when servers restart etc.
  • 21. Some Fundamentals contd..
      BASE
      - Basic Availability
      - Soft state
      - Eventual consistency
  • 22. Consistent Hashing
      A common way to load balance: the machine chosen to cache object o is hash(o) mod n, where n is the total number of machines.
  • 23. Consistent Hashing contd..
      - Adding a machine to the cache means hash(o) mod (n + 1)
      - Removing a machine from the cache means hash(o) mod (n - 1)
      - Result of either: disaster
      - Machines are swamped with redistribution
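To make the "disaster" concrete, here is a small sketch that counts how many keys land on a different machine when a plain hash(o) mod n scheme grows from n to n + 1 machines. The class name `ModuloRehash`, the integer keys, and the bit-mixing `hash` function are assumptions made for illustration; any reasonably uniform hash shows the same effect.

```java
import java.util.stream.IntStream;

// Hypothetical demo of naive modulo placement; not taken from the slides.
final class ModuloRehash {
    // Stand-in for a real hash function: spreads the bits of an int key.
    static int hash(int key) {
        int h = key * 0x9E3779B9;
        return h ^ (h >>> 16);
    }

    // Machine chosen for a key: hash(o) mod n (floorMod keeps the result non-negative).
    static int machineFor(int key, int n) {
        return Math.floorMod(hash(key), n);
    }

    // Number of keys in [0, keyCount) whose machine changes when n goes from 'before' to 'after'.
    static long movedKeys(int keyCount, int before, int after) {
        return IntStream.range(0, keyCount)
                .filter(k -> machineFor(k, before) != machineFor(k, after))
                .count();
    }

    public static void main(String[] args) {
        long moved = movedKeys(100_000, 10, 11);
        System.out.printf("%d of 100000 keys changed machine%n", moved);
    }
}
```

With a uniform hash, a key keeps its machine only when its hash gives the same remainder for both 10 and 11, so the large majority of cached objects have to be redistributed after adding a single machine.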
  • 24. Consistent Hashing contd..
      Commonly, a hash function (e.g. MD5) maps a value to a 128-bit key in the range 0 to 2^128 - 1 (or even a 32-bit key, as on the next slide).
  • 26. Consistent Hashing contd.. Both Key and Machine hashed with the same function
  • 27. Consistent Hashing contd.. Adding a Node
  • 28. Consistent Hashing contd.. Removing a Node
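The ring described on the slides above can be sketched as follows. This is a minimal illustration under stated assumptions, not the deck's implementation: the class name `HashRing` and its methods are hypothetical, the 128-bit MD5 digest is truncated to its first 8 bytes so it fits a `long`, and there are no virtual nodes or replication. Keys and machines are hashed with the same function, and a key belongs to the first machine at or after its position on the ring, wrapping around at the end.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical minimal consistent-hash ring (no virtual nodes, no replication).
final class HashRing {
    // Ring positions in sorted order, each mapped to the machine placed there.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Same hash for keys and machines: first 8 bytes of the MD5 digest as a long.
    static long hash(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    void addNode(String node) {
        ring.put(hash(node), node);
    }

    void removeNode(String node) {
        ring.remove(hash(node));
    }

    // First machine at or after the key's position; wrap to the start if none.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

Adding a machine only takes over keys from the arc just before its position, and removing it hands exactly those keys to its successor; every other key stays where it was, unlike the mod n scheme.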
  • 29. Use Case and NoSQL Solution
      Problem: Need to store bookings per day for all hotels. Queries are centered around city and region.
      Hotel count: 1 million
      Date range: now to the next 365 * 2 days
  • 30. NoSQL: Path Ahead
      - ACID equivalence (Neo4j, CouchDB etc.)
      - Transaction support
      - Atomicity
      - MVCC
  • 31. NoSQL: Path Ahead contd..
      Possible solution:
      - Use a SQL database for creation, updates etc.
      - Archive the data in NoSQL for query, analysis etc.
  • 32. NoSQL: Path Ahead contd..
      Enterprise Adoption and Challenges
      - NoSQL looks good largely for unstructured data
      - SQL is the best choice for a broad range of traditional workloads
  • 33. NoSQL: Path Ahead contd..
  • 34. NoSQL: Path Ahead contd..
      Shout out loud: Hybrid ACID + BASE
      They are not alternatives but supplements
  • 35. NoSQL: Path Ahead contd..
      - Maturity
      - Support
      - Skillset and administration/operation
      - Analytics and BI support
  • 36. NoSQL: Path Ahead contd..
  • 37. Q&A
  • 38. References
      - Seth Gilbert and Nancy Lynch, "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services", ACM SIGACT News, Volume 33, Issue 2 (2002), pp. 51-59.
      - "Brewer's CAP Theorem", julianbrowne.com, retrieved 02-Mar-2010.
      - "Brewers CAP theorem on distributed systems", royans.net.
      - "CAP Twelve Years Later: How the 'Rules' Have Changed"; on-line resource.
      - E. Brewer, "Towards Robust Distributed Systems," Proc. 19th Ann. ACM Symp. Principles of Distributed Computing (PODC 00), ACM, 2000, pp. 7-10; on-line resource.
      - D. Abadi, "Problems with CAP, and Yahoo’s Little Known NoSQL System," DBMS Musings, blog, 23 Apr. 2010; on-line resource.
      - C. Hale, "You Can’t Sacrifice Partition Tolerance," 7 Oct. 2010; on-line resource.
      - Facebook: Scaling Out; on-line resource.
      - Gemstone: The Hardest Problems In Data Management; on-line resource.
      - The Log-Structured Merge-Tree (Research Paper).
      - CodeProject: Consistent Hashing; on-line resource.
      - HighlyScalable: NoSQL Data Modeling Techniques; on-line resource.
      - eBay Tech Blog: Cassandra Data Modeling Best Practices; on-line resource.
      - John D Cook: ACID vs BASE; on-line resource.
      - Merkle Trees.
      - The Phi Accrual Failure Detector (Research Paper).
  • 39.
  • 40. Backup Slides: Better than the Original 1
  • 41. Document Based DataStore
      {
        _id : ObjectId("4e77bb3b8a3e000000004f7a"),
        when : Date("2011-09-19T02:10:11.3Z"),
        author : "alex",
        title : "No Free Lunch",
        text : "This is the text of the post. It could be very long.",
        tags : [ "business", "ramblings" ],
        votes : 5,
        voters : [ "jane", "joe", "spencer", "phyllis", "li" ],
        comments : [
          { who : "jane", when : Date("2011-09-19T04:00:10.112Z"), comment : "I agree." },
          { who : "meghan", when : Date("2011-09-20T14:36:06.958Z"), comment : "You must be joking. etc etc ..." }
        ]
      }
  • 43. User and Items : Option 1
  • 44. User and Items : Option 2
  • 45. User and Items : Option 3
  • 46. User and Items : Option 4
  • 49. Use Case 1
      Ecommerce Site
      Problem: Record user preferences, e.g. location, IP, currency selected, source of traffic, and multiple other dynamic values.
      Solution: In a column family (CF) based structure, keep it simple:
      UserId_Key: Pref1_Name:Value1, Pref2_Name:Value2, …, PrefN_Name:ValueN
  • 50. Use Case 1
      RowKey: 1350136093705_6501082438199894
      => (column=1350136093764, value=-3242432#911167901131523, timestamp=1350136093766000)
      => (column=1350283322499, value=GOI#200701231712126570, timestamp=1350283322502001)
      => (column=1350283566051, value=GOI#200703221605283033, timestamp=1350283566054001)
      => (column=1350749595676, value=GOI#200805261514037199, timestamp=1350749595677001)
      => (column=1350785230322, value=BOM#200701251747233158, timestamp=1350785230324001)
      -------------------
      RowKey: 1354499614310_10861558002828044
      => (column=1354499614368, value=TRV#201104071059204768, timestamp=1354499614370000, ttl=1728000)
      -------------------
      RowKey: 1349760150553_6114662943774777
      => (column=1349760152066, value=BLR#200802111324575807, timestamp=1349760152068001)
      -------------------
      RowKey: 1349805109805_6167423558533191
      => (column=1349805111833, value=TRV#312254274337517, timestamp=1349805111835001)
      -------------------
      RowKey: 1354435656227_7908056941568359
      => (column=1354435656367, value=IDR#200701211254519381, timestamp=1354435656369000, ttl=1728000)
      -------------------
      RowKey: 1347648097261_15570089270962881
      => (column=1347648097304, value=DEL#201101192008115545, timestamp=1347648097307000)
  • 51. Use Case 1: Get
      private Map<String, String> getPrerences(Keyspace keySpace, String userId, String... prefernceNames)
              throws IOException, CharacterCodingException {
          // Slice query with String row key, column name and column value
          SliceQuery<String, String, String> rsq = HFactory.createSliceQuery(keySpace,
                  StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
          rsq.setColumnFamily(USER_PREFERENCE);
          rsq.setKey(userId);
          rsq.setColumnNames(prefernceNames);
          QueryResult<ColumnSlice<String, String>> orows = rsq.execute();
          // Copy the returned columns into a name -> value map, preserving order
          Map<String, String> preferenceMap = new LinkedHashMap<String, String>();
          for (HColumn<String, String> column : orows.get().getColumns()) {
              preferenceMap.put(column.getName(), column.getValue());
          }
          return preferenceMap;
      }
  • 52. Use Case 1: Save
      Mutator<String> m = HFactory.createMutator(keySpace, StringSerializer.get());
      HColumn<String, String> userPrefrences = HFactory.createColumn(colkey, colvalue,
              StringSerializer.get(), StringSerializer.get());
      // Let the preference expire after the configured time-to-live
      userPrefrences.setTtl(ttlUserPrefrences);
      m.addInsertion(rowkey, USER_PREFERENCE, userPrefrences);
      m.execute();
  • 53. Use Case 2
      Online Travel Site
      Problem: Need to know different metrics for a city's hotels, e.g.:
      - Hotels booked in the last X time
      - Hotels last viewed in Y time
      - Hotels left with Z inventory
  • 54. Use Case 2
      RowKey: 2d323436353731
      => (super_column=911167901297486,
           (column=6c6173747669657765646d657373616765, value=VIEWED#Last viewed 23 hour(s) ago., timestamp=1354962852610000)
           (column=6c6173747669657765646d657373616762, value=Inventory#20, timestamp=1354962852610000)
           (column=6c6173747669657765646d657373616769, value=Bookings#8, timestamp=135496282610000))
      -------------------
      RowKey: 58524f
      => (super_column=200903041759196196,
           (column=6c617374626f6f6b65646d657373616765, value=Booked#Last booked 1 day(s) ago., timestamp=1347781187842000)
           (column=6c6173747669657765646d657373616765, value=VIEWED#Last viewed 2 hours ago., timestamp=1347707080147000))
      => (super_column=200903041848352230,
           (column=6c6173747669657765646d657373616765, value=VIEWED#Last viewed 1 day(s) ago., timestamp=1347266107708000))
  • 55. Use Case 2
      SuperSliceQuery<String, String, String, String> superQuery = HFactory.createSuperSliceQuery(getKeySpace(),
              StringSerializer.get(), StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
      superQuery.setColumnFamily(SUPER_SOCIAL_MESSAGE).setKey(cityCode);
      QueryResult<SuperSlice<String, String, String>> result = superQuery.execute();
      List<HSuperColumn<String, String, String>> superColumns = result.get().getSuperColumns();
      if (superColumns != null) {
          for (HSuperColumn<String, String, String> superColumn : superColumns) {
              // Collect the columns of this super column as name -> message
              Map<String, String> messages = new HashMap<String, String>();
              List<HColumn<String, String>> columns = superColumn.getColumns();
              if (columns != null) {
                  for (HColumn<String, String> column : columns) {
                      messages.put(column.getName(), column.getValue());
                  }
              }
              /* The equivalent doc */
              document.addField(superColumn.getName(), messages);
              documents.add(document);
          }
      }
  • 56. Pig Script : MR
      <document>
        <pigscript start="-16" end="-43200" start1="-1441" end1="-10080" start2="0" end2="-15" start3="0" end3="-1440">
          <comment>Delete All Messages</comment>
          <query><![CDATA[rows0 = LOAD 'cassandra://LH/HotelMessage' USING com.mmt.solr.hotels.cassandra.CassandraStorage() as (key:chararray, cols:bag{T:tuple(name:chararray, value:chararray) } );]]></query>
          <query><![CDATA[cols0 = FOREACH rows0 GENERATE key as key,flatten($1) as (name:chararray, value:chararray);]]></query>
          <query><![CDATA[cols0 = FOREACH rows0 GENERATE key as key,flatten($1) as (name:chararray, value:chararray);]]></query>
          <query><![CDATA[userhotel0 = FOREACH cols0 GENERATE key as key,com.mmt.solr.hotels.cassandra.ByteBufferToString($1) as name,com.mmt.solr.hotels.cassandra.ByteBufferToString($2) as value;]]></query>
          <query><![CDATA[uriCounts0 = FOREACH userhotel0 GENERATE key as citycode,com.mmt.solr.hotels.cassandra.ToBag(TOTUPLE(name,null));]]></query>
          <comment>Last Viewed start 15 minutes to 30 days ago</comment>
          <query><![CDATA[rows = LOAD 'cassandra://LH/LastViewedHotels?slice_start=#start&slice_end=#end&limit=1024&reversed=true' USING com.mmt.solr.hotels.cassandra.CassandraStorage() as (key:chararray, cols:bag{T:tuple(name:long, value:chararray) } );]]></query>
          <query><![CDATA[cols = FOREACH rows GENERATE key as key,flatten($1) as (name:long, value:chararray);]]></query>
          <query><![CDATA[userhotel = FOREACH cols GENERATE key as key,com.mmt.solr.hotels.cassandra.LongToHours($1) as name,com.mmt.solr.hotels.cassandra.ByteBufferToString($2) as value;]]></query>
          <query><![CDATA[userhotelByCity = FOREACH userhotel GENERATE key as key,flatten($1) as name,flatten(org.apache.pig.piggybank.evaluation.string.Split(value,'#',2)) as (citycode:chararray,hotelid:chararray);]]></query>
          <query><![CDATA[groupByhotels = GROUP userhotelByCity BY hotelid;]]></query>
          <query><![CDATA[uriCounts = FOREACH groupByhotels { D = LIMIT userhotelByCity 1; GENERATE flatten(D.citycode) as citycode,com.mmt.solr.hotels.cassandra.ToBag( TOTUPLE(group,com.mmt.solr.hotels.cassandra.StringAppend('VIEWED#Last viewed ',D.name,' ago.'))); };]]></query>
          <comment>Last Booked 1 to 8 days ago</comment>
          <query><![CDATA[rows1 = LOAD 'cassandra://LH/BookedHotels?slice_start=#startA&slice_end=#endA&limit=1024&reversed=true' USING com.mmt.solr.hotels.cassandra.CassandraStorage() as (key:chararray, cols:bag{T:tuple(name:long, value:chararray) } );]]></query>
          <query><![CDATA[cols1 = FOREACH rows1 GENERATE key as key,flatten($1) as (name:long, value:chararray);]]></query>
          <query><![CDATA[userhotel1 = FOREACH cols1 GENERATE key as key,com.mmt.solr.hotels.cassandra.LongToHours($1) as name,com.mmt.solr.hotels.cassandra.ByteBufferToString($2) as value;]]></query>
          <query><![CDATA[userhotelByCity1 = FOREACH userhotel1 GENERATE key as key,flatten($1) as name,flatten(org.apache.pig.piggybank.evaluation.string.Split(value,'#',2)) as (citycode:chararray,hotelid:chararray);]]></query>
          <query><![CDATA[groupByhotels1 = GROUP userhotelByCity1 BY hotelid;]]></query>
          <query><![CDATA[uriCounts1 = FOREACH groupByhotels1 { D = LIMIT userhotelByCity1 1; GENERATE flatten(D.citycode) as citycode,com.mmt.solr.hotels.cassandra.ToBag( TOTUPLE(group,com.mmt.solr.hotels.cassandra.StringAppend('Booked#Last booked ',D.name,' ago.'))); };]]></query>
  • 57. Criteria to Evaluate NoSQL Solutions
      - Internal partitioning
      - Automated, flexible data distribution
      - Hot-swappable nodes
      - Replication style
      - Automated failover strategy