FQL Cache Invalidation
Content Integration Engineering
Data Technology
Alfred Rotondaro
Table of Contents
1 Overview and Purpose
  1.1 Audience
2 Problem Description
3 Timestamp Records
  3.1 Timestamp Record Job
    3.1.1 JAMS Processing
    3.1.2 Database Updates
      3.1.2.1 FDB Databases
      3.1.2.2 Non-FDB Databases
4 Namespace Translations
  4.1 Database-to-Namespace Mapping
  4.2 Formula-to-Namespace Mapping
5 FQL Engine
  5.1 Timestamp Processing
    5.1.1 Timestamp Implementations
      5.1.1.1 Timestamp included in Key
      5.1.1.2 Timestamp included in Data Value
  5.2 Client Priming
    5.2.1 Client Priming Caveats
  5.3 Broadcasting Database Updates
Appendix I – FQL Cache Invalidation Architecture
Appendix II – Timestamp Key and Record Format
1 Overview and Purpose
This document describes how timestamp records are used to determine when data records in Memcached
are invalid, i.e., no longer up to date. It also explains how this mechanism can be used to create
caches that are dynamically populated by clients.
1.1 Audience
This document is written for software engineers who work with applications that use FQL to fetch data.
2 Problem Description
The purpose of caching FQL formulas is to improve application performance by reducing calculation and
I/O time. However, cached data must also remain timely: a cached formula result must not be served
after the underlying data has been updated.
3 Timestamp Records
Timestamp records provide a mechanism for monitoring the freshness of data in Memcached. The
essential fields of a timestamp record are symbol and time; Appendix II provides a complete listing of
the fields included in a timestamp record.
A timestamp record lists all identifiers together with the most recent time each identifier was updated
in any of the databases in a given namespace. Figure 1 shows a sample timestamp record.
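Because a record holds one time per symbol for a whole namespace, updates arriving from several databases of the same namespace must be reduced to the latest time per symbol. The following is a minimal in-memory sketch of that rule, assuming fixed capacity limits (MAX_SYMBOLS, MAX_SYMBOL_LEN) and hypothetical names (NamespaceRecord, note_update); it is not the job's actual code.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

#define MAX_SYMBOLS    64   /* assumed capacity, not from the spec */
#define MAX_SYMBOL_LEN 16   /* assumed, including the terminating NUL */

/* One symbol-time pair, as in the Value column of Figure 1. */
typedef struct {
    char   symbol[MAX_SYMBOL_LEN];
    time_t time;
} SymbolTime;

/* Hypothetical in-memory timestamp record for one namespace. */
typedef struct {
    size_t     count;
    SymbolTime entries[MAX_SYMBOLS];
} NamespaceRecord;

/* Returns the index of symbol in rec, or -1 if it is not present. */
static int find_symbol(const NamespaceRecord *rec, const char *symbol)
{
    for (size_t i = 0; i < rec->count; i++)
        if (strcmp(rec->entries[i].symbol, symbol) == 0)
            return (int)i;
    return -1;
}

/* Notes that symbol changed at time t in one of the namespace's
   databases. A symbol's time only moves forward, so the record ends up
   holding the latest update across all databases. Returns 0 on success,
   -1 if the record is full or the symbol is too long. */
static int note_update(NamespaceRecord *rec, const char *symbol, time_t t)
{
    int i = find_symbol(rec, symbol);
    if (i >= 0) {
        if (t > rec->entries[i].time)
            rec->entries[i].time = t;
        return 0;
    }
    if (rec->count == MAX_SYMBOLS || strlen(symbol) >= MAX_SYMBOL_LEN)
        return -1;
    strcpy(rec->entries[rec->count].symbol, symbol);
    rec->entries[rec->count].time = t;
    rec->count++;
    return 0;
}
```

For example, if IBM is updated in FF_ANNUAL and later, with an older change time, in FF_MONTH, the FF record still keeps the newer of the two times.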
Key                 Value (symbol, time)
$$TIMESTAMP_FF      FDS    10:00
                    MSFT   10:00
                    IBM    10:00
                    GOOG   10:10
                    MMM    10:05
$$TIMESTAMP_MSCI    MMM    10:15
                    T      10:00
Figure 1: Example of a Timestamp Record
3.1 Timestamp Record Job
A special job, the Timestamp Record Job, inserts timestamp records into Memcached. Figure 2 shows the
Timestamp Record Job processing database updates. The job can be scheduled to run through JAMS or
launched automatically by database updates.
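[Diagram: JAMS schedules the Timestamp Record Job; FDB databases trigger it through roll forward and
non-FDB databases through symbol-time input. The job replaces the old timestamp record in Memcached
with a new one, e.g. $$TIMESTAMP_FF: FDS_10:00|IBM_10:00|MMM_10:05.]
Figure 2: Database Update Processing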
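Figure 2 shows a record value as a pipe-delimited list of SYMBOL_TIME pairs. The sketch below renders symbol-time pairs in that illustrative text form; the function name is hypothetical, times are formatted in UTC as an assumption, and the stored record itself uses the binary layout described in Appendix II.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

typedef struct {
    const char *symbol;
    time_t      time;
} SymbolTime;

/* Hypothetical helper: writes entries as "SYM_HH:MM|SYM_HH:MM|..." (UTC)
   into buf. Returns the string length, or -1 if buf is too small. */
static int format_record_value(const SymbolTime *entries, size_t n,
                               char *buf, size_t bufsize)
{
    size_t len = 0;
    if (bufsize == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        char hhmm[6];                       /* "HH:MM" plus NUL */
        struct tm *utc = gmtime(&entries[i].time);
        if (utc == NULL || strftime(hhmm, sizeof hhmm, "%H:%M", utc) == 0)
            return -1;
        int w = snprintf(buf + len, bufsize - len, "%s%s_%s",
                         i > 0 ? "|" : "", entries[i].symbol, hhmm);
        if (w < 0 || (size_t)w >= bufsize - len)
            return -1;                      /* output was truncated */
        len += (size_t)w;
    }
    return (int)len;
}
```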
3.1.1 JAMS Processing
JAMS (System: FRMLA_CACHE) runs the Timestamp Record Job every 15 minutes to ensure that the
timestamp records exist. When the job starts, it retrieves the timestamp records from Memcached; any
record that does not exist is regenerated from a persistent copy or from scratch. The job also checks
the database file IDs to ensure that the files have not been changed, i.e., copy-renamed.
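The document does not define how a database file ID is obtained. Assuming, hypothetically, that the (device, inode) pair reported by POSIX stat() serves as the ID, a copy-rename can be detected because renaming a freshly copied file over the original leaves the path pointing at a different inode. The names FileId, get_file_id, and file_was_replaced are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical "database file ID": the (device, inode) pair. */
typedef struct {
    dev_t dev;
    ino_t ino;
} FileId;

/* Fills *id for path; returns 0 on success, -1 if stat() fails. */
static int get_file_id(const char *path, FileId *id)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    id->dev = st.st_dev;
    id->ino = st.st_ino;
    return 0;
}

/* Returns 1 if path now names a different file than the one recorded in
   saved (e.g. after a copy-rename), 0 if it is the same file, -1 on error. */
static int file_was_replaced(const char *path, const FileId *saved)
{
    FileId now;
    if (get_file_id(path, &now) != 0)
        return -1;
    return now.dev != saved->dev || now.ino != saved->ino;
}
```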
3.1.2 Database Updates
The Timestamp Record Job can also be launched by updates from FDB and non-FDB databases. FDB
databases require a roll-forward trigger mechanism, while for non-FDB databases the database engineers
must call a script when they perform copy-rename updates.
3.1.2.1 FDB Databases
When the Timestamp Record Job is launched by an FDB database update, the job queries the database for
symbol-update information using the existing hash tables. Each symbol is normalized to a SEDOL.
3.1.2.2 Non-FDB Databases
For non-FDB databases, the database engineer must provide an input file specifying the symbol-time
information.
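The format of this input file is not specified here. Assuming, hypothetically, one "SYMBOL HH:MM" pair per line (mirroring Figure 1) with blank lines ignored, a loader might look like the sketch below; the struct and function names are illustrative, and times are kept as minutes since midnight for simplicity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SYMBOL_MAX 15   /* assumed limit; must match the %15s width below */

typedef struct {
    char symbol[SYMBOL_MAX + 1];
    int  minutes;       /* minutes since midnight */
} SymbolTime;

/* Parses one "SYMBOL HH:MM" line. Returns 0 on success, -1 if the line
   is malformed, has trailing text, or the time is out of range. */
static int parse_symbol_time(const char *line, SymbolTime *out)
{
    int hh, mm;
    char extra;
    if (sscanf(line, "%15s %d:%d %c", out->symbol, &hh, &mm, &extra) != 3)
        return -1;
    if (hh < 0 || hh > 23 || mm < 0 || mm > 59)
        return -1;
    out->minutes = hh * 60 + mm;
    return 0;
}

/* Reads up to max entries from in, skipping blank lines. Returns the
   number read, or -1 at the first malformed line. */
static int load_symbol_times(FILE *in, SymbolTime *out, size_t max)
{
    char line[128];
    size_t n = 0;
    while (n < max && fgets(line, sizeof line, in) != NULL) {
        if (line[strspn(line, " \t\r\n")] == '\0')
            continue;   /* blank line */
        if (parse_symbol_time(line, &out[n]) != 0)
            return -1;
        n++;
    }
    return (int)n;
}
```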
4 Namespace Translations
The formula cache invalidation solution is predicated on two types of mappings: database-to-namespace
and formula-to-namespace.
4.1 Database-to-Namespace Mapping
While a database update is being processed, a database-to-namespace mapping is used to determine the
namespace to which the database belongs. Based on that information, the proper timestamp key is
generated.
The database-to-namespace mapping is stored in a configuration file. After the mapping is loaded, the
namespace is appended to the timestamp key, which is then used to create the timestamp record.
Figure 3 shows the Timestamp Record Job using the database-to-namespace mapping to update a
timestamp record.
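A minimal sketch of this lookup, using the sample mapping from Figure 3 as a hard-coded table and the key prefix from Appendix II. The configuration-file format is not described in this document, so the table, the struct, and the function names are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TIMESTAMP_KEY_PREFIX "$$FQL_CACHING_TIMESTAMP"

typedef struct {
    const char *database;
    const char *ns;
} DbNamespace;

/* Sample entries from Figure 3; the real mapping is read from a
   configuration file whose format is not described here. */
static const DbNamespace db_to_ns[] = {
    {"FF_ANNUAL", "FF"},
    {"FF_FIELD", "FF"},
    {"FF_MONTH", "FF"},
    {"MSCI_ACE", "MSCI"},
    {"MSCI_CHINA_A_CON", "MSCI"},
};

/* Returns the namespace of database, or NULL if it is not mapped. */
static const char *namespace_for_database(const char *database)
{
    for (size_t i = 0; i < sizeof db_to_ns / sizeof db_to_ns[0]; i++)
        if (strcmp(db_to_ns[i].database, database) == 0)
            return db_to_ns[i].ns;
    return NULL;
}

/* Writes "<prefix>_<namespace>" for database into buf.
   Returns 0 on success, -1 if unmapped or buf is too small. */
static int timestamp_key(const char *database, char *buf, size_t bufsize)
{
    const char *ns = namespace_for_database(database);
    if (ns == NULL)
        return -1;
    int w = snprintf(buf, bufsize, "%s_%s", TIMESTAMP_KEY_PREFIX, ns);
    return (w < 0 || (size_t)w >= bufsize) ? -1 : 0;
}
```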
Namespace   Database
FF          FF_ANNUAL
            FF_FIELD
            FF_MONTH
MSCI        MSCI_ACE
            MSCI_CHINA_A_CON
[Diagram: the Timestamp Record Job applies the database-to-namespace mapping above to update the
timestamp records shown in Figure 1.]
Figure 3: Timestamp Record Processing
4.2 Formula-to-Namespace Mapping
While client requests are processed, the formula-to-namespace mapping and the timestamp records are
used. Figure 4 shows the components and steps involved in processing a client request.
The formula-to-namespace mapping is stored in the file fql_cacheable_formula.txt. Based on the
mapping, the appropriate namespace is appended to the timestamp key, which is then used to retrieve a
timestamp record containing symbols and their corresponding most recent update times. This time is
then appended to the cache key that is used to fetch data from Memcached. In case of a cache miss, the
same cache key is used to insert data into Memcached.
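Figure 4's example can be reproduced with two small key builders. The sketch assumes the cache key is the formula text, the symbol, and the update time joined with underscores, exactly as in the figure's FF_Sales( )_IBM_10:00; the table stands in for fql_cacheable_formula.txt, whose format is not described, and all names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *formula;
    const char *ns;
} FormulaNamespace;

/* Stand-in for the contents of fql_cacheable_formula.txt. */
static const FormulaNamespace formula_to_ns[] = {
    {"FF_Sales", "FF"},
};

/* Returns the namespace of formula, or NULL if it is not cacheable. */
static const char *namespace_for_formula(const char *formula)
{
    for (size_t i = 0; i < sizeof formula_to_ns / sizeof formula_to_ns[0]; i++)
        if (strcmp(formula_to_ns[i].formula, formula) == 0)
            return formula_to_ns[i].ns;
    return NULL;
}

/* Builds the timestamp key used to fetch the formula's timestamp record. */
static int timestamp_key_for_formula(const char *formula, char *buf, size_t n)
{
    const char *ns = namespace_for_formula(formula);
    if (ns == NULL)
        return -1;
    int w = snprintf(buf, n, "$$FQL_CACHING_TIMESTAMP_%s", ns);
    return (w < 0 || (size_t)w >= n) ? -1 : 0;
}

/* Builds "<formula call>_<symbol>_<HH:MM>" from the symbol's latest update
   time, given in minutes since midnight. */
static int cache_key(const char *formula_call, const char *symbol,
                     int minutes, char *buf, size_t n)
{
    int w = snprintf(buf, n, "%s_%s_%02d:%02d",
                     formula_call, symbol, minutes / 60, minutes % 60);
    return (w < 0 || (size_t)w >= n) ? -1 : 0;
}
```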
Components: Client, FQL Engine, Memcached, DB.
1) Formula-to-namespace mapping: symbol IBM, formula FF_Sales, namespace FF.
2) Timestamp retrieved from Memcached: key $$FQL_CACHING_TIMESTAMP_FF, value IBM 10:00.
3) Timestamp appended to the cache key to fetch data from Memcached: FF_Sales( )_IBM_10:00.
4) In case of a cache miss, data is inserted into Memcached using the cache key.
Figure 4: Client Request Processing
5 FQL Engine
The FQL Engine retrieves and interprets caching timestamps and then primes Memcached with updated
data. Appendix I shows the FQL Engine in relation to the overall design of the formula cache
invalidation architecture.
5.1 Timestamp Processing
At the start of a download or report session, the FQL Engine uses a single fetch to retrieve the
timestamp records for a namespace from Memcached. These records, which are stored in the process cache
for one minute, are copied to FQL interpreter objects, where they remain in effect until the end of the
download. The times in these records are then used to ensure that the client receives the most recent
data.
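The one-minute process cache can be sketched as a per-namespace slot that is refetched from Memcached only when it is missing or at least 60 seconds old. The slot type and function names are hypothetical; a download that has already copied the record into its interpreter objects keeps using that copy regardless of refetches.

```c
#include <assert.h>
#include <time.h>

#define PROCESS_CACHE_TTL_SECONDS 60   /* one minute, per section 5.1 */

/* Hypothetical process-cache slot for one namespace's timestamp record. */
typedef struct {
    int    loaded;       /* 0 until the first fetch from Memcached */
    time_t fetched_at;   /* time of the last successful fetch */
} CachedRecordSlot;

/* Returns 1 if the record must be fetched again from Memcached. */
static int needs_refetch(const CachedRecordSlot *slot, time_t now)
{
    return !slot->loaded ||
           difftime(now, slot->fetched_at) >= PROCESS_CACHE_TTL_SECONDS;
}

/* Records a successful fetch made at time now. */
static void mark_fetched(CachedRecordSlot *slot, time_t now)
{
    slot->loaded = 1;
    slot->fetched_at = now;
}
```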
5.1.1 Timestamp Implementations
There are two options for tracking timestamps: the first is to make the timestamp part of the key; the
second is to embed the timestamp in the data value.
5.1.1.1 Timestamp included in Key
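This is the option being implemented. Its advantage is that it results in more true hits: because an
update changes the key, a lookup can never return data cached before that update. Its disadvantage is
that it creates more keys and thus requires more storage, since entries under superseded keys remain in
the cache until they are evicted.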
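The trade-off can be seen with a toy in-memory key-value store standing in for Memcached (hypothetical, not the Memcached API): once IBM's time moves from 10:00 to 10:20, the new key misses instead of returning stale data, while the old entry still occupies a slot. The stored values are illustrative placeholders.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 8

/* Toy stand-in for Memcached: a fixed array of key/value pairs. */
typedef struct {
    char key[64];
    char value[32];
} Entry;

typedef struct {
    Entry  entries[CACHE_SLOTS];
    size_t used;
} ToyCache;

/* Returns the value stored under key, or NULL on a miss. */
static const char *toy_get(const ToyCache *c, const char *key)
{
    for (size_t i = 0; i < c->used; i++)
        if (strcmp(c->entries[i].key, key) == 0)
            return c->entries[i].value;
    return NULL;
}

/* Stores value under key, overwriting an existing entry.
   Returns 0 on success, -1 if the cache is full. */
static int toy_set(ToyCache *c, const char *key, const char *value)
{
    for (size_t i = 0; i < c->used; i++)
        if (strcmp(c->entries[i].key, key) == 0) {
            snprintf(c->entries[i].value, sizeof c->entries[i].value, "%s", value);
            return 0;
        }
    if (c->used == CACHE_SLOTS)
        return -1;
    Entry *e = &c->entries[c->used++];
    snprintf(e->key, sizeof e->key, "%s", key);
    snprintf(e->value, sizeof e->value, "%s", value);
    return 0;
}
```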
5.1.1.2 Timestamp included in Data Value
Although this option is not currently being implemented, its advantage is that it requires less
storage, as it simply overwrites existing keys. Its disadvantages are that it results in false hits (the
key is found but its value is stale) and that it requires more post-processing to extract the embedded
timestamp from the value.
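For comparison, a sketch of the post-processing this alternative would need, assuming a hypothetical value layout of "<HH:MM>|<data>": the key is found, but the embedded timestamp must still be checked against the timestamp record before the data can be trusted. The function name and layout are illustrative only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical value layout for this option: "<HH:MM>|<data>".
   Returns a pointer to the data part if the stored timestamp equals
   expected_hhmm (a true hit), or NULL for a false hit or bad value. */
static const char *check_value(const char *value, const char *expected_hhmm)
{
    const char *sep = strchr(value, '|');
    if (sep == NULL)
        return NULL;                         /* malformed value */
    size_t ts_len = (size_t)(sep - value);
    if (ts_len != strlen(expected_hhmm) ||
        strncmp(value, expected_hhmm, ts_len) != 0)
        return NULL;                         /* key found, data stale */
    return sep + 1;
}
```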
5.2 Client Priming
Clients are allowed to insert data into Memcached on cache misses. However, only the latest data may be
inserted. It is therefore necessary to determine whether the data returned from the databases reflect
the latest available data, since a client might hold a handle to a stale database.
5.2.1 Client Priming Caveats
The following caveats apply to client priming:
• Stale handles to a database file will result in the insertion of stale data.
• Client access is handled on a case-by-case basis.
• Relative dates are supported in date-manipulating formulas by appending the Calendar and Zero date
to the cache key.
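The reason for the last caveat is that the same relative-date formula resolves to different periods under different calendars or zero dates, so those values must be part of the key. A minimal sketch, assuming a hypothetical "_<calendar>_<zero date>" suffix; the real suffix format, the calendar name, and the date values in the test are not taken from the document.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Appends "_<calendar>_<zero date>" to base_key (hypothetical layout) so
   that one relative-date formula evaluated under different calendars or
   zero dates maps to different cache entries.
   Returns 0 on success, -1 if buf is too small. */
static int dated_cache_key(const char *base_key, const char *calendar,
                           const char *zero_date, char *buf, size_t bufsize)
{
    int w = snprintf(buf, bufsize, "%s_%s_%s", base_key, calendar, zero_date);
    return (w < 0 || (size_t)w >= bufsize) ? -1 : 0;
}
```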
5.3 Broadcasting Database Updates
At the database level, fdb_database uses DLM to signal to readers of the database whether an update is
available. This information is propagated to the FQL Engine, which uses it to determine whether to
insert the formula result into Memcached. When data is inserted into the cache, an “is_updated” flag is
propagated up from the database. If the “is_updated” flag is false, the data is inserted into the cache
using the timestamp previously retrieved from Memcached.
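The insertion decision can be sketched as follows. The false case follows the rule above; treating a true “is_updated” flag as "do not insert" is an assumption made here (the document does not specify that case), on the reasoning that the result may come from a stale handle. The function name and return convention are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Called on a cache miss once the formula result is available.
   prev_hhmm is the timestamp fetched from Memcached at the start of the
   session. Returns 1 and fills key if the caller should insert, 0 if it
   should not insert, -1 if the key does not fit in keysize bytes. */
static int insertion_key(bool is_updated, const char *formula_call,
                         const char *symbol, const char *prev_hhmm,
                         char *key, size_t keysize)
{
    if (is_updated)
        return 0;   /* assumption: an update is pending, so skip insertion */
    int w = snprintf(key, keysize, "%s_%s_%s", formula_call, symbol, prev_hhmm);
    return (w < 0 || (size_t)w >= keysize) ? -1 : 1;
}
```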
Appendix I – FQL Cache Invalidation Architecture
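[Diagram: JAMS, FDB roll-forward updates, and non-FDB symbol-time updates drive the Timestamp Record
Job, which replaces old timestamps with new ones in Memcached (e.g. $$TIMESTAMP_FF:
FDS_10:00|IBM_10:00|MMM_10:05). The FQL Engine reads these timestamps, serves client formula requests
from Memcached, and on a cache miss inserts data from the DB.]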
Appendix II – Timestamp Key and Record Format
The timestamp key consists of the prefix $$FQL_CACHING_TIMESTAMP with a specific namespace appended to
it, for example $$FQL_CACHING_TIMESTAMP_FF. The timestamp record is a structure with the following data
elements:
{
    int      version;
    char     fdsTableNumber[8];
    u_int    maxSymbolLength;
    u_int    numberOfSymbols;
    time_t   earliestTime;
    time_t   latestTime;
    struct symbolMap
    {
        char   symbol[maxSymbolLength];   /* length set per record */
        time_t time;
    };
    struct symbolMap aSymbolMap[numberOfSymbols];   /* one entry per symbol */
};
Note: The format of the timestamp record is subject to change pending performance results.
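Because symbol and aSymbolMap are sized at run time, the structure above cannot be declared literally in C. One way to work with it is as a flat byte buffer. The sketch below assumes a packed layout in host byte order with no padding and uses unsigned int for u_int; none of this is specified by the document, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Assumed packed layout (host byte order, no padding):
   int version | char fdsTableNumber[8] | unsigned maxSymbolLength |
   unsigned numberOfSymbols | time_t earliestTime | time_t latestTime |
   numberOfSymbols x { char symbol[maxSymbolLength]; time_t time; } */
#define OFF_VERSION  0
#define OFF_TABLE    (OFF_VERSION + sizeof(int))
#define OFF_MAXLEN   (OFF_TABLE + 8)
#define OFF_COUNT    (OFF_MAXLEN + sizeof(unsigned))
#define OFF_EARLIEST (OFF_COUNT + sizeof(unsigned))
#define OFF_LATEST   (OFF_EARLIEST + sizeof(time_t))
#define HEADER_SIZE  (OFF_LATEST + sizeof(time_t))

static unsigned read_uint(const unsigned char *buf, size_t off)
{
    unsigned v;
    memcpy(&v, buf + off, sizeof v);   /* memcpy avoids misaligned reads */
    return v;
}

/* Total bytes for a record with n symbols of max_len bytes each. */
static size_t record_size(unsigned max_len, unsigned n)
{
    return HEADER_SIZE + (size_t)n * (max_len + sizeof(time_t));
}

static void write_header(unsigned char *buf, int version, const char table[8],
                         unsigned max_len, unsigned n,
                         time_t earliest, time_t latest)
{
    memcpy(buf + OFF_VERSION, &version, sizeof version);
    memcpy(buf + OFF_TABLE, table, 8);
    memcpy(buf + OFF_MAXLEN, &max_len, sizeof max_len);
    memcpy(buf + OFF_COUNT, &n, sizeof n);
    memcpy(buf + OFF_EARLIEST, &earliest, sizeof earliest);
    memcpy(buf + OFF_LATEST, &latest, sizeof latest);
}

/* Stores symbol (zero-padded to maxSymbolLength) and t as entry i.
   Returns 0, or -1 if i is out of range or the symbol is too long. */
static int write_entry(unsigned char *buf, unsigned i, const char *symbol, time_t t)
{
    unsigned max_len = read_uint(buf, OFF_MAXLEN);
    size_t len = strlen(symbol);
    if (i >= read_uint(buf, OFF_COUNT) || len > max_len)
        return -1;
    size_t off = HEADER_SIZE + (size_t)i * (max_len + sizeof(time_t));
    memset(buf + off, 0, max_len);
    memcpy(buf + off, symbol, len);
    memcpy(buf + off + max_len, &t, sizeof t);
    return 0;
}

/* Copies entry i's symbol (NUL-terminated, truncated to fit out_size) and
   time. Returns 0, or -1 if i is out of range or out_size is 0. */
static int read_entry(const unsigned char *buf, unsigned i,
                      char *symbol, size_t out_size, time_t *time_out)
{
    unsigned max_len = read_uint(buf, OFF_MAXLEN);
    if (i >= read_uint(buf, OFF_COUNT) || out_size == 0)
        return -1;
    size_t off = HEADER_SIZE + (size_t)i * (max_len + sizeof(time_t));
    size_t len = 0;
    while (len < max_len && buf[off + len] != '\0')
        len++;
    if (len > out_size - 1)
        len = out_size - 1;
    memcpy(symbol, buf + off, len);
    symbol[len] = '\0';
    memcpy(time_out, buf + off + max_len, sizeof *time_out);
    return 0;
}
```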