DataCyte - The Future of Data Storage & Retrieval


DataCyte was created to provide a data storage and retrieval structure that enables applications to be developed in an organic manner, with application performance largely independent of the amount of data and of the relationships built between the data elements.




Usage Rights

© All Rights Reserved


    DataCyte - The Future of Data Storage & Retrieval: Presentation Transcript

    • Converting Data to Information (29/04/2011, DataCyte (Pty) Ltd)
    • DataCyte Group of Companies
      • Founded in 1998
      • Previously known as World Wide Objects
      • Privately owned and funded
      • Development done in Pretoria, South Africa
      • Expanding to create distribution and partner network
      • Building relationships with ISVs
    • DataCyte Timeline
      • 1998
        • Product conceptualized; first version developed by late 1999
      • 2000
        • Patent application lodged
      • 2001
        • Rated 5-10 years ahead of the IBM grid computing initiative by DARPA/CSC/Lockheed Martin
        • Awarded United States Department of Defense contract
      • 2002
        • Defense contract suspended due to war on terror
      • 2003
        • Returned to South Africa due to declaration of war against "terror"
        • Started delivering healthcare systems to the South African market
      • 2005
        • Returned to the US market with healthcare and hi-tech value proposition
      • 2006
        • Benchmarked data analysis capabilities with Zirmed, proving a 50% reduction in size and 10x faster performance
        • Entered into business relationship with Dr Patrick Soon-Shiong of Abraxis Biosciences and American Pharmaceutical Partners Inc.
      • 2008
        • A conflict of product direction emerged with Dr Patrick Soon-Shiong, resulting in termination of the relationship. All intellectual property rights reverted to DataCyte.
    • DataCyte Timeline cont.
      • 2008
        • Cedars Sinai Cancer and Proteomic Research Unit (UCLA) benchmark
        • DXS Health Care Systems Technology Partnership (www.dxs-systems.com)
        • Trash Can Kids Technology Partnership (www.trashcankidz.com)
        • Electronic Price Labeling Technology Partnership
        • Interactive Television/Phumelela Technology Partnership (www.phumelela.com)
      • 2009
        • Established strategic partnership with Health One Global (www.healthoneglobal.com)
        • IR Global Partnership to deliver international roaming at dramatically discounted rates, enabling prepaid customers to roam as well
        • Barlow World Logistics product development
        • Re-engaged with the United States Department of Defense through US presence
        • Granted US Patent #7571442
    • DataCyte Timeline cont.
      • 2010
        • April: Booz Allen Hamilton (www.boozallen.com) presents DataCyte as a future data solution at the American Association for the Advancement of Science. AAAS (www.aaas.org), founded in 1848, publishes the peer-reviewed general science journal with the largest paid circulation in the world and is considered one of the global authorities on the direction of science, engineering and innovation.
        • May: Launch of Interactive Television, with 400 units rolled out in TABS. Prime Media is currently finalizing the purchase of advertising slots for a 12-month period.
        • June: Launch of the DXS (dxssynergy.com) web-based system to the USA market as part of its global rollout. DXS is a global vendor of healthcare-related systems.
        • June: Launch of Trash Can Kids (www.trashcankidz.com)
        • June: Launch of Process Discovery product, with 2 customers going live this month. The system has already been adopted by a large defense manufacturer.
        • June: E-Discovery product launched with EM (the largest non-life actuarial consulting firm in the UK)
    • DataCyte Timeline cont.
      • 2010
        • June: Negotiations started with Bytes Technology and its Med-e-mass (www.medemass.com) subsidiary to underpin their current suite of management systems with a comprehensive EHR solution for the South African market.
        • July: Health One Global (www.healthoneglobal.com.au) launches its Personal Electronic Health Record and Medical Management record in Australia. The launch coincides with the launch of the Australian Government unique personal health identifier and has the support of the Australian Automobile Association and the Royal Academy of Physicians, as a first step to providing Australians with a health record management service. The Australian government has legislated that all citizens must have these records in place by 2013.
    • Why DataCyte?
    • Computing Challenge
      • The "Global Village" has "Global Data"
        • Boundaries removed
        • Information flow is more pervasive
      • Physical Storage
        • Users store more data than ever before
        • Little new development in Data Retrieval Systems
      • Processing
        • More processing required to retrieve similar data
        • Little development in Computing Processing Systems
      • Present Business Tendencies
        • Swing back to centralized systems
        • Swing back to thin client
    • DataCyte Patented Solution
      • Performance not dependent on number of records
      • No single point of vulnerability
        • No central registry
        • Information redundantly distributed
        • RAIS: Redundant Array of Inexpensive Servers
      • Dynamic, Intelligent Information
        • Contextual 'named' links between data entities
        • Dynamic data structure
        • Pervasive Associations
      • Self-managing, Distributed Information Structures
        • Any Entity must have 'independence of existence'
        • Entities 'self-aware' of environment
      • Web-enabled with open interface: Apache
    • Performance Features
      • Access by association
      • Fully distributed storage system
      • DataCyte data storage is 10% of the size of traditional systems
      • Sustainable data creation at 400 000 cytes per second on a standard desktop computer
      • Random data access speed of over 250 000 cytes per second on a standard desktop computer
      • Caches up to 25 000 000 cytes in 2 GB memory
      • Can access any cyte out of 250 000 000 000 cytes in under a millisecond
      • Runs on Linux and Windows
    • DataCyte Technology: Cyte
      • Parent Registry
      • Child Registry
      • BLOB content
        • Any data form
      • Code
        • Lua
        • Others possible
      • Flags
        • Security/Access control
        • Content type, etc
      • Native methods
        • Provided by service
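The cyte layout listed on this slide can be pictured as a small record type. The following is a minimal Python sketch and not DataCyte's actual implementation; the class name `Cyte`, its field names, and the `link` helper are hypothetical and only mirror the slide's bullet points (registries of named links, BLOB content, code, flags).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)
class Cyte:
    """Hypothetical in-memory picture of one cyte (all names are assumptions)."""
    content: bytes = b""                                  # BLOB content: any data form
    code: Optional[str] = None                            # embedded logic, e.g. Lua source text
    flags: Dict[str, str] = field(default_factory=dict)   # access control, content type, etc.
    parents: Dict[str, "Cyte"] = field(default_factory=dict)   # parent registry (named links)
    children: Dict[str, "Cyte"] = field(default_factory=dict)  # child registry (named links)

    def link(self, name: str, child: "Cyte", back_name: str) -> None:
        """Register a named child link and the matching named parent link."""
        self.children[name] = child
        child.parents[back_name] = self


# Example: a patient cyte with one named association.
patient = Cyte(content=b"Jane Doe", flags={"content_type": "text/plain"})
birth_date = Cyte(content=b"1980-02-29")
patient.link("date_of_birth", birth_date, back_name="patient")
```

Keeping both registries on every cyte is what would let navigation run in either direction without consulting a central registry, which is the property the slides emphasize.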
    • Access Models
      • Multiple Logical Models: Data and application layer
        • Network
        • Structured
        • Containment
    • Case Studies
    • Case Studies: DataCyte Pilot/Test Process
      • [Diagram: Current State Process with components Oracle (steady source, not part of test), DataStage ETL, BAH DW (750 GB), MS SSIS, PME Data Mart, MS SSAS, MDX Query, MS SSRS Reports]
      • Process Timing: 4.5 hrs daily; 6.5 hrs for closings (2 times a month)
      • [Diagram: DataCyte Test with components Oracle, BAH DW, DataCyte Extraction & Translation, DataCyte Fact Maps, Web Service, Crystal Reports]
    • Case Studies: BAH, Implementation of BI Reporting (Oracle vs DataCyte)
      • Database size: Oracle 750 GB; DataCyte 51 GB (6,8%)
      • Retrieval speeds
        • Indexed Random Access: Oracle 0,152 secs; DataCyte 0,008 secs (5,2%)
        • Indexed Step Thru: Oracle 0,963 secs; DataCyte 0,016 secs (1,66%)
        • Unindexed Step Thru: Oracle 2.515 secs; DataCyte 10.78 secs (425%)
      • Hardware Platform
        • Oracle: ±US$2 000 000; Sun™ Grid Rack 400, SAN, 1 x Staging areas, 1 x Cube storage
        • DataCyte: US$1 000; Low-end Desktop VM, 2,33 GHz processor, 200 GB HDD
      • Software
        • Oracle: ±US$3 000 000; ELT Toolkit, 2 x Oracle 11g, MS HyperCube
        • DataCyte: ±US$500 000; DataCyte
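The percentages on this slide express each DataCyte figure as a fraction of the Oracle figure. The short Python sketch below recomputes them from the numbers as printed; the dictionary keys and the helper name `percent_of_baseline` are made up for illustration. Recomputing gives roughly 5,26% and 429% where the slide prints 5,2% and 425%, so the printed values appear to be rounded or approximate.

```python
# Figures as printed on the slide: (Oracle, DataCyte).
measurements = {
    "database size (GB)": (750.0, 51.0),
    "indexed random access (s)": (0.152, 0.008),
    "indexed step thru (s)": (0.963, 0.016),
    "unindexed step thru (s)": (2.515, 10.78),
}


def percent_of_baseline(baseline: float, value: float) -> float:
    """Return `value` expressed as a percentage of `baseline`."""
    return 100.0 * value / baseline


ratios = {
    name: percent_of_baseline(oracle, datacyte)
    for name, (oracle, datacyte) in measurements.items()
}

for name, pct in ratios.items():
    print(f"{name}: DataCyte is {pct:.2f}% of Oracle")
```

Note that the unindexed step-through is the one measurement where the DataCyte figure is larger than the Oracle figure.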
    • Other Case Studies
      • Proteomic Research Unit
        • Database size: 1,3 TB in Oracle; 60 GB in DataCyte
        • Retrieval speeds: 1½ minutes vs < 1 sec in DataCyte; 1 - 2 days vs < 11-66 mins in DataCyte
        • Hardware platform: Sun™ Grid Rack of 400 Sun Fire™ x64 servers vs Toshiba laptop (1,86 GHz processor, 7 200 rpm drive)
      • UCS SAP database
        • 860 GB in DB2 database; 100 GB in DataCyte
        • Queries up to 1000 times faster
    • Applications Developed
      • Knowledge Management Systems
        o e-Learning Systems
        o Interactive TV Management Systems
        o Medical Information Systems
          • Health Management: "Single Patient Record"
          • Practice Management
          • Clinic Management System
          • Pathology Laboratory Management
          • Clinical Trials System
          • Hospital Management System
    • Applications Developed
      • Data Warehousing
        o ETL
        o "Data Cube"
        o Lawgistics
        o Fraud Detection
      • SME Payroll System
      • Process Management Server
        o Document Tracking Systems
        o Business Process Modeling
        o Supply Chain Management System
      • Computational Performance Systems
        o Biometrics
        o Proteomic and Genomic Analysis
        o Shortest Path Routing
    • DataCyte Benefits
      • 90% reduction in hardware requirements
      • 10 to 1000 times speed improvement
      • Ability to populate archive/warehouse in real-time
      • Ability to access archived data faster than existing online live system
      • Extension of life of live systems
      • Greater security due to ALL history being online
    • Contact Details
      • DataCyte (Pty) Ltd
        • 489 Clarence Street, Waterkloof Glen, Pretoria
        • Tel: +27 12 993 1256
        • Fax: +27 12 993 2412
      • Michael F Salomon, CEO
        • Cell: +27 82 552 5411
      • Peter Salemink, COO
        • Cell: +27 83 677 2783
      • Daniel Opland, Technology Evangelist
        • Cell: +27 83 312 5947
    • Customers
      • Booz Allen Hamilton Inc
      • South African Fraud Prevention Service
      • TrashCanKidz Limited
      • Broadband Interactive TV System
      • PayStaffOnline (Pty) Ltd
      • 360 Link-up Limited / EMC Limited
    • Back-up Slides
    • Technology Overview
      • Database Management System
        • Access
          • Object
          • SQL
          • Cyte
      • Etymology: "Cyte"
        • Ancient Greek word κύτος (kýtos)
          • Container or Receptacle
          • Human body → part of cell that keeps everything together
      • Developed in C++
        • Runs on Windows and Linux
      • ODBC, XML and Web Service access
      • Apache module: mod_dsa
        • HTTP(S), FTP, WSDL, SOAP, …
    • Technology Overview
      • Store: any form of data → 'Cytes'
        • Serialized and persisted on creation (more later)
        • Accessed by association in a contextual / stateful manner
        • Collectively form multiple intersecting hierarchies
        • Each Cyte has the potential to form part of a distributed cloud
        • Virtualize disparate data → single federated view
        • Contain application business logic
          • Lua (www.lua.org)
      • Lua
        • Powerful, fast, lightweight scripting language
        • Embeddable
        • Lua is widely used:
          • Industrial Applications (Adobe: Photoshop Lightroom)
          • Games (Blizzard: World of Warcraft)
          • Embedded Systems (Ginga, Digital TV in Brazil)
        • Lua Server Pages
          • Tag-based Web applications that dynamically generate Web pages
    • Technology Overview: Execution Layers
      • Application Layer
      • Data / Engine Layer
    • Technology Overview: Basic Performance (1.6 GHz Dual Core, 3 GB RAM, 7 200 rpm drive)
      • Sustained creation speed
        • 400 000 cytes per second
      • Sequential access speed
        • 400 000 cytes per second
      • Random access speed
        • 250 000 cytes per second
      • Cache
        • 25 000 000 cytes in 2 GB memory
      • Access
        • Any element from 250 billion elements in under a millisecond
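Two quantities follow directly from the figures on this slide: the average cache space available per cyte, and how long it would take to create 250 billion cytes at the quoted sustained rate on a single machine. The Python sketch below is plain arithmetic on the slide's numbers; reading "2Gb" as 2 GiB is an assumption, and the variable names are illustrative only.

```python
CACHE_BYTES = 2 * 1024**3          # "2Gb memory", read here as 2 GiB (an assumption)
CACHED_CYTES = 25_000_000          # cytes held in that cache
CREATE_RATE = 400_000              # sustained creation, cytes per second
TOTAL_CYTES = 250_000_000_000      # "250 billion elements"

# Average cache space per cyte implied by the two cache figures.
bytes_per_cached_cyte = CACHE_BYTES / CACHED_CYTES

# Time to create the full data set at the sustained rate on one machine.
load_seconds = TOTAL_CYTES / CREATE_RATE
load_days = load_seconds / 86_400

print(f"{bytes_per_cached_cyte:.1f} bytes per cached cyte")
print(f"{load_seconds:.0f} s, about {load_days:.1f} days, to create all cytes")
```

These are implied averages, not figures stated in the presentation, and they say nothing about how individual cytes vary in size.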
    • Data Structure: Cyte
      • Parent Registry
      • Child Registry
      • BLOB content
        • Any data form
      • Code
        • Lua
        • Others possible
      • Flags
        • Security/Access control
        • Content type, etc
      • Native methods
        • Provided by service
    • Data Structure
    • Data Structure
      • Demonstration: IDE
    • Data Structure
      • Logical representations: data & application layer
        • Network representation
        • Structured representation
        • Containment representation
      • [Diagram panels: Network, Structured, Containment]
    • Data Structure: Complexity vs Simplicity
      • Simpler → faster learning curve
      • Translation layer
        • RDBMS
          • Programmed
          • Maintained
            • Adding features, fixing bugs, improvements
          • Collectively comprise 80% of lifetime cost
        • DataCyte
          • No translation layer
          • Saving: Development (Time and Cost)
          • Integrated into database layer (a la EJB)
            • e.g. Cytes with application logic
    • Data Structure: Impedance of Mismatch (Translation Layer)
      • Maintenance and Development (RDBMS)
        • Different mapping → mismatch and integrity violation
        • Subtle Issues
          • Difficult to locate (time + money)
      • Lower impedance of mismatch in DataCyte
        • No translation layer → natural modelling of data
      • Architecture: Simple
        • Option: logically structure and constrain → RDBMS + ODBC
        • Multiple logical views of the same data
          • Facilitates conformance to multiple standards
    • Discovery
      • [Diagram labels: Logical Model, External, Conceptual Model, Conceptual, Physical Model, Physical]
      • Cytes
        → Logical representation of physical storage
        → Navigational construct: each navigation → physical disk read
        → Brokered by DataCyte service
    • Discovery
      • Each Cyte is addressed as: IP address + Disc + File + Position in file
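The four-part address on this slide can be modelled as an immutable value. This is a minimal Python sketch under stated assumptions: the class name `CyteAddress`, its field names, and the textual form `host/disc/file@position` are invented for illustration, since the slides give only the four components and not a concrete syntax.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv6Address, ip_address
from typing import Union


@dataclass(frozen=True)
class CyteAddress:
    """Hypothetical physical address: IP address + disc + file + position in file."""
    host: Union[IPv4Address, IPv6Address]
    disc: str
    file: str
    position: int  # byte offset within the file

    def __str__(self) -> str:
        return f"{self.host}/{self.disc}/{self.file}@{self.position}"

    @classmethod
    def parse(cls, text: str) -> "CyteAddress":
        """Parse the assumed 'host/disc/file@position' form."""
        location, _, position = text.rpartition("@")
        host, disc, file = location.split("/", 2)
        return cls(ip_address(host), disc, file, int(position))


addr = CyteAddress.parse("192.0.2.10/disc0/patients.cyt@4096")
print(addr.host, addr.disc, addr.file, addr.position)
```

Because every component is explicit, an address like this can be followed directly to a server, a file, and an offset, without asking a central registry where the cyte lives.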
    • Discovery
      • Cytes contain application logic
        • Variables → pointers
        • Access stored data
        • Execute application code
      • Variable =
    • Discovery
      • Contextual Execution
      • Demonstration: IDE …
    • Discovery
      • Cyte Discovery
        • Relative paths
          • in reference to executing Cyte
          • getAge() example
        • Absolute paths
          • in reference to root Cyte
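The difference between relative and absolute discovery can be sketched with a small tree of named nodes. The Python below is illustrative only: the `Node` class, the `/`-separated path syntax, the `..` step, and the `get_age` function (named after the slide's getAge() example) are all assumptions, and each node has a single parent here for simplicity, whereas the slides describe a parent registry.

```python
from typing import Dict, Optional


class Node:
    """Hypothetical stand-in for a cyte with one parent and named children."""

    def __init__(self, name: str, value=None, parent: Optional["Node"] = None):
        self.name = name
        self.value = value
        self.parent = parent
        self.children: Dict[str, "Node"] = {}
        if parent is not None:
            parent.children[name] = self


def resolve(path: str, executing: Node, root: Node) -> Node:
    """Absolute paths start at the root; relative paths start at the executing node."""
    node = root if path.startswith("/") else executing
    for step in path.strip("/").split("/"):
        if step in ("", "."):
            continue
        nxt = node.parent if step == ".." else node.children.get(step)
        if nxt is None:
            raise KeyError(f"cannot resolve step {step!r} in {path!r}")
        node = nxt
    return node


def get_age(executing: Node, root: Node, current_year: int) -> int:
    """The same relative path yields a different result per executing node."""
    return current_year - resolve("birth_year", executing, root).value


root = Node("root")
people = Node("people", parent=root)
alice = Node("alice", parent=people)
bob = Node("bob", parent=people)
Node("birth_year", 1980, parent=alice)
Node("birth_year", 1990, parent=bob)

print(get_age(alice, root, 2011), get_age(bob, root, 2011))
```

The point of the relative form is that one piece of logic can be attached to many cytes and pick up its data from wherever it happens to be executing.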
    • Query Approach
    • Query Approach
      • Types of Queries
        • Without indexes
          • each record is checked in turn
        • Indexed
          • filtered records
      • Query approach (same as RDBMS)
        • Know what you are looking for AND where you want to look for it
      • Query Steps
        • STEP 1: IDENTIFY RESULT SET
        • STEP 2: POPULATE RESULT SET
    • Query Approach
      • At time of query
        • RESULT SET IDENTIFICATION
          • Improved indexes (Compound)
        • RESULT SET POPULATION
          • Traverse logical layer (minimal reads)
          • Context = Stateful Results vs Additional External Lookups
      • Additional External Lookups vs Navigate through Logical Structure: O(n) → O(1)
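The O(n) → O(1) contrast for result set population can be illustrated by comparing a lookup that searches a second collection with one that follows a stored link. This Python sketch uses made-up customer and order data; the unindexed scan stands in for an "additional external lookup", and the direct object reference stands in for navigating the logical structure. It does not reproduce DataCyte's actual mechanism.

```python
customers = [{"id": i, "name": f"customer{i}"} for i in range(1000)]

# Style 1: each order stores a key that must be looked up elsewhere.
orders_by_key = [{"order": n, "customer_id": (n * 37) % 1000} for n in range(5)]

# Style 2: each order stores a direct link to the related record.
orders_linked = [
    {"order": o["order"], "customer": customers[o["customer_id"]]}
    for o in orders_by_key
]


def populate_by_lookup(orders, customers):
    """External lookup: scan the customer list for every order, O(n) per order."""
    result = []
    for order in orders:
        customer = next(c for c in customers if c["id"] == order["customer_id"])
        result.append((order["order"], customer["name"]))
    return result


def populate_by_navigation(orders):
    """Navigation: follow the stored link, O(1) per order."""
    return [(o["order"], o["customer"]["name"]) for o in orders]


print(populate_by_lookup(orders_by_key, customers))
print(populate_by_navigation(orders_linked))
```

Both functions return the same result set; only the cost of filling it in differs. A relational system would normally avoid the linear scan with an index, so the comparison is against the unindexed case.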
    • Query Approach
      • Define: Contextual / Stateful Results
    • Query Approach: Addressing schema
      • Defines context of access
      • Cyte → Unique ID within local file system
        • Offset within file
      • Cytes simply exist within file system
        • No global registry
      • Multiple Contexts = Multiple Addresses
      • Address = Context = Chain of IDs (named)
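"Multiple Contexts = Multiple Addresses" means that one cyte can be reached through more than one chain of named links, and each chain is a distinct address. The Python sketch below enumerates those chains over a tiny hand-made graph; the `links` dictionary, the node names, and the `contexts` function are hypothetical, and the graph is assumed to be acyclic.

```python
from typing import Dict, List, Tuple

# Named links: parent -> {link name -> child}. All identifiers are made up.
links: Dict[str, Dict[str, str]] = {
    "root": {"patients": "patients", "doctors": "doctors"},
    "patients": {"jane": "jane"},
    "doctors": {"smith": "smith"},
    "smith": {"patient_0001": "jane"},
    "jane": {},
}


def contexts(target: str, start: str = "root") -> List[Tuple[str, ...]]:
    """Return every chain of link names leading from `start` to `target`."""
    found: List[Tuple[str, ...]] = []

    def walk(node: str, chain: Tuple[str, ...]) -> None:
        if node == target:
            found.append(chain)
        for name, child in links.get(node, {}).items():
            walk(child, chain + (name,))

    walk(start, ())
    return found


for chain in contexts("jane"):
    print("/".join(chain))
```

Here the same record is addressable both as a patient in her own right and as one of a doctor's patients, which is the "same data, different context" idea that recurs in later slides.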
    • Query Approach
      • Query Language
        • Show of hands: SQL users
        • Similar to XPath
          • Parent or child Cytes (multiple criteria)
        • SQL Interface
          • Cytes that conform to relational model
      • Lower complexity of architecture
        • More natural language
        • Steeper learning curve (learn more faster)
    • Performance & Scalability
      • Sub-linear Performance Degradation
        • Logical Layer → Directed Searches
        • Example: Geo-spatial modelling
      • Instantiation
        • Full control over level
        • No class hierarchy
      • Multiple Logical Structures
        • Same data, different context
        • Multi-dimensional searches → single dimension
    • Performance & Scalability
      • OLTP
        • Architecture marries Structured + Networking paradigms
        • Container Topology
          • Allows extensible heterogeneous data structuring
        • 3-state versioning protocol
          • Balance: performance and integrity
      • Data Footprint
        • Encoding and Compressing on storage
      • No Intermediate link tables
        • [Diagram: Products, Intermediate Table, Ingredients]
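The "no intermediate link tables" point can be made concrete with the products and ingredients example from the slide's diagram. In the Python sketch below, the first half models a many-to-many relationship the relational way, with a link table, and the second half models it with direct associations held by the records themselves. The data, the `Item` class, and the function names are invented for illustration.

```python
# Relational style: a many-to-many relationship needs an intermediate table.
products = {1: "bread", 2: "cake"}
ingredients = {10: "flour", 11: "sugar"}
product_ingredient = [(1, 10), (2, 10), (2, 11)]  # the link table


def ingredients_relational(product_id: int) -> list:
    """Join through the link table to find a product's ingredients."""
    return sorted(ingredients[i] for p, i in product_ingredient if p == product_id)


# Association style: each item keeps its own child and parent links.
class Item:
    def __init__(self, name: str):
        self.name = name
        self.children = []  # e.g. the ingredients of a product
        self.parents = []   # e.g. the products that use an ingredient

    def add(self, child: "Item") -> None:
        self.children.append(child)
        child.parents.append(self)


flour, sugar = Item("flour"), Item("sugar")
bread, cake = Item("bread"), Item("cake")
bread.add(flour)
cake.add(flour)
cake.add(sugar)


def ingredients_linked(product: Item) -> list:
    """Read the ingredients straight off the product's own links."""
    return sorted(child.name for child in product.children)


print(ingredients_relational(2), ingredients_linked(cake))
print(sorted(p.name for p in flour.parents))
```

Both approaches answer the same questions; the second simply stores the relationship on the records at either end instead of in a third structure.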
    • Performance & Scalability
      • Proof of Concept: 2008
      • Cancer research hospital (Los Angeles)
      • Considerable funding
      • A proteomic analysis problem: blood analysis study
        • Data mining to search for cancer markers
        • 50 data samples
        • 250 billion data elements
        • 1.3 TB in Oracle
      • Results are from the same data set and same queries (Cancer Center vs DataCyte)
        • Single criteria queries: Cancer Center 1½ minutes; DataCyte < 1 sec
        • Complex queries: Cancer Center ± 1 - 2 days; DataCyte < 11-66 mins
        • Hardware: Cancer Center Sun™ Grid Rack 400 Sun Fire™ x64 servers; DataCyte laptop
        • Data footprint: Cancer Center 1.3 TB; DataCyte 60 GB
    • Data Storage
      o Serialized
      o Encoded
      o Compressed
      o Pages
      o Caching Stack
      o Data Distance
      o [Diagram labels: Open, Encode / Decode, Encoded, Compress / Decompress, Encoded & Compressed]
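The serialize, encode, compress sequence on this slide, and its reverse on read, can be sketched as a pair of functions. The Python below uses JSON, UTF-8, and zlib purely as stand-ins; the slides do not say which serialization, encoding, or compression schemes DataCyte uses, and the function names `store` and `load` are hypothetical.

```python
import json
import zlib


def store(record: dict) -> bytes:
    """Serialize, encode, then compress a record for storage."""
    serialized = json.dumps(record, sort_keys=True)  # serialize to text
    encoded = serialized.encode("utf-8")             # encode to bytes
    return zlib.compress(encoded, 9)                 # compress


def load(blob: bytes) -> dict:
    """Reverse the pipeline: decompress, decode, then deserialize."""
    decoded = zlib.decompress(blob).decode("utf-8")
    return json.loads(decoded)


record = {"sample": "ACGT" * 250, "markers": [1, 2, 3]}
blob = store(record)
raw_size = len(json.dumps(record, sort_keys=True).encode("utf-8"))

print(f"{raw_size} bytes serialized, {len(blob)} bytes stored")
```

How much space is saved depends entirely on the data; the highly repetitive sample here compresses far better than typical records would.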
    • Data Storage
      o Leaf nodes
      o Stack management
      o Partial Decoding
      o Data Management
    • Performance
      • Four Spheres of Influence during Design
    • Data Storage
      • Enterprise Cloud Storage
      • Soft RAIS using commodity hardware
      • RAIS provides soft parallelized, grid computing
      • Soft RAIS enables redundant distribution of cytes
      • Granular scalability and full sharability of resources
      • Elastic auto provision of service and resources
      • Unified access to data through multiple data models
      • New programming approaches, unconfined by old designs and existing programming languages, to tackle the new data flood
      • Green
        • Footprint
        • Power usage: running, cooling and start-up
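Redundant distribution across inexpensive servers requires some rule for deciding which servers hold the copies of each cyte. The slides do not describe DataCyte's rule, so the Python sketch below swaps in a well-known general technique, rendezvous (highest-random-weight) hashing; the function name `replicas`, the server names, and the cyte identifier are all invented for illustration.

```python
import hashlib
from typing import List


def replicas(cyte_id: str, servers: List[str], copies: int = 2) -> List[str]:
    """Pick `copies` distinct servers for a cyte by rendezvous hashing.

    Each server gets a pseudo-random weight derived from the (server, cyte)
    pair; the servers with the highest weights hold the copies.
    """
    def weight(server: str) -> str:
        return hashlib.sha256(f"{server}:{cyte_id}".encode("utf-8")).hexdigest()

    return sorted(servers, key=weight, reverse=True)[:copies]


servers = ["server-a", "server-b", "server-c", "server-d"]
placed = replicas("cyte-42", servers)
print("copies on:", placed)

# If the first holder is lost, the surviving copy's server is still chosen
# first, so the data remains reachable without any central registry.
survivors = [s for s in servers if s != placed[0]]
print("after failure:", replicas("cyte-42", survivors))
```

The appeal of a scheme like this is that any node can compute the placement on its own, which fits the "no central registry" and "no single point of vulnerability" goals stated earlier in the deck.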
    • Data Storage
      • External Data Sources
        • Lua add-on libraries: LuaCOM and ADOLua
      • Access: Data Services
        • MSSQL NCLI (Native Client Interface)
        • DB2 OLEDB (Object Linking and Embedding, Database)
        • Oracle OLEDB (Object Linking and Embedding, Database)
    • Security
      • Security implemented by the service on the cyte level
      • Domain-based, inclusion, exclusion
      • Cyte-to-cyte communication is encrypted
      • Redundant distribution of cytes provides additional security
      • Contextual access provides further flexibility for security
        • Child and parent presentation
      • Hardware encryption of storage is preferable
      • Cyte granularity enables
        • Blind security information retaining associations
        • Cleansed health data with relationships
      • DataCyte can integrate with existing authentication / authorization systems (LDAP, Active Directory)
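A domain-based rule with inclusion and exclusion lists attached to each cyte might look like the Python sketch below. The semantics are an assumption, since the slide gives only the three keywords: here a domain must appear in the inclusion list, and exclusion overrides inclusion. The class name `AccessRule` and the example domains are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class AccessRule:
    """Hypothetical per-cyte rule: allowed domains minus explicitly denied ones."""
    include: Set[str] = field(default_factory=set)
    exclude: Set[str] = field(default_factory=set)

    def allows(self, domain: str) -> bool:
        # Assumed semantics: exclusion always wins over inclusion.
        if domain in self.exclude:
            return False
        return domain in self.include


rule = AccessRule(
    include={"clinic.example", "lab.example"},
    exclude={"lab.example"},
)

for domain in ("clinic.example", "lab.example", "other.example"):
    print(domain, rule.allows(domain))
```

Attaching the rule to the individual cyte, rather than to a table or database, is what would make cyte-level security and de-identified views of linked data possible.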
    • Disaster Recovery
      • Transaction-based with roll-back
      • All transactions are Atomic, Consistent, Isolated and Durable (ACID)
      • 3-state versioning protocol
        • My old
        • My new
        • Yours
      • Provides fine-grained control
      • Balance between performance and integrity mitigation
      • Each service can partially recover from physical loss
      • Redundancy could provide complete recovery
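The three states named on this slide ("my old", "my new", "yours") resemble the values compared in optimistic concurrency control: the version a writer originally read, the version it wants to write, and the version currently stored. The Python sketch below illustrates that reading, which is an interpretation and not something the slides spell out; the `Store` class, `Conflict` exception, and `commit` signature are hypothetical.

```python
class Conflict(Exception):
    """Raised when the stored version changed since the writer read it."""


class Store:
    """Hypothetical key-value store with a compare-before-write commit."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        return self._data.get(key)

    def commit(self, key, my_old, my_new):
        yours = self._data.get(key)  # the version currently stored
        if yours != my_old:
            # Someone else committed first: reject, leaving the store unchanged.
            raise Conflict(f"{key!r}: expected {my_old!r}, found {yours!r}")
        self._data[key] = my_new


store = Store()
store.commit("balance", None, 100)   # first write: nothing stored yet

seen = store.read("balance")         # two writers both read 100
store.commit("balance", seen, 150)   # the first writer succeeds

try:
    store.commit("balance", seen, 300)  # the second writer is now stale
    stale_rejected = False
except Conflict:
    stale_rejected = True

print(store.read("balance"), stale_rejected)
```

Rejecting the stale write instead of locking up front is one way to trade a little retry work for better throughput, which matches the slide's stated balance between performance and integrity.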