Overview of the Doradus database open source project and the Cassandra database on which it is based. This presentation was given to the Orange County Big Data Meetup group on July 16, 2014.
Overview of Cassandra and Doradus
1. Overview of Cassandra and the Doradus OSS Project
Randy Guck
Principal Engineer, Dell Software
2. Overview
• What is NoSQL?
– Common RDB roadblocks
– NoSQL database types
• Overview of Cassandra
– What's unique
– Limitations
• Doradus
– Architecture
– Features
– The OLAP and Spider storage managers
– What each is good for
– Where to get Doradus
3. Why RDB Apps Look for Something Else
• Performance
– B-trees
– Locking
– One writable copy of each record
• Scaling costs
– RDBs scale "up"
– Big boxes, SANs, fiber channel, etc.
• What if you want...
– Distributed access
– No single points of failure
– Instant failover
– Sharding
– Replication
4. NoSQL Data Models
Data Model | Examples | Elastic? | Queries? | Relationships?
Key–Value | LevelDB, Kyoto Cabinet, Redis | No | No | No
Distributed Key–Value | Dynamo, MemcacheDB, Riak, Voldemort | Yes | No | No
Column-Oriented | Accumulo, Cassandra, HBase | Yes | Some | No
Document-Oriented | Couchbase, Elasticsearch, MongoDB | Yes | Yes | Some
Graph | Neo4J, OrientDB, Titan | No | Yes | Yes
Column notes:
– Elastic?: sharding + replication
– Queries?: AND/OR/ranges/etc.
– Relationships?: built-in support
5. NoSQL Data Models (cont.)
Same table as slide 4, with one added callout: Doradus goals
6. NoSQL Common Traits
• Distributed cluster of nodes
– Commodity, shared-nothing servers
– Scales horizontally
– Expands elastically
• Replication
– Performant local access
– Automatic failover
• De-normalized data model
• Schemaless/dynamic columns
• Eventual consistency
[Diagram: cluster with N=5 nodes and replication factor RF=3]
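The N=5, RF=3 label can be made concrete with a small sketch. It assumes the simplest placement rule: a key's replicas are the node that owns its hash position plus the next RF-1 nodes around the ring. The node names, the MD5 hash, and the modulo step are illustrative simplifications, not Cassandra's actual partitioner or token assignment.

```python
import hashlib

def replicas_for(key, nodes, rf):
    """Return the rf nodes holding a key: the ring owner plus the next rf-1 nodes."""
    if rf > len(nodes):
        raise ValueError("replication factor cannot exceed node count")
    # Illustrative hash only; real Cassandra partitioners assign token ranges.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    owner = int.from_bytes(digest, "big") % len(nodes)
    return [nodes[(owner + i) % len(nodes)] for i in range(rf)]

NODES = ["node1", "node2", "node3", "node4", "node5"]  # N=5 (hypothetical names)
RF = 3

for key in ("62c36", "837a2", "2de83"):
    print(key, "->", replicas_for(key, NODES, RF))
```

Under this rule every key lives on three distinct nodes, so any two of the five nodes can be down and each key still has one reachable replica.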
8. Overview of Cassandra
• Wide column NoSQL database
• Open sourced by Facebook
• Apache Project with active community
• Commercially supported by DataStax, Acunu, others
• Used by 1,500+ companies
• "Pure peer" architecture
• Largest known Cassandra cluster: 300+ TB data and 400+ machines
9. What is Cassandra best for?
• Continuous data streams
– Logs, events, audit records, measurements, ...
– Fast data ingestion
– Predictable read performance
• Partitionable data
– "1,000's of little databases in one"
• Elastic scalability
– Expand/upgrade/repair without downtime
• Not good for:
– Blob store
– Persistent queue
– OLTP transactions
10. CQL Static Table
    CREATE TABLE songs (
        id uuid PRIMARY KEY,
        title text,
        album text,
        artist text,
        data blob
    );
    CREATE INDEX ON songs (artist);
Row Key | Columns: "<column name>"="<column value>"
62c36... | "album"="90125", "artist"="Yes", "data"=<audio>, "title"="Changes"
837a2... | "album"="Crystal Ball", "artist"="Styx", "data"=<audio>, "title"="Put Me On"
2de83... | "album"="Nevermind", "artist"="Nirvana", "data"=<audio>, "title"="Breed"
...
11. CQL Clustered Table
    CREATE TABLE playlists (
        id uuid,
        song_order int,
        song_id uuid,    // copied from songs.id
        title text,      // copied from songs.title
        album text,      // copied from songs.album
        artist text,     // copied from songs.artist
        PRIMARY KEY (id, song_order)    // compound key
    );
Row Key | Columns: "<song_order>:<column name>"="<column value>"
28d23... | "1:"="", "1:album"="90125", "1:artist"="Yes", "1:song_id"="62c36...", "1:title"="Changes"
         | "2:"="", "2:album"="Nevermind", "2:artist"="Nirvana", "2:song_id"="2de83...", "2:title"="Breed"
         | "3:"="", ...
2ed91... | "1:"="", "1:album"="Crystal Ball", "1:artist"="Styx", "1:song_id"="837a2...", "1:title"="Put Me On"
         | "2:"="", ...
...
12. CQL Clustered Table (cont.)
CQL "Rows": same playlists table and storage layout as slide 11. Within each row key, the columns that share a <song_order> prefix ("1:", "1:album", "1:artist", ...) together form one CQL "row".
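The mapping from CQL rows to stored columns can be sketched in a few lines. This is a toy model of the layout shown on the slide, not Cassandra's storage code: the function name and the use of plain dictionaries are assumptions made for illustration, and the sample values are the ones from the slide.

```python
def to_storage_rows(cql_rows, partition_key, clustering_key):
    """Fold CQL rows that share a partition key into one wide row per key.

    Each stored column is named "<clustering value>:<column name>", plus an
    empty "<clustering value>:" marker column for every CQL row.
    """
    storage = {}
    for row in cql_rows:
        wide_row = storage.setdefault(row[partition_key], {})
        prefix = f"{row[clustering_key]}:"
        wide_row[prefix] = ""  # marker column for this CQL row
        for name, value in row.items():
            if name not in (partition_key, clustering_key):
                wide_row[prefix + name] = value
    return storage

playlist_rows = [
    {"id": "28d23...", "song_order": 1, "song_id": "62c36...",
     "title": "Changes", "album": "90125", "artist": "Yes"},
    {"id": "28d23...", "song_order": 2, "song_id": "2de83...",
     "title": "Breed", "album": "Nevermind", "artist": "Nirvana"},
    {"id": "2ed91...", "song_order": 1, "song_id": "837a2...",
     "title": "Put Me On", "album": "Crystal Ball", "artist": "Styx"},
]

layout = to_storage_rows(playlist_rows, "id", "song_order")
for row_key, columns in layout.items():
    print(row_key)
    for column_name in sorted(columns):
        print(f'  "{column_name}"="{columns[column_name]}"')
```

Three CQL rows become two stored rows here, because the first two share the partition key 28d23... and differ only in song_order.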
13. Can we make Cassandra more appealing?
• Data Model
– No direct support for relationships
• Indexing
– Secondary indexes: single column only
– Hash table only: no range searching
• Searching
– No joins or embedded queries
– No aggregate queries
– Limited equalities (e.g., SELECT * WHERE <key> IN (<list>))
– No full text search
– No OR clauses
– ...
14. What is Doradus?
• Java service that enhances Cassandra
• Adds features:
– REST API (JSON and XML)
– Multi-tenancy
– Graph model
– Multi-field/full text query language
– Automatic data aging
– OLAP and Spider storage services
• Compatible with NoSQL tenets such as idempotent updates
• Under development for ~3 years
• Open source: Apache 2.0 License
15. Doradus Graph Model
• A cluster hosts one or more applications
• An application owns tables which store objects
• An object consists of single- and multi-valued fields
• A pair of link fields form a bi-directional relationship
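[Diagram: example schema]
Tables and their scalar fields:
– Message {Size, SendDate}
– Participant {ReceiptDate}
– Address {Name}
– Person {Name, Department}
– Attachment {Size, Extension}
Link pairs (each pair is one bi-directional relationship):
– Manager → / ← Employees
– ↓ Person / Address ↑
– ↓ Attachments / Message ↑
– Recipients → / ← MessageAsRecipient
– Address → / ← Participants
– Sender → / ← MessageAsSender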
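The idea that a pair of link fields forms one bi-directional relationship can be sketched as follows. This is a toy in-memory model, not Doradus code: the LinkStore class and its methods are hypothetical, and placing both Manager and Employees on the Person table is an assumption read off the diagram labels.

```python
from collections import defaultdict

class LinkStore:
    """Toy store in which every link field has a declared inverse field."""

    def __init__(self):
        self.inverse = {}              # (table, field) -> (table, field)
        self.links = defaultdict(set)  # (table, object id, field) -> target ids

    def define_pair(self, table_a, field_a, table_b, field_b):
        """Declare two link fields as inverses of each other."""
        self.inverse[(table_a, field_a)] = (table_b, field_b)
        self.inverse[(table_b, field_b)] = (table_a, field_a)

    def add_link(self, table, obj_id, field, target_id):
        """Set a link and its inverse in the same operation."""
        inv_table, inv_field = self.inverse[(table, field)]
        self.links[(table, obj_id, field)].add(target_id)
        self.links[(inv_table, target_id, inv_field)].add(obj_id)

    def remove_link(self, table, obj_id, field, target_id):
        """Remove a link and its inverse so neither side dangles."""
        inv_table, inv_field = self.inverse[(table, field)]
        self.links[(table, obj_id, field)].discard(target_id)
        self.links[(inv_table, target_id, inv_field)].discard(obj_id)

    def get(self, table, obj_id, field):
        return set(self.links.get((table, obj_id, field), set()))

store = LinkStore()
store.define_pair("Person", "Manager", "Person", "Employees")
store.add_link("Person", "alice", "Manager", "bob")
print(store.get("Person", "bob", "Employees"))
```

Because both directions are written together, asking for bob's Employees finds alice even though only alice's Manager field was set explicitly.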
16. Example Object and Aggregate Queries
• Lucene full text query
    GET /Email/Person/_query?q=FirstName:j* AND NOT Office:[q TO z]
• Link path with filtering
    GET /Email/Message/_query?q=Sender.WHERE(ReceiptDate>'2010-06-01').Address.Name="*.com"
• Quantifiers
    GET /Email/Message/_aggregate?m=COUNT(*)
        &q=ANY(Recipients).ALL(Address).NONE(Person).Department:sales
        &f=Tags,TOP(3,TRUNCATE(SendDate,DAY))
• Transitive links
    GET /Email/Person/_query?q=DirectReports^(3).LastName=wilson
        &f=DirectReports(Name,DirectReports(Name))
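The queries above contain spaces, brackets, and other characters that need percent-encoding before they can be sent over HTTP. A minimal sketch of building such URLs with Python's standard library follows. The base URL is a placeholder and the helper name is hypothetical; only the path shape and the q, m, and f parameters come from the slide.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

BASE_URL = "http://localhost:1123"  # assumption: placeholder host and port

def doradus_url(application, table, command, **params):
    """Build /<application>/<table>/<command>?<params> with percent-encoding."""
    path = "/".join(quote(part, safe="") for part in (application, table, command))
    return f"{BASE_URL}/{path}?{urlencode(params, quote_via=quote)}"

full_text = doradus_url(
    "Email", "Person", "_query",
    q="FirstName:j* AND NOT Office:[q TO z]",
)
aggregate = doradus_url(
    "Email", "Message", "_aggregate",
    m="COUNT(*)",
    q="ANY(Recipients).ALL(Address).NONE(Person).Department:sales",
    f="Tags,TOP(3,TRUNCATE(SendDate,DAY))",
)

print(full_text)
print(aggregate)
# Decoding the query string recovers the original expressions.
print(parse_qs(urlsplit(aggregate).query))
```

Passing quote_via=quote encodes spaces as %20 rather than "+", which avoids any ambiguity about how a server decodes plus signs inside a query expression.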
18. Doradus: Multi-Data Center Clusters
[Diagram: six nodes (Node 1 to Node 6), each running Cassandra, with Doradus instances on some of them. Nodes 1 to 3 are in Rack 1, Data Center 1; nodes 4 to 6 are in Rack 1, Data Center 2. Applications connect in each data center. DC=2, N=6, RF=3]
19. Doradus: Internal Architecture
[Diagram components: applications (App) and a Monitor App; REST (embedded Jetty server) and JMX interfaces; Spider Storage Service and OLAP Storage Service; Cassandra interface; Cassandra cluster; configuration file doradus.yaml]
20. Doradus OLAP Service
• Borrows from online analytical processing
– Sharding as data "cubes"
– Columnar storage
• Very dense storage
– No indexes!
– Value arrays are compressed
• Fast load time
– Up to 500,000 objects/second/node
– Small "data lag" time
• Very fast queries
– Searches millions of objects/second
– Full DQL object and aggregate query support
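Columnar storage with compressed value arrays and index-free scans can be illustrated with a toy example. This is a sketch of the general technique only: the 8-byte integer encoding, the use of zlib, and the function names are assumptions, not Doradus OLAP's actual on-disk format.

```python
import struct
import zlib

def pack_column(values):
    """Pack one integer field of a shard into a compressed value array."""
    raw = struct.pack(f"<{len(values)}q", *values)  # 8-byte little-endian ints
    return zlib.compress(raw)

def unpack_column(blob):
    """Decompress a value array back into a list of integers."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 8}q", raw))

def count_where(blob, predicate):
    """Answer a COUNT query by scanning the whole array; no index is used."""
    return sum(1 for value in unpack_column(blob) if predicate(value))

sizes = [512, 1024, 512, 2048] * 25_000  # 100,000 synthetic "Size" values
blob = pack_column(sizes)

print(len(sizes) * 8, "raw bytes ->", len(blob), "compressed bytes")
print(count_where(blob, lambda value: value >= 1024))
```

Keeping each field in its own array means a query touches only the fields it names, and repetitive values compress well, which is the intuition behind dense storage without indexes.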
25. OLAP Use Case
• Data: Windows Events
– 115M events
• Test parameters
– Server: Quad Xeon CPUs, 32GB memory, 3 disks
– Cassandra memory: 1GB
– Load app/embedded Doradus memory: 4GB
– Load threads: 5
– Batch size: 5,000 events
– Shard size: 1 day (860 shards total)
• Test results
– Total objects loaded: ~1 billion
– Total time: 32 minutes, 56 seconds
– Load rate: 502,991 objects/second
– Final database size: ~2GB
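The reported figures can be cross-checked with simple arithmetic. The sketch below uses only numbers from the slide; treating "~2GB" as 2 GiB, and assuming every loaded object belongs to one of the 115M events, are assumptions made for the estimate.

```python
elapsed_seconds = 32 * 60 + 56    # 32 minutes, 56 seconds
load_rate = 502_991               # objects/second, from the slide
events = 115_000_000              # "115M events"
database_bytes = 2 * 1024**3      # "~2GB", taken as 2 GiB (assumption)

total_objects = load_rate * elapsed_seconds
objects_per_event = total_objects / events
bytes_per_object = database_bytes / total_objects

print(f"total objects:     {total_objects:,}")
print(f"objects per event: {objects_per_event:.1f}")
print(f"bytes per object:  {bytes_per_object:.2f}")
```

The product of rate and elapsed time is about 994 million, consistent with the "~1 billion" total. Under the stated assumptions that works out to between 8 and 9 objects per event and roughly 2 bytes of stored data per object.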
26. Doradus Spider Service
• Analogous to Lucene + NoSQL
• Fully inverted field indexing
– Configurable analyzers
– Stored-only (non-indexed) fields
• Unique features:
– Automatic table-level sharding
– Statistics
– Pre-computed aggregate queries
– Refreshed in background
– Object-level data aging
• Use case example:
– Indexing a massive number of documents
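Fully inverted field indexing with configurable analyzers can be sketched as a map from (field, term) to object IDs. This is a toy model under stated assumptions: the class, the regular-expression analyzer, and the convention that an analyzer of None marks a stored-only field are all hypothetical, not the Spider service's implementation.

```python
import re
from collections import defaultdict

def simple_analyzer(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

class InvertedIndex:
    def __init__(self, analyzers=None):
        # Per-field analyzer; None means stored-only (not indexed).
        self.analyzers = analyzers or {}
        self.postings = defaultdict(set)  # (field, term) -> object ids

    def add(self, obj_id, fields):
        for field, value in fields.items():
            analyzer = self.analyzers.get(field, simple_analyzer)
            if analyzer is None:
                continue  # stored-only field: skip indexing
            for term in analyzer(value):
                self.postings[(field, term)].add(obj_id)

    def search(self, field, term):
        return set(self.postings.get((field, term.lower()), set()))

index = InvertedIndex(analyzers={"Body": None})
index.add("m1", {"Subject": "Cassandra cluster upgrade", "Body": "see attached"})
index.add("m2", {"Subject": "Upgrade notes", "Body": "cassandra details"})

print(index.search("Subject", "upgrade"))
print(index.search("Body", "cassandra"))
```

Every term of every indexed field gets a posting entry, so a lookup is a single dictionary access, while the Body field here is kept out of the index entirely.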
27. OLAP and Spider: When to Use
• Spider is best for:
– Unstructured/variable-structure data
– Configurable indexing
– Fine-grained updates with immediate indexing
– Document storage and searching
– Emphasis on full-text/multi-field searching
• OLAP is best for:
– High-volume data streams
– High performance analytic queries
– Dense data storage
– Immutable/semi-mutable data
– Data that can be loaded in batches
– Data that can be partitioned (e.g., time-sharded)
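Time-sharding and batch loading can be sketched together: events are grouped by calendar day so that each batch targets one shard. The field name Timestamp, the date-based shard names, and both helper functions are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

def shard_name(timestamp):
    """Name the one-day shard an event belongs to, using its UTC date."""
    return timestamp.astimezone(timezone.utc).strftime("%Y-%m-%d")

def batch_by_shard(events):
    """Group events into per-shard batches that can be loaded together."""
    batches = defaultdict(list)
    for event in events:
        batches[shard_name(event["Timestamp"])].append(event)
    return dict(batches)

events = [
    {"EventID": 1, "Timestamp": datetime(2014, 7, 15, 23, 59, tzinfo=timezone.utc)},
    {"EventID": 2, "Timestamp": datetime(2014, 7, 16, 0, 1, tzinfo=timezone.utc)},
    {"EventID": 3, "Timestamp": datetime(2014, 7, 16, 12, 0, tzinfo=timezone.utc)},
]

for shard, batch in sorted(batch_by_shard(events).items()):
    print(shard, [event["EventID"] for event in batch])
```

With day-sized shards, a query restricted to a date range needs to read only the shards that cover that range.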
28. Summary
• What's cool about Doradus?
– Bi-directional links with referential integrity
– Link paths: simpler than joins
– Idempotent updates
– Partial object updates
– Simple transitive searching
– OLAP: dense storage and fast queries
– It's free!
29. Thank you!
Doradus is available at:
https://github.com/dell-oss/Doradus
Contact me:
randy.guck@dell.software.com