1. The document discusses various technologies for building big data architectures, including NoSQL databases, distributed file systems, and data partitioning techniques.
2. Key-value stores, document databases, and graph databases are introduced as alternatives to relational databases for large, unstructured data.
3. The document also covers approaches for scaling databases horizontally, such as sharding, replication, and partitioning data across multiple servers.
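As a minimal sketch of the horizontal-scaling idea in point 3, the snippet below routes each key to one of several in-memory "shards" using a stable hash. The names `shard_for` and `ShardedStore` are hypothetical and chosen only for illustration; a real deployment would place shards on separate servers and would typically add replication and a rebalancing scheme such as consistent hashing.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable (process-independent) hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store; each dict stands in for one server (assumption)."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str, default=None):
        return self.shards[shard_for(key, len(self.shards))].get(key, default)


store = ShardedStore(num_shards=4)
for user_id in range(100):
    store.put(f"user:{user_id}", {"id": user_id})
```

Note that plain modulo hashing moves most keys when the number of shards changes, which is one reason production systems often prefer consistent hashing or range-based partitioning.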
FOSSASIA 2015 - 10 Features your developers are missing when stuck with Propr... - Ashnikbiz
Sameer Kumar, Ashnik Database Solution Architect and open source evangelist, presented at FOSSASIA 2015 on the features of open source databases like PostgreSQL that developers stuck on proprietary databases miss out on.
10 Features you would love as an Open Source developer!
- New JSON Datatype
- Vast set of datatypes supported
- Rich support for Foreign Data Wrappers
- User Defined Operators
- User Defined Extensions
- Filter Based Indexes or Partial Indexes
- Granular control of parameters at User, Database, Connection or Transaction Level
- Use of indexes to get statistics
- JDBC API for COPY Command
- Full Text Search
WiredTiger is rethinking data management for modern hardware with a focus on multi-core scalability and maximizing the value of every byte of RAM.
CPUs are no longer getting faster, and the cost of additional CPUs is approaching zero. Disk transfer speeds are relatively slower, compared to memory speeds, than a decade ago. Finally, power is the single biggest cost of the data center. For these reasons, WiredTiger is focused on more efficient use of I/O bandwidth, multiple CPUs and large memory in a single server.
With the latest release of MongoDB 3.0, we have announced several new exciting features, including our new pluggable storage API and the WiredTiger storage engine, which provides compression, improved concurrency control, and more. Learn how you will be able to take advantage of these new features and how they will improve your database performance with this upcoming webinar.
This is from a two-hour talk introducing in-memory databases. It starts with a look at traditional RDBMS architecture and some of its limitations, then surveys some in-memory products, and finally takes a closer look at OrigoDB, the open source in-memory database toolkit for .NET/Mono.
Nuxeo JavaOne 2007 presentation (in original format) - Stefane Fermigier
This session describes the architecture and implementation of an embeddable, extensible enterprise content management core for Java EE and simpler platforms. The presentation starts by describing the general architectural concepts used as building blocks:
• A schema and document model, reusing XML schemas and making good use of XML namespaces, where document types are built with several facets
• A repository model, using hierarchy and versioning, with the Content Repository API for Java (JSR 170) being one of the possible back ends
• A query model, based on the Java Persistence query language (JSR 220) and reusing the path-based concepts from Java Content Repositories (JCR)
• A fine-grained security model, compatible with WebDAV concepts and designed to provide flexible security policies
• An event model using synchronous and asynchronous events, allowing bridging through Java Message Service (JMS) or other systems to other event-enabled frameworks
• A directory model, representing access to external data sources using the same concepts as for documents but taking advantage of the specificities of the data back ends
Suitable abstraction layers are put in place to provide the required level of flexibility. One of the main architectural tasks is to find commonalities in all the systems used (or whose use is planned in the future) so framework users need to learn and use a minimal number of concepts. The result is a set of concepts that are fundamental to enterprise document management and are usable through direct Java technology-based APIs, Java EE APIs, or SOA. The presentation shows, for each of the main components, which challenges have been met and overcome when building a framework in which all components are designed to be improved and replaced by different implementations without sacrificing backward compatibility with existing ones.
The described implementation, Nuxeo Core, can be embedded in a basic Java technology-based framework based on OSGi (such as Eclipse) or in one based on Java EE, according to the needs of the application using it. This means that the core has to function without relying on Java EE services but also has to take advantage of them when they are available (providing clustering, messaging, caching, remoting, and advanced deployment).
CBDW2014 - NoSQL Development With Couchbase and ColdFusion (CFML) - Ortus Solutions, Corp
NoSQL document stores are reinventing the way we design our databases and cache layers. Couchbase server is a unique offering with unparalleled performance, automatic replication and failover. In this session, we'll talk about how to get started with Couchbase using the open source CFML SDK as well as native caching via the Railo Couchbase Extension.
WiredTiger is rethinking data management for modern hardware with a focus on multi-core scalability and maximizing the value of every byte of RAM.
CPUs are no longer getting faster, and the cost of additional CPUs is approaching zero. Disk transfer speeds are relatively slower, compared to memory speeds, than a decade ago. Finally, power is the single biggest cost of the data center. For these reasons, WiredTiger is focused on more efficient use of I/O bandwidth, multiple CPUs and large memory in a single server.
MongoDB 3.0 introduces a pluggable storage architecture and a new storage engine called WiredTiger. The engineering team behind WiredTiger has a long and distinguished career, having architected and built Berkeley DB, now the world's most widely used embedded database.
In this webinar Michael Cahill, co-founder of WiredTiger, will describe our original design goals for WiredTiger, including considerations we made for heavily threaded hardware, large on-chip caches, and SSD storage. We'll also look at some of the latch-free and non-blocking algorithms we've implemented, as well as other techniques that improve scaling, overall throughput and latency. Finally, we'll take a look at some of the features we hope to incorporate into WiredTiger and MongoDB in the future.
For years, the common industry perception has been that MySQL is faster and easier to use than PostgreSQL. PostgreSQL is perceived as more powerful, more focused on data integrity, and stricter at complying with SQL specifications, but correspondingly slower and more complicated to use.
Like many perceptions formed in the past, these things aren't as true with the current generation of releases as they used to be.
Operational Analytics Using Spark and NoSQL Data Stores - DATAVERSITY
NoSQL data stores have emerged for scalable capture and real-time analysis of data. Apache Spark and Hadoop provide additional scalable analytics processing. This session looks at these technologies and how they can be used to support operational analytics to improve operational effectiveness. It also looks at an example of how operational analytics can be implemented in NoSQL environments using the Basho Data Platform with Apache Spark:
• The emergence of NoSQL, Hadoop and Apache Spark
• NoSQL use cases
• The need for operational analytics
• Types of operational analysis
• Key requirements for operational analytics
• Operational analytics using the Basho Data Platform with Apache Spark
A brief historical survey of how programming languages have evolved over the decades. We revisit several milestones along the way, reminding ourselves of a few of the missed opportunities. We examine the broad families into which programming languages fall -- an informal phylogenetic tree. We try to recognize the convergence of features among several mainstream languages. Finally, we discuss the current state of affairs in the world of programming languages.
Systems Integration in the NoSQL Era with Apache Camel (Neo4j, CouchDB, AWS S... - Kai Wähner
SQL cannot solve several problems emerging with big data. In the future you will have to integrate NoSQL databases, too. The open source integration framework Apache Camel is already prepared for this challenging task. Several examples are shown for integrating NoSQL databases from CouchDB (Document Store), HBase (Column-oriented), Neo4j (Graph), Amazon Web Services (Key Value Store), and others.
Here we describe how to "think" mapreduce, not just "code" mapreduce. We solve some interesting problems using mapreduce (e.g. how to compute similarity between all pairs of documents on the web, how to do k-means clustering using map-reduce, and how to find cliques in a graph using map-reduce). These solutions are simple, elegant, and open up new ways for people to actually use mapreduce for more than just simple number crunching.
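To make the map, shuffle and reduce steps concrete, here is a minimal single-process sketch. It uses word counting, a deliberately simpler problem than the similarity, k-means and clique examples named above, and every function name in it is hypothetical; a real job would run on a framework such as Hadoop across many machines.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit one (word, 1) pair per word occurrence in a document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_mapreduce(docs):
    """Run map, shuffle and reduce in sequence over a dict of documents."""
    mapped = [pair for doc_id, text in docs.items()
              for pair in map_phase(doc_id, text)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


counts = run_mapreduce({"d1": "big data big ideas", "d2": "data beats ideas"})
```

The same skeleton carries over to harder problems by changing what the mapper emits as the key; for pairwise document similarity, for example, the key could be a pair of document ids.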
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake... - NoSQLmatters
Come to this deep dive on how Pivotal's Data Lake Vision is evolving by embracing next generation in-memory data exchange and compute technologies around Spark and Tachyon. Did we say Hadoop, SQL, and what's the shortest path to get from past to future state? The next generation of data lake technology will leverage the availability of in-memory processing, with an architecture that supports multiple data analytics workloads within a single environment: SQL, R, Spark, batch and transactional.
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra - Caserta
Businesses are generating and ingesting an unprecedented volume of structured and unstructured data to be analyzed. Needed is a scalable Big Data infrastructure that processes and parses extremely high volume in real-time and calculates aggregations and statistics. Banking trade data where volumes can exceed billions of messages a day is a perfect example.
Firms are fast approaching 'the wall' in terms of scalability with relational databases, and must stop imposing relational structure on analytics data and map raw trade data to a data model in low latency, preserve the mapped data to disk, and handle ad-hoc data requests for data analytics.
Joe introduces NoSQL databases, describes how they can scale far beyond relational databases while maintaining performance, and shares a real-world case study that details the architecture and technologies needed to ingest high-volume data for real-time analytics.
For more information, visit www.casertaconcepts.com
From self-driving cars to Siri and Watson, applications of artificial intelligence and machine learning are all around us. Broadly speaking, any problem which a computer has to learn to solve using only data about the problem domain comes under the purview of machine learning, spam filters for instance. Understanding the how and why of these fascinating technologies and the impact they will have on our society is critical to utilizing them in ways beneficial to all of humanity.
Complex Analytics with NoSQL Data Store in Real Time - Nati Shalom
NoSQL data stores are often limited in the types of queries they can support due to the distributed nature of the data. In this session we will learn patterns for overcoming this limitation and combining multiple query semantics with NoSQL-based engines.
Specifically, we will demonstrate a combination of key/value, SQL-like, document model and graph-based queries, as well as more advanced topics such as handling partial updates and querying through projection. We will also demonstrate how to create a mashup between those APIs, i.e. write fast through the key/value API and execute complex queries on that same data through a SQL query.
Real Time Interactive Queries IN HADOOP: Big Data Warehousing Meetup - Caserta
During the Big Data Warehousing Meetup, we discussed options for enabling real-time/interactive queries to support business intelligence type functionality on Hadoop. Also, Hortonworks provided a deep-dive demo of Stinger! You can access that slideshow here: http://www.slideshare.net/CasertaConcepts/stinger-initiative-hortonworks
If you would like more information, please don't hesitate to contact us at info@casertaconcepts.com. Or, visit our website at http://casertaconcepts.com/.
Presented at Cassandra London (April 7, 2014); The challenges of time-series storage and analytics in OpenNMS, with an introduction to Newts, a new Cassandra-based time-series data store.
NoSQL? No, SQL! - SQL, the underestimated "Big Data" technology - DataGeekery
In the past decade, RDBMS related traction has moved away completely from SQL towards JPA / JPQL, or even further, towards NoSQL. Evangelists have widely agreed that RDBMS are not "web scale", even if the race is far from being decided.
In this talk, I want to show you how many features you're missing out on when you don't do real SQL: when you don't take advantage of recent SQL standard evolutions, such as SQL:1999 hierarchical SQL, SQL:2003 window functions, or many vendor-specific extensions. In an example session, we're going to look at how we can calculate running totals on medium-sized data sets using
- nested selects
- window functions
- hierarchical SQL
- the Oracle MODEL clause
- stored functions
And most importantly, we're going to see how the above can help us increase performance while we decrease the number of lines of code when using any of MyBatis, jOOQ, or SpringJDBC.
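As a rough illustration of two of the approaches listed above, the sketch below computes the same running total once with a window function and once with a nested select. It uses Python's built-in `sqlite3` module with an in-memory database rather than any of the tools named in the talk, assumes an SQLite build with window function support (3.25 or later), and the `payments` table is a made-up example.

```python
import sqlite3


def running_totals(rows):
    """Return (window_result, nested_result) for a list of (id, amount) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments (id INTEGER PRIMARY KEY, amount INTEGER)"
    )
    conn.executemany("INSERT INTO payments (id, amount) VALUES (?, ?)", rows)

    # Window function: one pass, the running sum is expressed declaratively.
    window = conn.execute(
        "SELECT id, amount, SUM(amount) OVER (ORDER BY id) "
        "FROM payments ORDER BY id"
    ).fetchall()

    # Nested select: a correlated subquery re-sums the earlier rows per row.
    nested = conn.execute(
        "SELECT p.id, p.amount, "
        "(SELECT SUM(q.amount) FROM payments q WHERE q.id <= p.id) "
        "FROM payments p ORDER BY p.id"
    ).fetchall()

    conn.close()
    return window, nested


window, nested = running_totals([(1, 10), (2, 20), (3, 5)])
```

Both queries return the same rows here; the window-function form is shorter and avoids re-scanning earlier rows for every output row.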
RWDG Webinar: The New Non-Invasive Data Governance Framework - DATAVERSITY
Non-Invasive Data Governance is summarized as the practice of formalizing accountability for data and the application of governance to process. Non-Invasive Data Governance describes how data governance is applied to the organization rather than being forced into the environment. A NIDG framework will be introduced in this webinar.
In this month’s installment of the RWDG webinar series, Bob Seiner will present a new data governance framework that addresses the core components of data governance for each level of the organization. The resulting framework can be used for all approaches to data governance.
In this webinar Bob will discuss:
- The five core components of a data governance effort
- The five levels where the core components will be addressed
- Detailed explanation of each component for each level
- A diagram to complete the framework for your organization
- A framework comparison across approaches
LDM Slides: How Data Modeling Fits into an Overall Enterprise Architecture - DATAVERSITY
Enterprise Architecture (EA) provides a visual blueprint of the organization, and shows key interrelationships between data, process, applications, and more. By abstracting these assets in a graphical view, it’s possible to see key interrelationships, particularly as it relates to data and its business impact across the organization.
Join this webinar for a discussion on how a data model can be combined with an overall enterprise architecture for enhanced business value and success.
GIDS 2016 Understanding and Building NoSQLs - techmaddy
Storage is a key part of any Big Data system. A few non-functional properties are expected from Big Data storage systems, such as reliability, horizontal scalability, high availability, and fault tolerance. Supporting these properties, together with the change in data storage and access patterns in Big Data systems, led to a class of storage: NoSQLs. If there’s one rule in design, it is that there will always be trade-offs. The CAP theorem defines the choices that we can make with the trade-offs. And ACID rules change to BASE in NoSQLs.
This talk focuses on understanding NoSQLs, the design decisions behind NoSQL databases, a complete design example of a key-value database, and patterns of replication and sharding.
This exam cheat sheet aims to cover all key points for the GCP Data Engineer Certification Exam.
Let me know if there is any mistake and I will try to update it.
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ... - javier ramirez
How would you build a database to support sustained ingestion of several hundred thousand rows per second while running near real-time queries on top?
In this session I will go over some of the technical decisions and trade-offs we applied when building QuestDB, an open source time-series database developed mainly in Java, and how we can achieve over four million row writes per second on a single instance without blocking or slowing down the reads. There will be code and demos, of course.
We will also review the history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, and faster batch ingestion.
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C... - Reynold Xin
(Berkeley CS186 guest lecture)
Big Data Analytics Systems: What Goes Around Comes Around
Introduction to MapReduce, GFS, HDFS, Spark, and differences between "Big Data" and database systems.
Some of the most common questions we hear from users relate to capacity planning and hardware choices. How many replicas do I need? Should I consider sharding right away? How much RAM will I need for my working set? SSD or HDD? No one likes spending a lot of cash on hardware, and cloud bills can be just as painful. MongoDB is different from traditional RDBMSs in its resource management, so you need to be mindful when deciding on the cluster layout and hardware. In this talk we will review the factors that drive the capacity requirements: volume of queries, access patterns, indexing, and working set size, among others. Attendees will gain additional insight as we go through a few real-world scenarios, as experienced with MongoDB Inc customers, and come up with their ideal cluster layout and hardware.
What is Distributed Computing, Why we use Apache Spark - Andy Petrella
In this talk we introduce the notion of distributed computing, then we cover the advantages of Spark.
The Spark core content is very brief because the whole explanation was done live using a Spark Notebook (https://github.com/andypetrella/spark-notebook/blob/geek/conf/notebooks/Geek.snb).
This talk was given jointly by @xtordoir and myself at the University of Liège, Belgium.
A brief overview of the product: what esProc SPL is, with some cases showing what it is used for, why esProc works better, and its main characteristics. It then introduces the main technical solutions in which esProc is often used.
Architectural anti-patterns for data handling - Gleicon Moraes
Now with three more anti patterns and a new required listening. This is the Discipline release, all hail to King Crimson and Fripp's care with details.
Best Practices for Migrating Your Data Warehouse to Amazon Redshift - Amazon Web Services
by Darin Briskman, Technical Evangelist, AWS
You can gain substantially more business insights and save costs by migrating your existing data warehouse to Amazon Redshift. This session will cover the key benefits of migrating to Amazon Redshift, migration strategies, and tools and resources that can help you in the process. We’ll learn about AWS Database Migration Service and AWS Schema Migration Tool, which were recently enhanced to import data from six common data warehouse platforms. Level: 200
Massive Data Processing in Adobe Using Delta Lake - Databricks
At Adobe Experience Platform, we ingest TBs of data every day and manage PBs of data for our customers as part of the Unified Profile Offering. At the heart of this is complex ingestion of a mix of normalized and denormalized data with various linkage scenarios, powered by a central Identity Linking Graph. This helps power various marketing scenarios that are activated in multiple platforms and channels like email, advertisements etc. We will go over how we built a cost-effective and scalable data pipeline using Apache Spark and Delta Lake and share our experiences.
What are we storing?
Multi Source – Multi Channel Problem
Data Representation and Nested Schema Evolution
Performance Trade Offs with Various formats
Go over anti-patterns used
(String FTW)
Data Manipulation using UDFs
Writer Worries and How to Wipe them Away
Staging Tables FTW
Datalake Replication Lag Tracking
Performance Time!
Similar to Understanding and building big data Architectures - NoSQL (20)
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality - Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... - Ramesh Iyer
In today's fast-changing business world, companies must adapt and embrace new ideas to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
JMeter webinar - integration with InfluxDB and Grafana - RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 - Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
7. Latency
● Hibernia Express
● 3,000-mile fiber-optic link
● across the Atlantic Ocean to connect London to New York
● goal: save about 5 ms of latency compared with existing transatlantic links
● To be used by financial institutions for trading
Src: http://shop.oreilly.com/product/0636920028048.do
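A rough propagation-delay estimate helps put the figures on this slide in context. The sketch below assumes a typical fiber refractive index of 1.5 (an assumption, not a figure from the slides) and uses the 3,000-mile length quoted above.

```python
# Back-of-the-envelope propagation delay for a 3,000-mile fiber link.
# Assumption: light travels through fiber at roughly c / 1.5.

MILES_TO_KM = 1.609344
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_REFRACTIVE_INDEX = 1.5  # assumed typical value


def one_way_delay_ms(distance_miles: float) -> float:
    """Return the one-way propagation delay in milliseconds."""
    distance_km = distance_miles * MILES_TO_KM
    speed_in_fiber = SPEED_OF_LIGHT_KM_S / FIBER_REFRACTIVE_INDEX
    return distance_km / speed_in_fiber * 1000


one_way = one_way_delay_ms(3000)
round_trip = 2 * one_way
print(f"one-way: {one_way:.1f} ms, round trip: {round_trip:.1f} ms")
```

Under these assumptions the one-way delay alone is in the low twenties of milliseconds, so a figure of 5 ms is best read as a saving over other links rather than as an end-to-end latency.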
22. CAP theorem
A distributed data store can guarantee at most two of the three: Strong Consistency, High Availability, and Partition-Tolerance
Img-src:http://image.slidesharecdn.com/cap-131117230434-phpapp02/95/dynamo-and-bigtable-in-light-of-the-cap-theorem-12-638.jpg?cb=1384729712
29. BASE
● Basically Available: If a single node fails, part of the
data becomes unavailable, but the data layer as a whole
stays operational.
● Soft state: Data that is not persisted to disk, yet may
still be restorable after a failure.
● Eventually consistent: The system becomes consistent
over time, provided it receives no new input during
that time.
33. Implementation 2 - Associative Arrays
Key Value
user1 Mike
user2 Mary
user3 Nina
On hotspace
34. Simple Storage Design
- put key value - appends the key and value to the file as one line
- get key - greps the file for the key and returns its value
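A minimal sketch of the put/get design above, assuming a tab-separated line format and the hypothetical class name AppendOnlyStore (neither comes from the slides):

```python
import os
import tempfile
from typing import Optional


class AppendOnlyStore:
    """Toy store: put appends a line, get scans the whole file."""

    def __init__(self, path: str) -> None:
        self.path = path
        open(self.path, "a").close()  # make sure the file exists

    def put(self, key: str, value: str) -> None:
        # Assumption: keys contain no tab and values contain no newline.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(f"{key}\t{value}\n")

    def get(self, key: str) -> Optional[str]:
        result = None
        with open(self.path, encoding="utf-8") as f:
            for line in f:  # full scan, like grep
                k, _, v = line.rstrip("\n").partition("\t")
                if k == key:
                    result = v  # keep scanning: the last write wins
        return result


store = AppendOnlyStore(os.path.join(tempfile.mkdtemp(), "data.log"))
store.put("user1", "Mike")
store.put("user2", "Mary")
store.put("user1", "Michael")  # an update is just another appended line
print(store.get("user1"))
```

Writes are cheap because they only append, while every read scans the whole file.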
What are the problems with this?
Activity
How can we improve this?
35. Simple Storage Design
- Add an in-memory index that maps each key to the byte offset of its record in the file.
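One way to picture the in-memory index is a dictionary from key to byte offset, kept alongside an append-only file. This is only a sketch; the class name IndexedStore and the tab-separated record format are assumptions for illustration.

```python
import os
import tempfile
from typing import Dict, Optional


class IndexedStore:
    """Append-only file plus an in-memory map of key -> byte offset."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.index: Dict[str, int] = {}
        open(self.path, "ab").close()  # make sure the file exists

    def put(self, key: str, value: str) -> None:
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()  # where this record will start
            f.write(f"{key}\t{value}\n".encode("utf-8"))
        self.index[key] = offset  # newest record wins

    def get(self, key: str) -> Optional[str]:
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)  # jump straight to the record, no scan
            line = f.readline().decode("utf-8")
        _, _, value = line.rstrip("\n").partition("\t")
        return value


idx_store = IndexedStore(os.path.join(tempfile.mkdtemp(), "data.log"))
idx_store.put("user1", "Mike")
idx_store.put("user2", "Mary")
idx_store.put("user1", "Michael")
print(idx_store.get("user1"), idx_store.index)
```

A read now costs one seek instead of a full scan, at the price of holding every key in memory.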
What are the problems with this?
Activity
How can we improve this?
36. Simple Storage Design
- Segments
- Compaction
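Compaction can be sketched as merging segments and keeping only the newest value for each key. Here segments are modelled as in-memory lists of records rather than files, which is a simplification, and the sample values beyond those on the earlier slide are made up.

```python
from typing import Dict, List, Tuple

Segment = List[Tuple[str, str]]  # records in write order


def compact(segments: List[Segment]) -> Segment:
    """Merge segments (oldest first), keeping the newest value per key."""
    latest: Dict[str, str] = {}
    for segment in segments:
        for key, value in segment:
            latest[key] = value  # later records overwrite earlier ones
    return list(latest.items())


old_segment = [("user1", "Mike"), ("user2", "Mary"), ("user1", "Michael")]
new_segment = [("user3", "Nina"), ("user2", "Maria")]
merged = compact([old_segment, new_segment])
print(merged)
```

Five records shrink to three, and the old segments can be deleted once the merged one is in place.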
What are the problems with this?
Activity
How can we improve this?
37. Simple Storage Design
- Sorted Key-Value
- Sparse index
- SSTable
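Because an SSTable keeps its keys sorted, the index only needs to hold every n-th key. The sketch below uses an in-memory list as a stand-in for the on-disk table; the stride of 3 and the records after user3 are illustrative assumptions.

```python
import bisect
from typing import List, Optional, Tuple

# A sorted run of records stands in for an on-disk SSTable.
sstable: List[Tuple[str, str]] = [
    ("user1", "Mike"), ("user2", "Mary"), ("user3", "Nina"),
    ("user4", "Omar"), ("user5", "Pia"), ("user6", "Raj"),
]

STRIDE = 3  # index every third record (assumed value)
sparse_pos = list(range(0, len(sstable), STRIDE))
sparse_keys = [sstable[i][0] for i in sparse_pos]


def lookup(key: str) -> Optional[str]:
    # Find the last indexed key that is <= the requested key.
    slot = bisect.bisect_right(sparse_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first record
    start = sparse_pos[slot]
    # Scan only the short block that can contain the key.
    for k, v in sstable[start:start + STRIDE]:
        if k == key:
            return v
        if k > key:
            break  # sorted order: the key cannot appear later
    return None


print(sparse_keys, lookup("user5"))
```

Only two of the six keys live in memory here, yet any lookup touches at most one block of three records.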
What are the problems with this?
Activity
How can we improve this?
39. Simple Storage Design - Overall
- Writes go into an in-memory balanced tree (red-black or AVL), the memtable
=> faster writes
- When the memtable reaches 64MB, write it to disk as an SSTable and
clear the memtable
- Reads check the memtable first, then the most recent segments through
their in-memory sparse indexes (SSTable) => faster reads
- Run a merging and compaction process in the background
=> less storage and faster reads
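The overall flow can be sketched in a few lines. This is a toy model under stated assumptions: a plain dict sorted at flush time stands in for the balanced tree, an entry count stands in for the 64MB threshold, SSTables are in-memory lists, and the class name TinyLSM is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class TinyLSM:
    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable_limit = memtable_limit  # stand-in for the size threshold
        self.memtable: Dict[str, str] = {}
        self.sstables: List[List[Tuple[str, str]]] = []  # oldest first

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self) -> None:
        # Write the memtable out in sorted key order, then start afresh.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:  # newest data first
            return self.memtable[key]
        for table in reversed(self.sstables):  # then newest SSTable first
            for k, v in table:
                if k == key:
                    return v
        return None

    def compact(self) -> None:
        # Merge all SSTables; values from newer tables overwrite older ones.
        merged: Dict[str, str] = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [sorted(merged.items())]


db = TinyLSM(memtable_limit=2)
db.put("user1", "Mike")
db.put("user2", "Mary")      # triggers the first flush
db.put("user1", "Michael")
db.put("user3", "Nina")      # triggers the second flush
db.put("user2", "Maria")     # stays in the memtable
tables_before = len(db.sstables)
db.compact()
print(tables_before, db.sstables, db.get("user2"))
```

A real engine would binary-search each SSTable through its sparse index instead of scanning it, and would run compaction in a background thread.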
50. Why?
❏ Modelling and storing relationships in an RDBMS is
complicated
❏ Performance degrades with the number and depth of
relationships
❏ Query complexity grows
❏ Adding a new type requires schema redesign
51. With Neo4j, you can traverse 4M+ relationships
per second per core
63. When is this better?
❏ Huge number of columns, with queries on few columns
❏ Aggregation
❏ Column level update
❏ Column data is uniform; so better compression
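The points above can be illustrated by pivoting a few rows into columns. The sample rows are made up, and run-length encoding is used only as one simple example of a compression scheme that benefits from uniform column data.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple

# Hypothetical rows, as a row-oriented store would keep them.
rows: List[Dict[str, Any]] = [
    {"id": 1, "country": "IN", "amount": 10},
    {"id": 2, "country": "IN", "amount": 20},
    {"id": 3, "country": "IN", "amount": 30},
    {"id": 4, "country": "SG", "amount": 40},
    {"id": 5, "country": "SG", "amount": 50},
]

# Column layout: one contiguous list per column.
columns: Dict[str, List[Any]] = {
    name: [row[name] for row in rows] for name in rows[0]
}


def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse runs of equal neighbouring values into (value, count)."""
    return [(value, len(list(group))) for value, group in groupby(values)]


total = sum(columns["amount"])  # the aggregate reads a single column
encoded = run_length_encode(columns["country"])
print(total, encoded)
```

The aggregate touches only the amount column, and the country column shrinks from five values to two runs.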
64. Time Series Data
A measurement, together with the time it was taken, recorded repeatedly
Img src: https://www.safaribooksonline.com/library/view/time-series-databases/9781491920909/images/tsdn_0103.png.jpg
66. When - Time Series Data
● Huge amount of data
● Mostly query based on time
● Stock exchange
● Sensor data. E.g: Trucks
● Cell towers for usage patterns
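Since queries are mostly based on time, keeping points sorted by timestamp lets a range query become two binary searches. The readings below are hypothetical sensor values, and the parallel-list layout is a simplification of what a time series database would do.

```python
import bisect
from typing import List, Tuple

# Hypothetical sensor readings, sorted by timestamp (seconds).
timestamps: List[int] = [0, 60, 120, 180, 240]
readings: List[float] = [20.1, 20.4, 21.0, 20.8, 20.5]


def range_query(start: int, end: int) -> List[Tuple[int, float]]:
    """Return all points with start <= timestamp <= end."""
    lo = bisect.bisect_left(timestamps, start)
    hi = bisect.bisect_right(timestamps, end)
    return list(zip(timestamps[lo:hi], readings[lo:hi]))


window = range_query(60, 180)
print(window)
```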
68. Replication
81. Federated Tables
A Federated Table is a table which points
to a table in another database instance
(usually on another server). It can be
seen as a view of this remote database
table.
- Administration overhead
- Security
- Access over network
- Okay for reporting/analytical tasks
84. Hash based
Take the hash of the key, apply a modulo operation, and
put the data on the server selected by the
remainder.
- Uniform distribution
- Range queries may be slow, because neighbouring keys land on different servers
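Hash-and-modulo placement can be sketched as follows. SHA-256 is used here only because it gives a stable hash across runs; the slides do not name a hash function, and the shard count of 12 and the key names are assumptions for the example.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 12  # assumed shard count for the example


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash the key, then use the remainder to pick a shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


keys = [f"user{i}" for i in range(12_000)]  # hypothetical keys
counts = Counter(shard_for(k) for k in keys)

# Changing the shard count changes the remainder for most keys.
moved = sum(shard_for(k, 12) != shard_for(k, 13) for k in keys)

print(sorted(counts.values()))
print(f"{moved} of {len(keys)} keys move when going from 12 to 13 shards")
```

The counts come out close to even, which is the uniform distribution the slide mentions. The second figure shows why plain modulo placement makes adding a server expensive: most keys end up on a different shard.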
85. Co-ordinators
- Take the request; if it contains the shard key, talk to the correct shard
- Otherwise, co-ordinate across shards and return the combined result
- Monitor health
- Take care of rebalancing
- Can be a random node, which will complete the task
- Or a set of co-ordinators
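The first two responsibilities can be sketched with dictionaries standing in for shard servers. The Coordinator class, its method names, and the hash-modulo routing are illustrative assumptions; health monitoring and rebalancing are left out.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Tuple


class Coordinator:
    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for one shard server.
        self.shards: List[Dict[str, str]] = [{} for _ in range(num_shards)]

    def _shard_for(self, key: str) -> Dict[str, str]:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def put(self, key: str, value: str) -> None:
        self._shard_for(key)[key] = value

    def get(self, key: str) -> Optional[str]:
        # The key is in the request: talk to exactly one shard.
        return self._shard_for(key).get(key)

    def find(self, predicate: Callable[[str], bool]) -> List[Tuple[str, str]]:
        # No key in the request: ask every shard and merge the results.
        results: List[Tuple[str, str]] = []
        for shard in self.shards:
            results.extend((k, v) for k, v in shard.items() if predicate(v))
        return sorted(results)


coord = Coordinator(num_shards=4)
for k, v in [("user1", "Mike"), ("user2", "Mary"), ("user3", "Nina")]:
    coord.put(k, v)
starts_with_m = coord.find(lambda value: value.startswith("M"))
print(coord.get("user2"), starts_with_m)
```

A keyed lookup touches one shard, while the keyless search fans out to all of them.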
86. Take care while sharding
● Balance your shards with a proper shard key
● Choose the correct number of shards, e.g. 12
● Give time for rebalancing. When increasing capacity,
add nodes early and allow time for your shards to move.
● Shard on denormalized data.
● Try to have the shard key as part of your queries.