This document discusses the design and implementation of Tracer, a time series database at Stitch Fix that provides precise personalized inventory states at any given point in time. Tracer models inventory as a pure function of time by continuously capturing inventory state changes as delta blocks and periodically generating snapshots. It allows querying inventory counts at any time by applying the appropriate deltas to the nearest previous snapshot. Tracer is built on Spark and stores deltas and snapshots as Spark dataframes, providing Scala and Python APIs to query inventory states.
4. We provide a personalized styling service through a combination of algorithmic recommendations and stylist curation.
http://algorithms-tour.stitchfix.com/
What do we do at Stitch Fix?
5. We need good inventory to serve good recommendations.
Recommendation algorithms work both ways.
(Buyers here mean the people who buy clothes from vendors to fill our warehouses)
Why does inventory matter?
(Diagram: Stylists ↔ Buyers)
6. We need good personalized inventory to serve good recommendations for each client.
Why does inventory matter?
7. We need good personalized inventory to serve good recommendations for each client.
Tracer
A time series database providing precise personalized inventory states at any given point in time
Why does inventory matter?
8. Imagine we have a time series of SKU counts
(count1, t1), (count2, t2), (count3, t3)...
Q: How could we know the count at any t within the range?
The design of Tracer
9. (count1, t1), (count2, t2), (count3, t3)...
Q: How could we know the count at any t within the range?
● This is asking too much! Let’s use a predefined interval to
generate this series, say every 10 minutes.
The design of Tracer
10. (count1, t1), (count2, t2), (count3, t3)...
Q: How could we know the count at any t within the range?
● This is asking too much! Let’s use a predefined interval to
generate counts, say every 10 minutes.
● Problems:
○ A ton of things can happen within 10 minutes during peak hours
○ We’d like to know exactly what stylists saw when they started working. 10-minute snapshots just aren’t accurate enough
The design of Tracer
11. (count1, t1), (count2, t2), (count3, t3)...
Q: How could we know the count at any t within the range?
● OK, let’s generate the counts every second!
The design of Tracer
12. (count1, t1), (count2, t2), (count3, t3)...
Q: How could we know the count at any t within the range?
● OK, let’s generate the counts every second!
● Problems
○ It is not realistic to aggregate that often in the engineering DB, where every item is a row
○ Even if engineering maintains a count table, should we snapshot it every second?
○ Storing non-moving counts overnight is a waste of space
The design of Tracer
13. (count1, t1), (count2, t2), (count3, t3)...
Q: How could we know the count at any t within the range?
● Let’s do away with the fixed interval and only generate a count
event when the count changes!
The design of Tracer
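The change-only idea can be sketched in a few lines of plain Python (a toy illustration; the sample stream and names are hypothetical, not Tracer's actual schema):

```python
def to_change_events(samples):
    """Collapse a stream of (count, t) samples into events that are
    emitted only when the count actually changes."""
    events = []
    prev = None
    for count, t in samples:
        if count != prev:
            events.append((count, t))
            prev = count
    return events

# Unchanged counts (e.g. overnight) produce no events at all.
```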
14. (count1, t1), (count2, t2), (count3, t3)...
Q: How could we know the count at any t within the range?
● Let’s do away with the fixed interval and only generate a count
event when the count changes!
○ Problems
■ Again, the engineering DB works at the item level
■ Say t1 is far from t2; to know the count at tx (t1 < tx < t2), we may need to walk through tons of other events. Indexing could solve this, but an index for every SKU is too much
The design of Tracer
15. (s11 -> s12, t1), (s12 -> s13, t2), (s13 -> s14, t3)...
Q: How could we know the count at any t within the range?
● Let’s tweak this idea a bit and generate events of item state
transitions
● This gives us the flexibility to process item state as we want. In
the case of computing SKU counts, we can transform these
events into SKU count changes:
(delta1, t1), (delta2, t2), (delta3, t3)...
The design of Tracer
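One way to turn state-transition events into SKU count deltas can be sketched in plain Python (the event tuple shape and state names are assumptions for illustration, not Tracer's actual schema):

```python
def transitions_to_deltas(events, countable_states):
    """events: iterable of (sku, from_state, to_state, t) tuples.
    An item leaving a countable state decrements its SKU's count;
    entering a countable state increments it."""
    deltas = []
    for sku, from_state, to_state, t in events:
        change = (to_state in countable_states) - (from_state in countable_states)
        if change:
            deltas.append((sku, change, t))
    return deltas
```

Transitions between two non-countable states (or two countable ones) net out to zero and emit no delta.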
16. (delta1, t1), (delta2, t2), (delta3, t3)...
Q: How could we know the count at any t within the range?
● One missing piece: we still need an initial state to apply the deltas to
● This can be addressed by creating a state snapshot at the very
beginning
The design of Tracer
17. Now the whole design can be summarized as two pure functions:
● Inventory state function
I(t)
● Difference function
D(t1, t2) = I(t2) - I(t1) = -D(t2, t1)
● Inventory state reasoning
I(t2) = I(t1) + D(t1, t2) = I(t3) - D(t2, t3)
The design of Tracer
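These identities can be exercised on a toy I(t) (the state values below are made up purely to check the algebra):

```python
# Toy inventory state as a pure function of time.
state = {0: 10, 1: 12, 2: 11, 3: 16}

def I(t):
    return state[t]

def D(t1, t2):                 # difference function, defined from I
    return I(t2) - I(t1)

assert D(1, 2) == -D(2, 1)     # D is antisymmetric
assert I(2) == I(1) + D(1, 2)  # roll forward from an earlier state
assert I(2) == I(3) - D(2, 3)  # roll back from a later state
```

Because I is pure, a state can be reached equally well from the past (snapshot plus deltas) or from the future (snapshot minus deltas).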
18. ● As we consume the item event stream, we continuously build
delta blocks:
(delta1, t1), (delta2, t2), (delta3, t3)...
The implementation of Tracer
19. ● As we consume the item event stream, we continuously build
delta blocks:
(delta1, t1), (delta2, t2), (delta3, t3)...
● We create a SKU count snapshot every hour, so that we don’t need to go all the way back to the start and apply every delta from there
The implementation of Tracer
20. ● As we consume the item event stream, we continuously build
delta blocks:
(delta1, t1), (delta2, t2), (delta3, t3)...
● We create a SKU count snapshot every hour, so that we don’t need to go all the way back to the start and apply every delta from there
● To speed up the search for a given snapshot, we index snapshots. With hourly snapshots, there are only 24 per day to index.
The implementation of Tracer
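Finding the nearest previous snapshot is a sorted-index lookup; with hourly snapshots the index stays tiny. A sketch using Python's bisect (epoch-second timestamps, purely illustrative):

```python
import bisect

def nearest_snapshot(snapshot_times, t):
    """Return the latest snapshot time that is <= t.
    snapshot_times must be sorted ascending; with hourly snapshots
    there are at most 24 entries per day, so this lookup is cheap."""
    i = bisect.bisect_right(snapshot_times, t) - 1
    if i < 0:
        raise ValueError("t predates the first snapshot")
    return snapshot_times[i]
```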
21. ● As we consume the item event stream, we continuously build
delta blocks:
(delta1, t1), (delta2, t2), (delta3, t3)...
● We create a SKU count snapshot every hour, so that we don’t need to go all the way back to the start and apply every delta from there
● To speed up the search for a given snapshot, we index snapshots. With hourly snapshots, there are only 24 per day to index.
● This is all built on Spark; deltas and snapshots are stored as Spark DataFrames
The implementation of Tracer
22. ● As we consume the item event stream, we continuously build
delta blocks:
(delta1, t1), (delta2, t2), (delta3, t3)...
● We create a SKU count snapshot every hour, so that we don’t need to go all the way back to the start and apply every delta from there
● To speed up the search for a given snapshot, we index snapshots. With hourly snapshots, there are only 24 per day to index.
● This is all built on Spark; deltas and snapshots are stored as Spark DataFrames
● We provide both Scala and Python APIs to query the inventory state
The implementation of Tracer
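Putting the pieces together, an end-to-end query can be sketched in plain Python. Tracer itself stores deltas and snapshots as Spark DataFrames; this stand-in only shows the lookup-then-replay shape, with illustrative data:

```python
import bisect

def query_count(snapshots, deltas, t):
    """snapshots: sorted list of (snapshot_t, count);
    deltas: list of (delta, event_t).
    Returns the count at t: find the nearest previous snapshot,
    then replay only the deltas recorded after it."""
    times = [snap_t for snap_t, _ in snapshots]
    i = bisect.bisect_right(times, t) - 1
    if i < 0:
        raise ValueError("t predates the first snapshot")
    snap_t, count = snapshots[i]
    for delta, event_t in deltas:
        if snap_t < event_t <= t:
            count += delta
    return count

# Hourly snapshots; the 3600s snapshot already folds in the earlier deltas.
snapshots = [(0, 10), (3600, 13)]
deltas = [(2, 100), (1, 1800), (-1, 4000)]
```

The same answer is reachable from any snapshot, since counts are a pure function of time; picking the nearest previous one just minimizes how many deltas must be replayed.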