This document discusses advanced algorithm design and analysis techniques including dynamic programming, greedy algorithms, and amortized analysis. It provides examples of dynamic programming including matrix chain multiplication and longest common subsequence. Dynamic programming works by breaking problems down into overlapping subproblems and solving each subproblem only once. Greedy algorithms make locally optimal choices at each step to find a global optimum. Amortized analysis averages the costs of a sequence of operations to determine average-case performance.
3. What is Dynamic Programming?
• Dynamic programming is a technique for algorithm design.
• Like divide-and-conquer, it solves a problem by combining the solutions to subproblems, but it records those solutions in a table rather than recomputing them.
• Dynamic programming is applicable when the subproblems are not independent, that is, when subproblems share subsubproblems.
• So to solve a given problem, we solve different parts of the problem (the subproblems) and combine their solutions.
4. Advantage
• A dynamic-programming algorithm solves every
subproblem just once and then saves its answer in a
table, thereby avoiding the work of recomputing the
answer every time the subproblem is encountered.
• Dynamic programming is typically applied to
optimization problems. In such problems there can be
many possible solutions. Each solution has a value,
and we wish to find a solution with the optimal
(minimum or maximum) value.
5. A DP algorithm can be broken into a sequence of four steps.
1. Characterize the structure of an optimal solution.
2. Recursively define the value of an optimal solution.
3. Compute the value of an optimal solution in a bottom-up fashion.
4. Construct an optimal solution from computed information.
6. Algorithms
• Elements of dynamic programming
Optimal substructure
Overlapping subproblems
Memoization
Examples:
• Matrix-chain multiplication
• Longest common subsequence
7. Elements of dynamic programming
• Optimal substructure
A problem exhibits optimal substructure if an optimal solution to the problem contains within it optimal solutions to subproblems. Whenever a problem exhibits optimal substructure, it is a good clue that dynamic programming might apply.
8. Overlapping subproblems
• Dynamic-programming algorithms typically
take advantage of overlapping subproblems
by solving each subproblem once and then
storing the solution in a table where it can be
looked up when needed, using constant time
per lookup.
10. Memoization
• Memoization offers the efficiency of the usual dynamic-programming approach while maintaining a top-down strategy.
• A memoized recursive algorithm maintains an entry
in a table for the solution to each subproblem. Each
table entry initially contains a special value to
indicate that the entry has yet to be filled in.
• When the subproblem is first encountered during the
execution of the recursive algorithm, its solution is
computed and then stored in the table.
• Each subsequent time that the subproblem is
encountered, the value stored in the table is simply
looked up and returned.
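As a small illustrative sketch (Fibonacci is not one of the deck's examples), memoization can be demonstrated in Python, where `functools.lru_cache` plays the role of the lookup table described above:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Each distinct subproblem is computed once; every repeated call is a
    # constant-time table lookup, so the naive exponential recursion
    # becomes linear in n.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(40))  # 102334155, with only 41 distinct subproblems evaluated
```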
11. Matrix-chain multiplication
• Matrix-chain multiplication is a classic problem solved by dynamic programming. We are given a sequence (chain) A1, A2, . . . , An of n matrices to be multiplied, and we wish to compute the product A1 · A2 · · · An.
• We can evaluate the expression using the standard algorithm for multiplying pairs of matrices as a subroutine once we have parenthesized it to resolve all ambiguities in how the matrices are multiplied together.
• A product of matrices is fully parenthesized if it is either a single matrix or the product of two fully parenthesized matrix products, surrounded by parentheses.
12. For example,
• if the chain of matrices is A1, A2, A3, A4 , the
product A1A2A3A4 can be fully parenthesized in
five distinct ways:
• (A1(A2(A3A4))) ,
• (A1((A2A3)A4)) ,
• ((A1A2)(A3A4)) ,
• ((A1(A2A3))A4) ,
• (((A1A2)A3)A4) .
13. The matrix-chain multiplication problem
• Given a chain A1, A2, . . . , An of n matrices, where for i = 1, 2, . . . , n, matrix Ai has dimension p[i-1] × p[i], fully parenthesize the product A1 A2 . . . An in a way that minimizes the number of scalar multiplications.
14. Steps to be followed
• Structure of an optimal parenthesization
• A recursive solution
• Computing the optimal costs
15. Algorithm
MATRIX-CHAIN-ORDER(p)
  n ← length[p] - 1
  for i ← 1 to n
      do m[i, i] ← 0
  for l ← 2 to n
      do for i ← 1 to n - l + 1
             do j ← i + l - 1
                m[i, j] ← ∞
                for k ← i to j - 1
                    do q ← m[i, k] + m[k + 1, j] + p[i-1] · p[k] · p[j]
                       if q < m[i, j]
                           then m[i, j] ← q
                                s[i, j] ← k
  return m and s
16. The m and s tables computed by MATRIX-CHAIN-ORDER for n = 6 and the following matrix dimensions:
matrix   dimension
-------  ---------
A1       30 × 35
A2       35 × 15
A3       15 × 5
A4       5 × 10
A5       10 × 20
A6       20 × 25
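The pseudocode above translates directly into Python; this is a minimal sketch (function and table names mirror the pseudocode, with tables padded so they can be 1-indexed), run on the six-matrix instance above:

```python
def matrix_chain_order(p):
    # p[i-1] x p[i] gives the dimensions of matrix A_i, for i = 1..n.
    n = len(p) - 1
    # m[i][j]: minimum scalar multiplications to compute A_i..A_j (1-indexed).
    m = [[0] * (n + 1) for _ in range(n + 1)]
    # s[i][j]: split point k achieving the optimum (A_i..A_k)(A_{k+1}..A_j).
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for l in range(2, n + 1):              # l = length of the chain
        for i in range(1, n - l + 2):
            j = i + l - 1
            m[i][j] = float("inf")
            for k in range(i, j):          # try every split point
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k
    return m, s

# The six dimensions from the table: 30x35, 35x15, 15x5, 5x10, 10x20, 20x25.
m, s = matrix_chain_order([30, 35, 15, 5, 10, 20, 25])
print(m[1][6])  # 15125 scalar multiplications for the optimal parenthesization
```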
17. Longest common subsequence
• A subsequence of a given sequence is just the given sequence with some elements (possibly none) left out.
• Given X = (x1, x2, . . . , xm), another sequence Z = (z1, z2, . . . , zk) is a subsequence of X if there exists a strictly increasing sequence (i1, i2, . . . , ik) of indices of X such that for all j = 1, 2, . . . , k, we have x[ij] = zj.
• For example, Z = <B, C, D, B> is a subsequence of X = <A, B, C, B, D, A, B> with corresponding index sequence <2, 3, 5, 7>.
18. Example
• Given two sequences X and Y, we say that a sequence Z is a common subsequence of X and Y if Z is a subsequence of both X and Y. For example, if X = <A, B, C, B, D, A, B> and Y = <B, D, C, A, B, A>, then <B, C, A> is a common subsequence of X and Y, though not a longest one: <B, C, B, A>, of length 4, is also common to X and Y.
19. Theorem
• Let X = <x1, x2, . . . , xm> and Y = <y1, y2, . . . , yn> be sequences, and let Z = <z1, z2, . . . , zk> be any LCS of X and Y.
1. If xm = yn, then zk = xm = yn and Zk-1 is an LCS of Xm-1 and Yn-1.
2. If xm ≠ yn, then zk ≠ xm implies that Z is an LCS of Xm-1 and Y.
3. If xm ≠ yn, then zk ≠ yn implies that Z is an LCS of X and Yn-1.
21. Computing the length of an LCS
• LCS-LENGTH on the sequences X = <A, B, C, B, D,
A, B> and Y = <B, D, C, A, B, A> .
• The square in row i and column j contains the value of
c[i, j] and the appropriate arrow for the value of b[i, j].
• The entry 4 in c[7, 6]--the lower right-hand corner of the table--is the length of an LCS <B, C, B, A> of X and Y.
• For i, j > 0, entry c[i, j] depends only on whether xi = yj
and the values in entries c[i -1, j], c[i, j - 1], and c[i - 1,
j - 1], which are computed before c[i, j].
• To reconstruct the elements of an LCS, follow the b[i, j] arrows from the lower right-hand corner; the path is shaded. Each "↖" on the path corresponds to an entry (highlighted) for which xi = yj is a member of an LCS.
22. Algorithm
LCS-LENGTH(X, Y)
  m ← length[X]
  n ← length[Y]
  for i ← 1 to m
      do c[i, 0] ← 0
  for j ← 0 to n
      do c[0, j] ← 0
  for i ← 1 to m
      do for j ← 1 to n
             do if xi = yj
                    then c[i, j] ← c[i - 1, j - 1] + 1
                         b[i, j] ← "↖"
                    else if c[i - 1, j] ≥ c[i, j - 1]
                        then c[i, j] ← c[i - 1, j]
                             b[i, j] ← "↑"
                        else c[i, j] ← c[i, j - 1]
                             b[i, j] ← "←"
  return c and b
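A minimal Python sketch of the same recurrence, run on the example sequences X = <A, B, C, B, D, A, B> and Y = <B, D, C, A, B, A>. Instead of storing the b table of arrows, this version recomputes the arrow direction while walking back from the lower right-hand corner:

```python
def lcs_length(x, y):
    m, n = len(x), len(y)
    # c[i][j] = length of an LCS of the prefixes x[:i] and y[:j]
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c

def reconstruct(c, x, y):
    # Walk back from c[m][n], following the same choices LCS-LENGTH made.
    i, j = len(x), len(y)
    out = []
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:        # the "↖" case: part of the LCS
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:  # the "↑" case
            i -= 1
        else:                             # the "←" case
            j -= 1
    return "".join(reversed(out))

c = lcs_length("ABCBDAB", "BDCABA")
print(c[7][6])                            # 4
print(reconstruct(c, "ABCBDAB", "BDCABA"))  # BCBA
```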
24. Greedy Algorithm
• A greedy algorithm always makes the choice
that looks best at the moment. That is, it
makes a locally optimal choice in the hope
that this choice will lead to a globally optimal
solution.
25. Applications
• The activity-selection problem.
• Data-compression (Huffman) codes.
• Scheduling unit-time tasks with deadlines and penalties.
• Minimum-spanning-tree algorithms.
• Dijkstra's algorithm for shortest paths from a single source.
• The greedy method is quite powerful and works well for a wide range of problems.
26. The activity-selection problem
• Suppose we have a set S = {1, 2, . . . , n} of n proposed activities that wish to use a resource, such as a lecture hall, which can be used by only one activity at a time.
• Each activity i has a start time si and a finish time fi, where si ≤ fi. If selected, activity i takes place during the half-open time interval [si, fi).
• Activities i and j are compatible if the intervals [si, fi) and [sj, fj) do not overlap (i.e., i and j are compatible if si ≥ fj or sj ≥ fi).
• The activity-selection problem is to select a maximum-size set of mutually compatible activities.
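The standard greedy solution always picks the compatible activity with the earliest finish time. A sketch in Python, run on an assumed sample instance of (start, finish) pairs (the slides do not give one):

```python
def activity_selection(activities):
    # Greedy rule: repeatedly pick the compatible activity with the
    # earliest finish time; the resource holds one activity at a time.
    chosen = []
    last_finish = float("-inf")
    for s, f in sorted(activities, key=lambda a: a[1]):
        if s >= last_finish:        # [s, f) does not overlap anything chosen
            chosen.append((s, f))
            last_finish = f
    return chosen

# A sample instance (assumed for illustration).
acts = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9),
        (6, 10), (8, 11), (8, 12), (2, 13), (12, 14)]
print(activity_selection(acts))  # 4 mutually compatible activities
```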
29. 0-1 knapsack problem
• A thief robbing a store finds n items; the ith item
is worth vi dollars and weighs wi pounds, where vi
and wi are integers.
• He wants to take as valuable a load as possible,
but he can carry at most W pounds in his
knapsack for some integer W.
• What items should he take?
• (This is called the 0-1 knapsack problem because
each item must either be taken or left behind; the
thief cannot take a fractional amount of an item
or take an item more than once.)
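Because the greedy choice fails for the 0-1 problem, it is usually solved with dynamic programming instead. A minimal sketch, using an assumed instance (item values $60/$100/$120, weights 10/20/30 lb, W = 50; these numbers are not in the text above):

```python
def knapsack_01(items, W):
    # items: (value, weight) pairs with integer weights.
    # best[w] = maximum value achievable with total weight <= w.
    best = [0] * (W + 1)
    for value, weight in items:
        # Iterate capacities downward so each item is taken at most once.
        for w in range(W, weight - 1, -1):
            best[w] = max(best[w], best[w - weight] + value)
    return best[W]

print(knapsack_01([(60, 10), (100, 20), (120, 30)], 50))  # 220
```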
30. fractional knapsack problem
• the setup is the same, but the thief can take
fractions of items, rather than having to make
a binary (0-1) choice for each item.
• You can think of an item in the 0-1 knapsack
problem as being like a gold ingot, while an
item in the fractional knapsack problem is
more like gold dust.
31. Solution
(a)The thief must select a subset of the three items shown whose weight must
not exceed 50 pounds.
(b)The optimal subset includes items 2 and 3. Any solution with item 1 is
suboptimal, even though item 1 has the greatest value per pound.
(c) For the fractional knapsack problem, taking the items in order of greatest
value per pound yields an optimal solution.
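The greedy rule in (c) can be sketched as follows, again using an assumed instance (values $60/$100/$120, weights 10/20/30 lb, W = 50); greedy-by-density collects $240, more than the $220 the 0-1 optimum achieves, because it can take a fraction of the last item:

```python
def fractional_knapsack(items, capacity):
    # items: (value, weight) pairs; take highest value-per-pound first,
    # taking a fraction of the last item if it does not fully fit.
    total = 0.0
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if capacity == 0:
            break
        take = min(weight, capacity)   # whole item, or the fraction that fits
        total += value * take / weight
        capacity -= take
    return total

print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50))  # 240.0
```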
32. AMORTIZED ANALYSIS
• In an amortized analysis, the time required to
perform a sequence of data-structure
operations is averaged over all the operations
performed.
• Amortized analysis can be used to show that
the average cost of an operation is small, if one
averages over a sequence of operations, even
though a single operation might be expensive.
• Amortized analysis differs from average-case
analysis in that probability is not involved; an
amortized analysis guarantees the average
performance of each operation in the worst
case.
33. Importance of Amortized Analysis
• we want to analyze the complexity of
performing a sequence of operations on a
particular data structure. In some cases,
knowing the complexity of each operation in
the sequence is important, so we can simply
analyze the worst-case complexity of each
operation.
• In other cases, only the time complexity for
processing the entire sequence is important.
34. Definition of worst-case sequence complexity
• The worst-case sequence complexity of a sequence of m operations is the MAXIMUM total time over all sequences of m operations:
T(m) = worst-case cost of doing a sequence of m operations
35. Definition of amortized sequence complexity
• The amortized sequence complexity of a sequence of m operations is defined as follows:
amortized sequence complexity = (worst-case sequence complexity) / m = T(m) / m
• Amortized analyses make more sense than a plain worst-case time analysis in many situations.
36. Three Most Common Techniques Used in Amortized Analysis
• Aggregate Method
• Accounting Method
• Potential Method
37. Aggregate Method
• In the aggregate method of amortized analysis, we show that for all n, a sequence of n operations takes worst-case time T(n) in total. In the worst case, the average cost, or amortized cost, per operation is therefore T(n) / n.
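A classic aggregate-method example (not shown in the slides) is the binary counter: a sequence of n INCREMENT operations causes fewer than 2n bit flips in total, so the amortized cost per increment is O(1) even though a single increment can flip many bits. A Python sketch:

```python
def increment(bits):
    # bits is a little-endian binary counter (bits[0] is the low bit).
    # Flip trailing 1s to 0, then set the first 0 to 1; return flips done.
    flips = 0
    i = 0
    while i < len(bits) and bits[i] == 1:
        bits[i] = 0
        flips += 1
        i += 1
    if i < len(bits):
        bits[i] = 1
        flips += 1
    return flips

bits = [0] * 16
total = sum(increment(bits) for _ in range(1000))
print(total)  # 1994 flips in total, fewer than 2 * 1000: O(1) amortized
```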
38. The accounting method
• In the accounting method of amortized
analysis, we assign differing charges to
different operations, with some operations
charged more or less than they actually cost.
The amount we charge an operation is called
its amortized cost.
• When an operation's amortized cost exceeds
its actual cost, the difference is assigned to
specific objects in the data structure as credit.
• Credit can be used later on to help pay for
operations whose amortized cost is less than
their actual cost
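As an illustrative sketch of the accounting method (using the standard stack-with-MULTIPOP example, which the slides do not spell out): charge each PUSH an amortized cost of 2, one unit paying for the push itself and one stored as credit to pay for that item's eventual pop, so POP and MULTIPOP can be charged 0:

```python
class AccountedStack:
    # Accounting method demo: PUSH is charged 2 (1 actual + 1 credit);
    # MULTIPOP is charged 0, each actual pop paid from stored credit.
    def __init__(self):
        self.items = []
        self.credit = 0

    def push(self, x):
        self.items.append(x)
        self.credit += 2 - 1      # amortized charge 2, actual cost 1

    def multipop(self, k):
        popped = 0
        while self.items and popped < k:
            self.items.pop()      # actual cost 1 per pop...
            self.credit -= 1      # ...paid from the item's stored credit
            popped += 1
        assert self.credit >= 0   # invariant: credit never goes negative
        return popped

st = AccountedStack()
for x in range(5):
    st.push(x)
print(st.multipop(3), st.credit)  # 3 popped, 2 credits remain
```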
39. The potential method
• Instead of representing prepaid work as credit
stored with specific objects in the data
structure, the potential method of amortized
analysis represents the prepaid work as
"potential energy,"or just "potential," that can
be released to pay for future operations.
• The potential is associated with the data
structure as a whole rather than with specific
objects within the data structure.
40. Potential function
• A potential function Φ maps each data structure Di to a real number Φ(Di), which is the potential associated with data structure Di.
• The amortized cost ĉi of the ith operation with respect to potential function Φ is
ĉi = ci + Φ(Di) - Φ(Di-1),
where ci is the actual cost of the ith operation and Di is the data structure that results after applying the ith operation to Di-1.
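For the binary counter (a standard example, not taken from the slides), choosing Φ(D) = number of 1 bits in the counter gives amortized cost at most 2 per INCREMENT: flipping t trailing 1s costs t + 1 actual flips but decreases the potential by t - 1. A Python sketch verifying this:

```python
def increment_with_potential(bits):
    # Potential method demo on a little-endian binary counter.
    # Phi(counter) = number of 1 bits;
    # amortized cost = actual flips + Phi(after) - Phi(before).
    phi_before = sum(bits)
    flips = 0
    i = 0
    while i < len(bits) and bits[i] == 1:
        bits[i] = 0
        flips += 1
        i += 1
    if i < len(bits):
        bits[i] = 1
        flips += 1
    return flips, flips + sum(bits) - phi_before

bits = [0] * 8
amortized_costs = [increment_with_potential(bits)[1] for _ in range(100)]
print(max(amortized_costs))  # 2: every increment has amortized cost <= 2
```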