Don't Repeat Yourself - An Introduction to Agile SSIS Development (24 Hours of PASS)
Cathrine Wilhelmsen
(Presented at 24 Hours of PASS: Growing Our Community Edition on June 25th, 2015)
2. Session description
SSIS is a powerful tool for extracting, transforming and loading data, but
creating the actual SSIS packages can be both tedious and time-consuming.
Even if you use templates and follow best practices you often have to repeat
the same steps over and over again. There are no easy ways to handle
metadata and schema changes, and if there are new requirements you might
have to go through all the packages one more time.
It's time to bring the Don't Repeat Yourself principle to SSIS development. In
this session I will use the free BIDS Helper add-in to show you the basics of
Biml and BimlScript, how to generate SSIS packages automatically from
databases, how easily those packages can be changed, and how to move
common code to separate files that can be included where needed. See why
they say Biml allows you to complete in a day what once took more than a
week!
7. How can Biml help you?
Timesaving: Many SSIS
Packages from one Biml file
Reusable: Write once and run
on any platform (2005 – 2014)
Flexible: Start simple, expand
as you learn
(Of course I can create 200 packages!
What do you need me to do after lunch?)
8. What is Business Intelligence Markup Language?
Easy to read and write XML dialect
Specifies business intelligence objects
Databases, schemas, tables, columns
SSIS packages
SSAS cubes, facts, dimensions (Mist only)
9. Highlights in Biml History
Scott Currie works on Microsoft's Project Vulcan
2008: Varigence creates Biml and Mist
2011: Biml compiler added to BIDS Helper
(2015: Everyone wonders what we did before Biml?)
(Live Long And Prosper)
18. Getting started with Biml
1. Download and install BIDS Helper (http://bidshelper.codeplex.com)
2. Right-click on SSIS project and click Add New Biml File
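As a starting point for the new file, a minimal sketch of a Biml file that generates a single empty package. The package name is a placeholder chosen for illustration, not taken from the session.

```xml
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Packages>
    <!-- "EmptyPackage" is a hypothetical name; ConstraintMode sets how tasks are chained -->
    <Package Name="EmptyPackage" ConstraintMode="Linear" />
  </Packages>
</Biml>
```

Right-clicking the file and choosing Generate SSIS Packages should then add EmptyPackage.dtsx to the project.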
29. The magic is in the
Extend Biml with C# or VB.NET code blocks
Import database structure and metadata
Loop over tables and columns
Add expressions to replace static values
(And anything else you can do in C# or VB)
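A small sketch of what a C# code block can do inside Biml: looping over a list to emit one package per entry. The table names are made up for illustration; in practice they would come from imported metadata.

```xml
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Packages>
    <# // Hypothetical hard-coded list, used here instead of imported metadata
       foreach (var tableName in new[] { "Customer", "Product", "SalesOrder" }) { #>
    <!-- The expression nugget replaces a static value with the loop variable -->
    <Package Name="Load_<#=tableName#>" ConstraintMode="Linear" />
    <# } #>
  </Packages>
</Biml>
```

This sketch would expand to three packages: Load_Customer, Load_Product and Load_SalesOrder.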
37. foreach (table in a database) loop
<#@ import namespace="Varigence.Hadron.CoreLowerer.SchemaManagement" #>
<#
  // Connect to the source database and import its metadata, skipping views
  var conAW2014 = SchemaManager.CreateConnectionNode("AW2014", "Data Source...");
  var AW2014DB = conAW2014.ImportDB("","", ImportOptions.ExcludeViews);
#>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Packages>
    <!-- Generate one package per imported table -->
    <# foreach (var table in AW2014DB.TableNodes) { #>
    <Package Name="Load_<#=table.Schema#>_<#=table.Name#>">
    </Package>
    <# } #>
  </Packages>
</Biml>
38. Don't Repeat Yourself
Move common code to separate files
Centralize and reuse in many projects
Update code once for all projects
1. Split and combine Biml files
2. Include files
3. CallBimlScript with parameters
39. Split and combine Biml files
Multiple Biml files can be compiled together
Control compile order by specifying tiers in files
<#@ template tier="2" #>
Files are compiled into RootNode from lowest to highest tier
Higher tiers can use objects in RootNode from lower tiers
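A rough sketch of two tiered files compiled together. The file names, connection name and connection string are assumptions made for illustration. The first file defines a connection at tier 0:

```xml
<#@ template tier="0" #>
<!-- Hypothetical file: 1-Connections.biml -->
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <!-- Connection string is an assumed example; adjust to your environment -->
    <OleDbConnection Name="Source" ConnectionString="Provider=SQLNCLI11;Data Source=localhost;Initial Catalog=AdventureWorks2014;Integrated Security=SSPI;" />
  </Connections>
</Biml>
```

The second file is compiled later (tier 1), so it can read the connection from RootNode:

```xml
<#@ template tier="1" #>
<!-- Hypothetical file: 2-Packages.biml -->
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Packages>
    <# foreach (var connection in RootNode.Connections) { #>
    <Package Name="Check_<#=connection.Name#>" ConstraintMode="Linear" />
    <# } #>
  </Packages>
</Biml>
```

If the tiers were swapped, RootNode.Connections would still be empty when the packages file is compiled and no packages would be generated.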
40. Behind the scenes: compile and load objects into RootNode
RootNode
  <#@ template tier="0" #>
    <Connections>
    <Databases>
    <Schemas>
  <#@ template tier="1" #>
    <Tables>
    <Columns>
  <#@ template tier="2" #>
    <Packages>
50. Split and combine multiple Biml files
Select all the tiered files
Right-click and click Generate SSIS
Packages
Behind the scenes: Objects will be
compiled and loaded into RootNode
from lowest to highest tier
51. Split and combine multiple Biml files
All packages will be generated at the same time
Load packages from 302LoadAllTables.biml
Master package from 303MasterPackage.biml
52. Include files
Include common code in multiple files and projects
Use the include directive
<#@ include file="CommonCode.biml" #>
Include directive will be replaced by content of file
Can include several file types: .biml .txt .sql .cs
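A minimal sketch of the include directive. The contents of CommonCode.biml are an assumption here: a single shared variable declared once.

```xml
<# // Assumed contents of CommonCode.biml: a naming prefix shared by several files
   var packagePrefix = "Load_"; #>
```

A file that includes it can use the variable as if it were declared locally, because the directive is replaced by the file's content:

```xml
<#@ include file="CommonCode.biml" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Packages>
    <!-- packagePrefix comes from the included file; "Customer" is a placeholder -->
    <Package Name="<#=packagePrefix#>Customer" ConstraintMode="Linear" />
  </Packages>
</Biml>
```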
54. CallBimlScript with parameters
Works like a parameterized include
File to be called (callee) specifies input parameters
<#@ property name="Parameter" type="String" #>
Callee can use parameter values as regular variables and to control logic
File that calls (caller) provides input parameters
<#=CallBimlScript("CommonCode.biml", Parameter)#>
CallBimlScript code block is replaced by Biml returned by callee
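A sketch of the caller/callee pair described above. The file name LoadPackage.biml, the parameter name TableName and the value "Customer" are hypothetical. The callee declares its input and returns a Biml fragment:

```xml
<#@ property name="TableName" type="String" #>
<!-- Hypothetical callee file: LoadPackage.biml -->
<Package Name="Load_<#=TableName#>" ConstraintMode="Linear" />
```

The caller passes the value positionally, and the returned fragment takes the place of the code block:

```xml
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Packages>
    <#=CallBimlScript("LoadPackage.biml", "Customer")#>
  </Packages>
</Biml>
```

Unlike a plain include, the same callee can be called several times with different values, for example once per table inside a loop.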
56. View compiled Biml
Credits: Marco Schreuder (@in2bi)
http://blog.in2bi.eu/biml/viewing-or-saving-the-compiled-biml-file-s/
Helper file with high tier (tier="100")
Saves output of RootNode.GetBiml() to file
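A minimal sketch of such a helper file, along the lines of the approach credited above. The output path is an assumption and the folder must already exist.

```xml
<#@ template tier="100" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<#
  // The high tier means this runs after all other files are loaded into RootNode
  // C:\Temp\CompiledBiml.xml is a hypothetical output path
  System.IO.File.WriteAllText(@"C:\Temp\CompiledBiml.xml", RootNode.GetBiml());
#>
</Biml>
```

Select this file together with the other Biml files when generating packages, then open the saved file to inspect the expanded Biml.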
57. What do you do next?
1. Download BIDS Helper
2. Identify your SSIS patterns
3. Rewrite one SSIS package to Biml to learn the basics
4. Expand with BimlScript
5. Get involved in the Biml community