CloudFixer and MCG Training have concocted a 7-Step Master Cleanse for Salesforce data, which they shared via webinar on Tuesday, March 19th at 1 PM EST. Luckily, there are no lemons, maple syrup, or cayenne pepper involved!
You’re the perfect data cleansing candidate if you:
- Are worried that Salesforce, while very powerful, can also be costly and time-consuming. We want to show you how it can be done easily and inexpensively.
- Need the right arguments for investing in data quality.
Architect’s Open-Source Guide for a Data Mesh Architecture - Databricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted for architects, decision-makers, data-engineers, and system designers.
Data product thinking - Will the Data Mesh save us from analytics history? - Rogier Werschkull
Data Mesh: What is it? Who is it for, and who is it definitely not for?
What are its foundational principles, and how could we apply some of them to our current data analytics architectures?
Data stewards are the implementation arm of Data Governance. They are also the first line of defense against bad data practices. Whether it’s data profiling or in-depth root cause analysis, data stewards ensure the organization’s shared data is reliably interconnected. Whether starting or restarting your Data Stewardship program, success comes from:
- Understanding the cadence/role of foundational data practices supporting organizational operations
- Proving value with tangible ROI
- Improving effectiveness/efficiencies using organization-wide insight
- Comprehending how stewards need to be multifunctional and dexterous, especially at first
- Integrating the role of data debt fighting
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ... - Databricks
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
Nguyen Vu Hung - Software Project Management with Jira Agile - Vu Hung Nguyen
Biography:
Nguyen Vu Hung is the CLO of Septeni Technology, a development center of the Tokyo-based Septeni Group that focuses on developing and operating, mostly, web-based online advertisement systems. He has many years of experience in IT, software development, and project/product management in both Japan and Vietnam. Considering himself a FOSS and Agile evangelist, and being an Agile lover and a CLO, he is also interested in not-so-related domains such as human resource management and (organization) (re)structuring. Hung is interested in: – Agile/Scrum and the like – Open Source – Project Management
Software project management with Jira Agile:
In this workshop, I will share hands-on experience with using Jira Agile to manage projects the Agile/Scrum way. The workshop will guide you through:
– How to create and manage your product backlog and sprint backlogs using Confluence
– How to manage a sprint backlog using Confluence and link it with JIRA
– How to manage daily tasks and stories in JIRA
– Using the Scrum board and Epics
– Making Sprint reports and Velocity charts
– Using Planning and Estimating
Goal of this session:
Master Scrum Artifacts using JIRA
References:
http://agiletourvietnam.org/speakers/
http://agiletourvietnam.org/speaker/nguyen-vu-hung/
http://agiletourvietnam.org/session/software-project-management-with-jira-agile/
Creating a Data Validation and Testing Strategy - RTTS
Creating A Data Validation & Testing Strategy
Are you struggling with formulating a strategy for how to validate the massive amount of data continuously entering your data warehouse or data lake?
We can help you!
Learn how RTTS’ Data Validation Assessment provides:
- an evaluation of your current data validation process
- recommendations on how to improve your process and
- a proposal for successful implementation
This slide deck addresses the following issues:
- How do I find out if I have bad data?
- How do I ensure I am testing the proper data permutations?
- How much of my data needs to be validated and automated?
- Which critical data endpoints need to be tested?
- How do I test data in my cloud environments?
And much more!
For more information, visit:
https://www.rttsweb.com/services/solutions/data-validation-assessment
Delta Lake OSS: Create reliable and performant Data Lake, by Quentin Ambard - Paris Data Engineers!
Delta Lake is an open-source framework that lives on top of Parquet in your data lake to provide reliability and performance. It was open-sourced by Databricks this year and is gaining traction to become the de facto data lake table format.
We’ll see all the good Delta Lake can do for your data: ACID transactions, DDL operations, schema enforcement, batch and stream support, and more!
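Schema enforcement, one of the features mentioned above, means appends whose columns don't match the table's declared schema are rejected instead of silently corrupting the lake. A minimal plain-Python sketch of that idea (this is a conceptual illustration only, not Delta Lake's actual API; the schema and records are invented):

```python
# Conceptual sketch of schema enforcement: an append is accepted only if
# every record matches the table's declared column names and types.
table_schema = {"id": int, "name": str}

def enforce_schema(records, schema):
    """Return records unchanged if they all match the schema; raise otherwise."""
    for rec in records:
        if set(rec) != set(schema):
            raise ValueError(f"schema mismatch: {sorted(rec)} vs {sorted(schema)}")
        for col, typ in schema.items():
            if not isinstance(rec[col], typ):
                raise ValueError(f"column {col!r} expects {typ.__name__}")
    return records

table = []
table += enforce_schema([{"id": 1, "name": "a"}], table_schema)  # accepted
try:
    enforce_schema([{"id": 2}], table_schema)  # missing column: rejected
except ValueError as e:
    print("rejected:", e)
```

In Delta Lake itself this check happens transactionally inside the write path, so a rejected batch leaves the table untouched.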
Five Things to Consider About Data Mesh and Data Governance - DATAVERSITY
Data mesh was among the most discussed and controversial enterprise data management topics of 2021. One reason people struggle with data mesh concepts is that there are still many open questions we are not thinking about:
Are you thinking beyond analytics? Are you thinking about all possible stakeholders? Are you thinking about how to be agile? Are you thinking about standardization and policies? Are you thinking about organizational structures and roles?
Join data.world VP of Product Tim Gasper and Principal Scientist Juan Sequeda for an honest, no-BS discussion about data mesh and its role in data governance.
Data Lakehouse, Data Mesh, and Data Fabric (r1) - James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
DAMA Australia: How to Choose a Data Management Tool - Precisely
The explosion of data types, sources, and use cases makes it difficult to make the right decisions around the best data management tools for your organisation. Why do you need them? Who is going to use them? What is their value?
Watch this webinar on-demand to learn how to demystify the decision-making process for the selection of Data Management Tools that support:
· Data governance
· Data quality
· Data modelling
· Master data management
· Database development
· And more
A business-friendly approach to data governance is imperative to engage all users and accommodate diverse business use cases spanning analytics, operational improvements, and compliance requirements. To increase adoption and collaboration, business and technical data users across your organisation need to have a common, agreed-upon, and documented understanding of which data is most important, what it’s called, and where it’s used.
Watch this on-demand webinar, where we explore the concept of business-first Data Governance, an approach that promotes adoption by the organisation, lays the foundation for data integrity and consistently delivers business value in the long term.
We also look at how Orifarm, one of the dynamic healthcare players in the Nordics and international markets, chose a data governance solution:
• to improve personalisation of products and services
• to achieve accurate and timely credit-risk analysis
• to increase user productivity by improving time-to-insights
• to mitigate risk and facilitate regulatory compliance and reporting
Speakers:
Mikkel Holmgaard - Data Governance Lead, Orifarm
Emily Washington - Sr. Vice President, Product Management, Precisely
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices-based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and "Adaptive Information,” a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga... - DataScienceConferenc1
Dragan Berić will take a deep dive into Lakehouse architecture, a game-changing concept bridging the best elements of data lake and data warehouse. The presentation will focus on the Delta Lake format as the foundation of the Lakehouse philosophy, and Databricks as the primary platform for its implementation.
A Work of Zhamak Dehghani
Principal Consultant
ThoughtWorks
https://martinfowler.com/articles/data-monolith-to-mesh.html
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
Many enterprises are investing in their next generation data lake, with the hope of democratizing data at scale to provide business insights and ultimately make automated intelligent decisions. Data platforms based on the data lake architecture have common failure modes that lead to unfulfilled promises at scale. To address these failure modes we need to shift from the centralized paradigm of a lake, or its predecessor data warehouse. We need to shift to a paradigm that draws from modern distributed architecture: considering domains as the first class concern, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
The world of data architecture began with applications. Next came data warehouses. Then textual data, too, was organized into the data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
We have explained how best to use JIRA (see the JIRA guide) and what should be taken care of in the “Planning and Initiation” and “Execution” phases of a project with the Scrum framework.
Which Change Data Capture Strategy is Right for You? - Precisely
Change Data Capture or CDC is the practice of moving the changes made in an important transactional system to other systems, so that data is kept current and consistent across the enterprise. CDC keeps reporting and analytic systems working on the latest, most accurate data.
Many different CDC strategies exist. Each strategy has advantages and disadvantages. Some put an undue burden on the source database. They can cause queries or applications to become slow or even fail. Some bog down network bandwidth, or have big delays between change and replication.
Each business process has different requirements, as well. For some business needs, a replication delay of more than a second is too long. For others, a delay of less than 24 hours is excellent.
Which CDC strategy will match your business needs? How do you choose?
View this webcast on-demand to learn:
• Advantages and disadvantages of different CDC methods
• The replication latency your project requires
• How to keep data current in Big Data technologies like Hadoop
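The trade-offs above are easiest to see in the simplest (and most source-intensive) CDC strategy: snapshot diffing, where you periodically compare the full table against the previous copy. A plain-Python sketch of the idea (the table contents and the `id` key are invented for illustration):

```python
def diff_snapshots(old, new, key="id"):
    """Compute inserts, updates, and deletes between two table snapshots.

    Snapshot-diff CDC is simple and source-agnostic, but it must read the
    whole table each cycle -- exactly the burden on the source that
    log-based CDC strategies avoid.
    """
    old_by_key = {row[key]: row for row in old}
    new_by_key = {row[key]: row for row in new}
    inserts = [r for k, r in new_by_key.items() if k not in old_by_key]
    deletes = [r for k, r in old_by_key.items() if k not in new_by_key]
    updates = [r for k, r in new_by_key.items()
               if k in old_by_key and old_by_key[k] != r]
    return inserts, updates, deletes

yesterday = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}]
today     = [{"id": 1, "qty": 6}, {"id": 3, "qty": 1}]
ins, upd, dele = diff_snapshots(yesterday, today)
print(ins, upd, dele)  # row 3 inserted, row 1 updated, row 2 deleted
```

The replication delay of this approach is the polling interval, which is why it suits the "less than 24 hours" end of the latency spectrum, not the sub-second end.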
Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018 - Amazon Web Services
Realizing the value of social media analytics can bolster your business goals. This type of analysis has grown in recent years due to the large amount of available information and the speed at which it can be collected and analyzed. In this workshop, we build a serverless data processing and machine learning (ML) pipeline that provides a multi-lingual social media dashboard of tweets within Amazon QuickSight. We leverage API-driven ML services, AWS Glue, Amazon Athena and Amazon QuickSight. These building blocks are put together with very little code by leveraging serverless offerings within AWS.
Big Data, IoT, data lake, unstructured data, Hadoop, cloud, and massively parallel processing (MPP) are all just fancy words unless you can find use cases for all this technology. Join me as I talk about the many use cases I have seen, from streaming data to advanced analytics, broken down by industry. I’ll show you how all this technology fits together by discussing various architectures and the most common approaches to solving data problems and hopefully set off light bulbs in your head on how big data can help your organization make better business decisions.
Description of four techniques for Data Cleaning:
1. DWCLEANER Framework
2. Data Mining techniques, including Association Rules and Functional Dependencies
...
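Of the mining techniques listed, functional dependencies are the easiest to operationalize: if the data should satisfy "ZIP code determines city," any two rows that agree on ZIP but disagree on city are candidate errors. A small illustrative checker (the `zip`/`city` columns and sample rows are invented for the example):

```python
from collections import defaultdict

def fd_violations(rows, lhs, rhs):
    """Find values violating the functional dependency lhs -> rhs.

    Groups rows by the lhs value; any group mapping to more than one
    rhs value is an inconsistency worth flagging for cleaning.
    """
    seen = defaultdict(set)
    for row in rows:
        seen[row[lhs]].add(row[rhs])
    return {k: v for k, v in seen.items() if len(v) > 1}

rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},       # violates zip -> city
    {"zip": "60601", "city": "Chicago"},
]
print(fd_violations(rows, "zip", "city"))  # flags zip 10001
```

The same grouping idea scales up: discovered (rather than declared) dependencies are what the data-mining approaches in the list infer automatically from the data.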
Best practice strategies to clean up and maintain your database with Hether G... - Blackbaud Pacific
In this webinar Hether Ghelf, Blackbaud Pacific’s Senior Consultant & Project Manager, discusses a best practice approach to database cleaning and continued maintenance.
Cleansing your data can have an immediate impact on your business by increasing retention and response rates, decreasing the volume of mail returned from post, and ensuring mail is reaching your organisation’s constituents.
View the recording here: https://www.blackbaud.com.au/notforprofit-events/webinars/past
Admin Tips, Tricks & Strategies for Data Quality in Salesforce - Francis Pind... - Salesforce Admins
Arm yourself with both a strategy and Salesforce Platform tips and tricks to show your organization the importance of avoiding bad data quality. Learn platform features and handy tricks that will equip your org to enforce data quality. This presentation provides a data quality strategy as well as implementation guidelines.
Role of Data Cleaning in Data Warehouse (Ramakant Soni)
Data cleaning, also called data cleansing or scrubbing, deals with detecting and removing errors and inconsistencies from data in order to improve the quality of data.
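As a minimal sketch of that detect-and-remove idea, the pass below trims whitespace, normalizes email case, and drops records that become duplicates after normalization. The `scrub` helper and field names are hypothetical, not from any particular tool:

```python
def scrub(records):
    """Detect-and-remove pass: trim whitespace, lowercase emails,
    then drop records whose normalized email repeats or is empty."""
    cleaned, seen = [], set()
    for r in records:
        # Normalize every string field, then the key field specifically.
        r = {k: v.strip() if isinstance(v, str) else v for k, v in r.items()}
        r["email"] = r["email"].lower()
        if r["email"] and r["email"] not in seen:
            seen.add(r["email"])
            cleaned.append(r)
    return cleaned

raw = [
    {"name": "Ada Lovelace ", "email": "ADA@example.org"},
    {"name": "Ada Lovelace", "email": "ada@example.org "},  # dupe after scrubbing
]
print(scrub(raw))
```

Real cleansing tools add fuzzy matching and survivorship rules on top, but the shape is the same: normalize first, then deduplicate.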
This presentation is a summary of section 2 (of 6) of the book "The 360º Leader" by best-selling author John C Maxwell. Challenges and solutions include:
* Tension (the pressure of being caught in the middle),
* Frustration (following an ineffective leader),
* Multi-Hat (one person – demands and expectations from all quarters),
* Ego (being hidden in the middle),
* Fulfillment (stuck in the middle, when would rather be in front),
* Vision (how to champion it when you did not create it),
* Influence (influencing others whom you do not manage).
Jean-René Roy: Integrate Legacy App with Dynamic CRM (MSDEVMTL)
November 24, 2014
SQL Group
Topic: Integrate Legacy App with Dynamics CRM
Speaker: Jean-René Roy
Dynamics CRM is more and more popular in enterprises. Some people say, “It will be the next SharePoint cash cow for MS.” But how do you integrate an external legacy application into CRM, and how do you transfer your legacy database into the CRM database? This session introduces CRM concepts and the framework, shows how you can use SSIS to write and read data in the CRM database, and how you can integrate a legacy application with a CRM solution.
RWDG Webinar: Achieving Data Quality Through Data Governance (DATAVERSITY)
Data quality requires sustained discipline around the management of data definition and production. Data Governance is a large part of that discipline. The relationship between how well data is governed and the quality of the data is obvious. You cannot have high quality data without active Data Governance.
This month’s Real-World Data Governance webinar with Bob Seiner addresses how to improve data quality through the application of Data Governance practices. Quality starts with a plan and requires formal execution and enforcement of authority over the data. Attend this webinar and take away a plan to achieve data quality through Data Governance.
In this webinar, Bob will discuss:
• How Data Governance leads to data quality
• Core principles of Data Governance and data quality success
• Quality metrics based on governance practices
• Relationship between quality and governance roles
• Steps to achieve quality through governance
This presentation covers material from John Maxwell's book, "The 360 Degree Leader." Specifically, the first of six sections is presented, including "The 7 Myths of Leading from the Middle of an Organization" and "5 Levels of Leadership Development."
Have you ever been involved in developing a strategy for loading, extracting, and managing large amounts of data in salesforce.com? Join us to learn multiple solutions you can put in place to help alleviate large data volume concerns. Our architects will walk you through scenarios, solutions, and patterns you can implement to address large data volume issues.
Intro to Talend Open Studio for Data Integration (Philip Yurchuk)
An overview of Talend Open Studio for Data Integration, along with some tips learned from building production jobs and a list of resources. Feel free to contact me for more information.
As part of your fundraising campaigns and online engagement, you likely collect many metrics and data points. But do you take the time to reflect on this data and use it to improve for next time? In this session, we’ll discuss metrics you can collect, share each other’s best practices for data collection processes, and demo dashboard tools that will help you see the big picture.
Jen Vaughan will walk you through readying yourself to apply for jobs using Tableau. From what to look for in a candidate, resume and how to gain a competitive edge.
NTEN Webinar - Data Cleaning and Visualization Tools for Nonprofits (Azavea)
Slides from a webinar we conducted for NTEN that covers tools that nonprofits can use to clean and prepare their datasets and then visualize them via charts, maps, and graphs.
Finding The Perfect Donor Database In An Imperfect World (4Good.org)
There are hundreds of donor databases on the market. Each has its own strengths and weaknesses, fans and foes. The challenge is to find a system with strengths that meet your needs, weaknesses that won’t get in your way, at a price you can afford.
This workshop will cover the basic concepts you will need to evaluate your options and make an informed decision.
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track (Precisely)
With recent studies indicating that 80% of AI and machine learning projects fail due to data quality issues, it’s critical to think about this holistically. This is not a simple topic – data quality issues can occur anywhere from the start of the project through model implementation and usage.
View this webinar on-demand, where we start with four foundational data steps to get our AI and ML projects grounded and underway, specifically:
• Framing the business problem
• Identifying the “right” data to collect and work with
• Establishing baselines of data quality through data profiling and business rules
• Assessing fitness for purpose for training and evaluating the subsequent models and algorithms
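The third step above, establishing baselines through data profiling, can be approximated with a few lines of stdlib Python. This is a sketch of the idea, not any vendor's product; the `profile` helper and sample fields are invented:

```python
def profile(rows):
    """Minimal profiling baseline: per-field fill rate and
    distinct-value count over a list of record dicts."""
    n = len(rows)
    report = {}
    for field in rows[0]:
        values = [r.get(field) for r in rows]
        filled = [v for v in values if v not in (None, "")]
        report[field] = {
            "fill_rate": len(filled) / n,   # how complete is this field?
            "distinct": len(set(filled)),   # how varied is it?
        }
    return report

sample = [
    {"email": "a@x.org", "phone": ""},
    {"email": "b@x.org", "phone": "555-1212"},
]
print(profile(sample))
```

Even these two numbers per field are enough to spot mostly-empty columns and suspicious low-cardinality fields before training begins.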
Agile Data Science is a lean methodology adopted from Agile Software Development. At its core, it centers on people, interactions, and building minimum viable products to ship fast and often to solicit customer feedback. In this presentation, I describe how this work was done in the past, with examples. Get started today with our help by visiting http://www.alpinenow.com
Doing Analytics Right - Building the Analytics Environment (Tasktop)
Implementing analytics for development processes is challenging. As discussed in the previous webinars, the right analytics are determined by the goals of the organization, not by the available data. So implementing your analytics solutions will require an efficient analytics and data architecture, including the ability to combine and stage data from heterogeneous sources. An architecture that excludes the ability to gain access to the necessary data will create a barrier to deploying your newly designed analytics program, and will force you back into the “light is brighter here” anti-pattern.
This webinar will describe the technical considerations of implementing the data architecture for your analytics program, and explain how Tasktop can help.
From Labelling Open Data Images to Building a Private Recommender System (Pierre Gutierrez)
Recommender systems are paramount for e-business companies. There is an increasing need to take into account all the user information to tailor the best product proposition. One of them is the content that the user actually sees: the visual of the product.
When it comes to hostels, some people can be more attracted by pictures of the room, the building or even the nearby beach.
In this talk, we will describe how we improved an e-business vacation retailer’s recommender system using the content of images. We’ll explain how to leverage open datasets and pre-trained deep learning models to derive user taste information. This transfer learning approach enables companies to use state-of-the-art machine learning methods without having deep learning expertise.
Data Profiling: The First Step to Big Data Quality (Precisely)
Big data offers the promise of a data-driven business model generating new revenue and competitive advantage fueled by new business insights, AI, and machine learning. Yet without high quality data that provides trust, confidence, and understanding, business leaders continue to rely on gut instinct to drive business decisions.
The critical foundation and first step to deliver high quality data in support of a data-driven view that truly leverages the value of big data is data profiling - a proven capability to analyze the actual data content and help you understand what's really there.
View this webinar on-demand to learn five core concepts to effectively apply data profiling to your big data, assess and communicate the quality issues, and take the first step to big data quality and a data-driven business.
How to Use Data to Build Better Products by fmr NY Times PM (Product School)
Main takeaways:
- Why it matters what you measure
- How data can tell you what users want, and what they don't want
- How to get familiar enough with your own data to be able to get what you want
- GA, SQL, etc.
- Why your goal should be to find the point in the data
- What "actionable data" can look like
Exploring Data Preparation and Visualization Tools for Urban Forestry (Azavea)
This webinar was held on December 12, 2012 and provided an overview of free and low-cost tools for cleaning and preparing data and building useful and beautiful data visualizations.
Webinar - Harness the Power of Data with Tableau - 2016-02-18 (TechSoup)
Learn how to harness the power of data to tell your organization’s story with Tableau! Join Tech Impact's Jordan McCarthy and learn how to use Tableau to collect data in more meaningful ways and understand the science behind data analysis. We show you easy tips to maneuver through this data analytics tool to gain a better understanding of your nonprofit or library’s data.
2. Your lovely presenters
Ehren Foss (@ehrenfoss): Salesforce.com data wrangler, developer, gamer, hotkey aficionado, outdoor enthusiast (lefty).
Marc Baizman (@mbaizman): Nonprofit technology coach, Salesforce and Google Apps, improv and sketch comedy performer, unrepentant nerd.
3. Wait…what kind of cleanse?
http://commons.wikimedia.org/wiki/File:Master_Cleanse_refrigerator.jpg
4. Stop me if you’ve heard this one…
“I’m not sure this report is showing us the right information.”
“After this campaign, we’ll update our donor data.”
“We need to import ALL of our historical data.”
“I wish people would enter the right information into the system.”
“We can get an intern to clean this up.”
8. Dear <<FirstName>>,
We would like to thank you for your generous gift of $NULL. This will help us give NaN rescued cats to starving children.
Sincerely,
Error: Division by Zero
9. Why?! Make the pain stop, please!
• No automated prevention systems
• No data hygiene policies
• Little or poor training
• No culture of “Clean Data”
• Dirty historical data import
= Dirty data!
10. Time for your cleanse!
1. Strategy
2. Accountability
3. Data Quality Reports/Dashboards
4. Automation: Validation and Workflow Rules
5. Help Your Users
6. Objects and Fields
7. Apps
12. Strategy
• How does data serve your mission?
• Does data jeopardize your mission?
– What should always/never happen?
• A good strategy means:
– A culture of good data
– Practices & processes survive staff turnover
– Tools, objects, and fields change – but data stays clean
13. Baby Steps
Today: Sit down with the intern, explain data rules, document rules for next time.
This week: Create a validation rule.
This month: Review reports with your E.D.
This year: Decrease duplicates by 90%. Mailing files should take no more than an hour to prepare.
This decade: Make sure data is never a barrier to growth. 50,000 more meals served, 10,000 duplicates removed.
15. Accountability: A Clean Data Culture
• Who is responsible for data?
– Board and Leadership: support the culture, drive data priorities, give rewards & accolades
– IT / Salesforce staff: integrations, data sources, training and “data ambassadors”
– Directors: responsible for their team’s data
– Staff (fundraisers, interns, accounting): responsible for data they own or touch
16. Accountability Tactics
• Appoint a “data czar” (coach)
• Public dashboards / reports
• Topic in regular staff meetings
• Leadership: “If it's not in Salesforce it doesn't exist.”
• Section in annual report / board reports
• “Data Day” : all staff works on data cleanup
• Identify champions & coaches
17. Rewards & Punishments
• Wall of fame / wall of shame
• Data Rockstar / Data Dunce
• Competition and/or collaboration
• Show me the $$
– Amazon / Starbucks cards
– PTO…
– Bonuses
31. Validation RULES!
• Before a record is saved:
1. Check for bad things
2. Inform the user what’s wrong
• Automatically! Works for integrations too
[Screenshot – error banner: “Error: Invalid Data. Review all error messages below to correct your data.”]
• You must solicit user feedback about your validation rules
32. [Screenshot: report of real records with implausible dates, e.g. 2069 and 2093]
33. Validation Rule ideas
• Dates
– End before Start
– Too far in the future / past
• Conditionally required fields
– Per record type
– Per status or picklist option
• At least…
– 2 letters in First Name, Last Name
– $5 for Donation value
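In Salesforce these checks are written as formula-based validation rules; as a language-neutral sketch of the same predicates (field names and thresholds here are illustrative, mirroring the bullets above):

```python
from datetime import date

def validate(record, today=date(2014, 3, 19)):
    """Return error messages mirroring the rule ideas above:
    date sanity, minimum name lengths, minimum donation."""
    errors = []
    if record["end_date"] < record["start_date"]:
        errors.append("End Date cannot be before Start Date")
    if record["start_date"].year > today.year + 10:
        errors.append("Start Date is too far in the future")
    if len(record["first_name"]) < 2 or len(record["last_name"]) < 2:
        errors.append("First and Last Name need at least 2 letters")
    if record["donation"] < 5:
        errors.append("Donation must be at least $5")
    return errors

bad = {"start_date": date(2014, 1, 1), "end_date": date(2013, 12, 1),
       "first_name": "J", "last_name": "Doe", "donation": 1}
print(validate(bad))
```

The point of returning every error at once, rather than failing on the first, matches how Salesforce shows all validation messages on one save attempt.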
35. Validation Rule tips…
• Combine with formula fields for more powerful cross-object validation
• Check old data after you create a new rule!
– Bad data will remain unless the record is edited
– Keep your report for this; re-use it periodically to double check
• Let your data guide you
– Don’t go rule crazy
• Don’t reinvent the wheel. Ideas:
– http://login.salesforce.com/help/doc/en/fields_useful_field_validation_formulas.htm
36. Validation Goldilocks
Too many validation rules? No data entered, user rebellion.
Not enough required fields / validation rules? Bad data.
Just right? Less new bad data. Happy users deem you: “Data Hero”.
44. Some Data Quality workflow ideas!
• Update Opportunity Name to “Account – Donation Type – Date”
• Update a custom date field whenever the Owner is changed.
• Create a task to review a Contact if it hasn’t been modified in over a year.
• Send an email if someone enters a donation missing some key info.
https://help.salesforce.com/HTViewHelpDoc?id=workflow_examples.htm&language=en_US
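The first and third ideas above can be sketched as plain functions (the names and date format are assumptions; in Salesforce itself you would express these as a workflow field-update formula and a time-based rule):

```python
from datetime import date

def opportunity_name(account, donation_type, close_date):
    """Field-update sketch: standardize Opportunity Name as
    'Account - Donation Type - Date'."""
    return f"{account} - {donation_type} - {close_date:%Y-%m-%d}"

def needs_review(last_modified, today):
    """Task-creation criterion: record untouched for over a year."""
    return (today - last_modified).days > 365

print(opportunity_name("Acme Foundation", "Major Gift", date(2014, 3, 19)))
```

Standardized names like this make duplicate Opportunities and mis-categorized gifts jump out in any list view or report.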
54. Create Screencasts!
• Record once, use multiple times
– While it’s still fresh in your mind!
• Jing
– http://www.techsmith.com/jing.html
• Camtasia
– http://www.techsmith.com/camtasia.html
• Screenr
– http://www.screenr.com/
• ScreenSteps
– http://www.bluemangolearning.com/screensteps/
56. Data Model Changes
Simplify, simplify, simplify:
• Delete records you don’t use
• Delete objects you don’t use
• Delete fields you don’t use
• Hide what you don’t use but can’t delete
Then, let Salesforce do the work for you:
• Convert field types for maximum data cleaning benefit
57. Objects
• Remove seldom-used tabs for users
– And related lists
• Check relationships
– To and fro
– 100%? 99%? 2% filled in?
– Filter by Record Type
• Records owned by inactive users?
• Records un-modified for 1+ years
• Records not related to anything
58. Fields
• Remove unused / under-used fields (< 5% filled in)
• Adding one? Take one away!
• Use the Date, Date/Time, Email, Phone, Percent, and URL field types
59. Field Type Changes
• Textareas? Only if absolutely necessary
• Few unique values in a text field?
– Convert to picklists / checkboxes
• Multi-select Picklists
– Great for creating reporting headaches.
– Try checkboxes instead?
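One reason multi-select picklists create reporting headaches is that each record stores its choices as a single semicolon-delimited string. Exploding that into per-option booleans, which is what checkbox fields give you natively, looks roughly like this sketch (the `split_multiselect` helper and the `interests` field are invented for illustration):

```python
def split_multiselect(rows, field):
    """Explode a semicolon-delimited multi-select value into one
    boolean column per option, like native checkbox fields."""
    options = sorted({opt for r in rows for opt in r[field].split(";") if opt})
    for r in rows:
        chosen = set(r[field].split(";"))
        for opt in options:
            r[f"{field}_{opt}"] = opt in chosen
    return rows

donors = [{"interests": "Gala;Newsletter"}, {"interests": "Newsletter"}]
print(split_multiselect(donors, "interests"))
```

With one boolean per option, you can group, filter, and summarize each choice independently, which is exactly what multi-select picklist reports struggle with.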
61. Apps! Diagnostics & Utilities
• CloudFixer
– Diagnostic report of common problems (and their solutions) for Salesforce, NPSP, Common Ground
– https://cloudfixer.co
• FieldTrip
– Standard and custom field usage
– https://appexchange.salesforce.com/listingDetail?listingId=a0N30000003HSXEEA4
62. Apps! Diagnostics & Utilities
• Easy Describe
– View and extract object metadata
– http://www.etherios.com/products/easydescribe
• Grid Buddy
– Data entry & editing across objects, en masse!
– https://appexchange.salesforce.com/listingDetail?listingId=a0N30000003IkInEAK
• Dupe Blocker
– http://www.crmfusion.com/dupeblocker
63. Apps! ETL / Heavy Lifting
• Demand Tools
– http://www.crmfusion.com/demandtools/
– Duplicate formulas and much much more…
• Apsona
– http://apsona.com/pages/sfdc/index.html
• Jitterbit
– http://www.jitterbit.com/salesforce/data-loader
• Apex Data Loader / LexiLoader
– Setup -> Admin Setup -> Data Management -> Data Loader
64. Business Intelligence
Fancy toys to play with when your data is all clean!
• Birst
– http://www.birst.com/
• Good Data
– http://www.gooddata.com/
• Crystal Reports
65. How did that feel?
1. Strategy
2. Accountability
3. Data Quality Reports/Dashboards
4. Automation: Validation and Workflow Rules
5. Helping Our Users
6. Data Model
7. Apps for cleaning
66. Contact us! We can help
• Ehren Foss / CloudFixer
– https://cloudfixer.co
– ehren@cloudfixer.co
– @ehrenfoss
• Marc Baizman / MCG Training
– http://mcgtraining.com
– marc@mcgtraining.com
– @mbaizman
67. We Love Feedback
• How was the webinar?
• Which area do you think is most important for you?
• What is clean data worth to you?
Editor's Notes
Hi folks! Welcome to The Seven Step Data Cleanse. We’re going to get started! Anyone who joins late will be forever wondering what they missed…
I’m Ehren Foss, of CloudFixer. I’ve worked with nonprofits and Salesforce.com for around six years, and I’ve seen some data so scary my hair fell out. I’m a developer and coder, I really enjoy using hotkeys in Gmail and other programs, and I like the outdoors when the Northeast isn’t blanketed in 14 inches of gray slush. Hi, I’m Marc Baizman…
So what kind of cleanse is this going to be, Marc? Do I need to drink a cup of lemon juice and snort cayenne powder? …Oh good, what a relief. Well, actually, cleaning data can be just as bad, can’t it?….
Stop us if you’ve heard these before… Uh oh, you built the database and you’re not sure? What happens if all the reports are wrong? Right, after this campaign. After this webinar I’m finally going to get in shape, organize my music, and learn Chinese. All of your historical data? Like 10-year-old volunteer signups? I wish I had a pony and eternal youth! Yeah, because this intern will be way better than the last intern who made this mess in the first place. Our goal with this webinar is first to remind you that data hygiene is really, really important. Like brushing your teeth, eating right, and doing your taxes properly, the repercussions of not keeping your data clean can be pretty nasty. Our next goal is to show you that it’s not as hard as you thought to keep your data clean. Then we’re going to show you some really handy, specific strategies and tactics you can use.
Uh oh, if I came across that note in an Excel file I’d start to worry! Don’t worry, I’m exaggerating. Just a little. Why do people need to clean spreadsheets? Because the data that comes out of Salesforce is messy, and before it gets sent somewhere else – to a bookkeeper, or a bulk mailing system – it has to be cleaned up.
Whoah, what if after you spend a whole day cleaning up the spreadsheet, the number STILL doesn’t match finance? Well my friend, then it’s time to dust off the resume and fire up your job search. Time to abandon ship! You’re exaggerating again, aren’t you… Any data can be cleaned. You might be left with a smaller database, but the world will be a better place.
Marc, I think the organization that sent this email should attend this webinar. Looks like their mail merge didn’t go so smoothly!
[SHOW POLL RESULTS] Ok, we’ve pounded into you the importance of keeping your data clean, and shown how much damage bad data can do. For the rest of the webinar we’re going to cover 7 issues, talk about them, and give specific, take-home tactics. We will share the slides, and you can ask questions anytime. We’ll send the slides, the recording, and any Q&A we didn’t get to out to all participants.
Topic number one! Strategy. There’s a reason we’re starting here, because everything else flows from it.
It’s silly to have a slide about “strategy tactics”, but because your strategy has a goal in mind, you’ll need to break down that goal into tasks and prioritize them. These are just a couple of examples of what you can weave into your staff meetings, priority lists, and the like. You can also give your organization an assignment to publish a data strategy in one month, and to follow up on it every three months. A high-level goal about data should exist on the same level as goals involving fundraising, your program, or your mission. Who cares if you served 10% more meals but had to work twice as hard to do it because of bad data?
Hand in hand with strategy is who will be implementing it. Who is accountable for executing the strategy?
We hope you’ll appreciate that we won’t be doing the next section, automation, in a robot voice. Why not use some amazing Salesforce features that automatically help keep data clean?
Here’s a screenshot of what it looks like to edit a Validation Rule. On the left, the green arrow is pointing to the Validation Rules area for the Contacts object. For custom objects, they’ll be on the object configuration page under the Create menu. The red arrow is the error message. Be sure to make this informative and helpful! Imagine explaining it to someone on their first day using Salesforce. The description in yellow can help you and other administrators remember what the intent of the rule was, and help document the formula.
Standard objects: Setup (top right, under your name) -> App Setup -> Customize -> (Object) -> Validation Rules
Custom objects: Setup -> App Setup -> Create -> Objects -> (Object) -> Validation Rules
Next to field vs. top of page.
If you aren’t sure where to start, do a quick web search for validation rule examples and ideas. There are tons out there in the community, and some of them are exceedingly clever! The most common types make sure dates make sense. Should I set a reminder in the past? Can an event end before it starts? Was someone born 5 years from now? These are philosophical questions, but the answer is probably no. If you use Record Types, or have two different kinds of behavior or processes in a single object (like Organization or Individual Accounts), you can use validation rules to enforce data in each case. Same thing for picklists. If the Status is Open, Closed Date should be blank. Do you have “placeholder” Leads and other records that are basically empty? Make sure people don’t just type in garbage like a single question mark.
This is a real world example of an organization that tracks internships. You can see way out in the year 2069 and 2093, there are a couple records with really, really strange dates. They could use a validation rule to prevent those records from being saved.
Let’s say you’re validating a Contact, and have a rule that should only flag records if the Contact’s Account has a certain Type. Validation Rules already allow you to do that for certain types of relationships, which is really cool. You can use Roll-Up Summary fields, or other clever formula fields, to make it easier to validate data. How about: you can’t Close an Opportunity that has pending transactions? Or you can’t cancel an event with registrants? This next one is extremely easy to forget. When you create a validation rule, Salesforce doesn’t magically go and fix your old bad data. Create a report that finds records that violate the rule, and fix those records. Keep that report around so you can use it later to double check your validation rule. Sometimes for imports, people turn validation rules off and then forget to turn them back on. A backup report is a great way to find out when that happens. Let your data be your guide. Don’t make 50 validation rules just because you think something might be a problem. Look for problems and let them guide where you apply validation rules.
We don’t have the time to show you each step in detail, but this is a quick overview. A rule has an Action – updating a field, sending an email, stuff like that. A rule has criteria – when should this rule happen? What records should be affected? When should the rule fire? You put those two things together and create the rule. Something that is extremely easy to forget? Activate the rule. Check a box, it’s that easy. But you’re not done! You need to inform your users about the rule, what it does, why it exists, and who to talk to with problems. You should document the thinking behind the rule, and the process it helps drive, so you can remember what it does in a year. You should test the rule. Make sure it operates on the proper records, and does the right thing!
Record must be saved first before workflow can execute!
As you drill down, start with objects. New Salesforce users are often stumped by the tabs at the top, because they don’t yet know what everything does and are afraid to break things. Soothe them by removing tabs or apps they don’t need. When the time comes, you can add them back. Next, take a gander at your Lookup fields. You can use apps like Field Trip or CloudFixer (or just run reports) to see how “filled in” those relationships are. Should objects always be related? Should a certain Opportunity record type always have a Contact associated? Other relatively simple things to check for are records owned by inactive users – they will be harder to find – or records that haven’t been modified in a while, especially Accounts and Contacts. Same with records not related to anything – does the data in that record stand alone? Or should you give it context?
A typical Salesforce instance has between 100 and 200 objects and around 20 fields on each object. That’s a lot! But don’t worry, you can focus your review on the objects people spend most of their time with. Again, use reports or an app to identify unused or under-utilized fields. Remove them from layouts, or delete them entirely. Another handy rule is to always remove a field from the system when you need to add one. When creating new custom fields, be sure to use special field types for special data. Salesforce will help you validate dates, emails, phone numbers, percents, numbers, and URLs.
Easy Describe helps dig into your data model and other configuration settings – this is for when you need to look at what system administrators in the past may have set up. Grid Buddy allows you to do bulk updates to records on different objects in a spreadsheet-type interface, all within Salesforce. Dupe Blocker does exactly that – it blocks common types of duplicates. It’s by the same folks who make Demand Tools.
If you’re so inclined, these are apps we’d recommend when you really have to get your hands dirty and move some data around. ETL stands for Extract, Transform, Load, so you can use that to help your web searching. Demand Tools is the cat’s pajamas. They’ve recently ended free support for nonprofits, but this app is still deeply discounted and extremely valuable. It’s a challenge to learn, but valuable expertise. Apsona falls into the same category – it comes up quite a bit in the NPSP and NPSF Google groups and can provide more powerful reporting. Jitterbit’s data loader is just as powerful as the regular Data Loader, but it’s much easier to use and more configurable.
Once you’re done creating a strategy, getting your leadership on board, doing daily, weekly, monthly, quarterly, and annual things to keep your data clean, and your data is pretty clean, now you can use business intelligence tools with confidence. The amazing reports that can come out of these tools will help you make the best decisions you can for your organization and your mission, and that’s what it’s all about.
7 issues. Talk about them. Specific, take-home tactics. Will share slides.