As security and compliance become more important for organizations, especially in the age of GDPR, data breach, and other legislation, Karen covers the types of features data architects and designers should consider when building modern, protected, and defensive systems.
AI in Healthcare: How to Implement Medical Imaging Using Machine Learning? by Skyl.ai
About the webinar
According to the report “The Digital Universe Driving Data Growth in Healthcare,” published by EMC with research and analysis from IDC, hospitals are producing 50 petabytes of data per year. Almost 90% of this data consists of medical imaging, i.e., digital images from scans like MRIs or CTs. More than 97% of this data goes unanalyzed or unused.
The top healthcare institutions across the globe are adopting AI in medical imaging to increase speed and imaging accuracy, monitor data in real time, and eliminate the need for humans to do time-consuming and complex tasks. This enables doctors to optimize treatment approaches, speed up care, and manage interconnected health conditions.
Through this webinar, you will understand how AI can be used to automate routine processes and procedures, help radiologists identify patterns, and help treat patients with critical conditions quickly.
What you will learn:
- How healthcare institutions are leveraging AI to augment decision making, prevent medical errors, and reduce costs in medical imaging
- An approach to automating the machine learning workflow, creating and deploying models in hours, not weeks or months
- Demo: How to detect pneumonia from chest X-rays within a few minutes using Skyl.ai
Twitter Sentiment Analysis in 10 Minutes using Machine Learning by Skyl.ai
About the webinar:
Social media is one of the richest sources of data for brands. According to Domo's 'Data never sleeps' report, every single minute 456,000 tweets are posted on Twitter, 46,740 photos are uploaded to Instagram, and 510,000 comments are posted and 293,000 statuses are updated on Facebook.
This data contains valuable information, such as product feedback and reviews, that can be used to better understand users or find valuable insights. However, traditional methods struggle to analyze unstructured data, and this is where sentiment analysis using machine learning comes to the rescue. Machine learning can help to understand the text and extract its sentiment using Natural Language Processing. Sentiment analysis can be applied in a range of business applications, such as social media channel analysis, 360-degree customer insights, user reviews, competitive analysis, and many more.
What you will learn
- How businesses are leveraging sentiment analysis to their advantage
- Best practices for automating machine learning models in hours, not months
- Demo: How to build a Twitter sentiment analysis model
BDW17 London - Abed Ajraou - First Utility - Putting Data Science in your Bus... by Big Data Week
Data Science is now well established in our businesses, and everyone considers data as a key asset and critical for our competitiveness.
However, Data Science is not easy to manage: very often projects fail and the investment made is not seen as profitable.
The aim of this talk is to share the knowledge in different areas:
* avoid classical mistakes in Data Science
* use the right Big Data technology
* apply the right methodology
* make the Data Science team more efficient
Transform Banking with Big Data and Automated Machine Learning 9.12.17 by Cloudera, Inc.
Banks are rich in valuable data and can build and maintain a competitive advantage by identifying and executing on high-value machine learning projects that leverage this data. This webinar will describe use cases fit for big data and machine learning in the banking sector (commercial, consumer, regulatory, and markets) and the impact they can have for your organization.
3 things to learn:
* How to create a next generation data platform and why it is important
* How to monetize big data using predictive modeling and machine learning
* What is needed for automated machine learning as a sustainable, cost-effective, and efficient solution
Machine Learning: Addressing the Disillusionment to Bring Actual Business Ben... by Jon Mead
‘Machine learning’ is one of those cringy phrases, almost (if not already) taboo in the world of high-tech SaaS. Applying true machine learning to an organization’s product(s), however, can have real benefit for the business, its clients, and the industry as a whole. From credit card fraud investigations to the way that a car is built, machine learning has permeated our everyday life without a common understanding of what it is and how to implement it.
Data Modeling for Security, Privacy and Data Protection by Karen Lopez
Karen Lopez's (@datachick/InfoAdvisors) 90-minute presentation on Data Security, Data Privacy, and Compliance, and how data modelers should discover, assess, and monitor these important data management responsibilities.
Data Science Transforming Security Operations by Priyanka Aash
Data science brings a huge promise to IT security, and accordingly DS teams are sprouting across all enterprises and numerous vendors. Indeed, DS has the potential to transform the way security is done, yet the secret sauce is how to do it in a way that actually provides clear value, is embedded into the security workflow, and leverages human knowledge in combination with the data.
(Source: RSA USA 2016-San Francisco)
Securing SharePoint, OneDrive, & Teams with Sensitivity Labels by Drew Madelung
How do you protect your confidential content from being exposed? Being able to secure your files and content across workloads is a necessity and the tools are available to you today in the Microsoft 365 Security admin center. Microsoft 365 Sensitivity Labels are the evolution of Azure Information Protection and more within the Microsoft Information Protection suite.
Big Data, why the Big fuss.
Volume, Variety, Velocity ... we know the 3 V's of Big Data. But Big Data, if it yields little Information, is useless, so focus on the 4th V = Value.
If you haven't sorted quality & data governance for your "little data", then seriously consider whether you want to venture into the world of Big Data.
How to classify documents automatically using NLP by Skyl.ai
About the webinar
Documents come in different shapes and sizes. From technical documents, customer support chats, emails, and reviews to news articles, all of them contain information that is valuable to the business.
Managing these large volumes of documents in a traditional, manual way has been a complex and time-consuming task that requires enormous human effort.
In this webinar, we will discuss how machine learning can be used to identify and automatically label news articles into categories like business, politics, and music. This can be applied in other contexts, such as categorizing emails and reviews and processing text documents.
What you will learn
- How businesses are leveraging document classification to their advantage
- Best practices for automating machine learning models in hours, not months
- Demo: Classify news articles into the right category using a convolutional neural network
Why do most machine learning projects never make it to production by Cameron Vetter
Machine Learning is quickly becoming a ubiquitous technology and expected skill of development teams. In 2019 a stunning 87% of Data Science projects never made it into production. What will you do to make sure that your ML and Data Science projects succeed?
This talk will focus on digging deeper into that statistic and help to explain what goes wrong, why it goes wrong, and what I do to mitigate these issues.
You’ll leave with a deep understanding of ML project failures, and some advice on how to improve your project's chance of success. You will also learn how to fail quickly in ML model development and pivot towards a path of success.
This presentation was discussed in a Webinar with MetricStream in September 2016. It is applicable for small, medium and large businesses when considering information and cyber security risk.
Autonomous Security: Using Big Data, Machine Learning and AI to Fix Today's S... by Avinash Ramineni
The evolution of the general technology landscape continues to influence the security and risk landscape across multiple enterprises. IoT, cloud, edge computing, containerization, automation, big data, AI--all influence a breadth of security challenges and opportunities for organizations. Security teams must address the emerging technology risk stack while ensuring these solutions work as intended to shape the intelligence, velocity and scale at which security-related decision points can be reached.
This talk will give a sneak peek into technologies that will influence the security capabilities of the future and reshape the way organizations address challenges with current and emerging technologies. We will perform a live demo illustrating the concept of autonomous security and how it will reshape the current security landscape as we know it.
Learning Objectives:
Understand practical AI applications in building autonomous security systems.
Be introduced to the concept of autonomous security and how it can transform the way organizations currently address security challenges.
Identify the growth of SOAR (security orchestration, automation and response) and how to fine-tune it with a data and intelligence driven approach.
Enterprise Grade Data Labeling - Design Your Ground Truth to Scale in Produ... by Jai Natarajan
We describe why and how to be mindful about designing your data annotation pipeline to be scalable and to deliver consistent, high-quality results regardless of domain.
How to perform Secure Data Labeling for Machine Learning by Skyl.ai
Data annotation, more commonly called data labeling, is an integral part of AI and Machine Learning.
One of the biggest concerns that organizations have while doing AI and ML is handling data.
Many organizations have concerns about data security and privacy of the training data, especially highly regulated industries like Healthcare, Banking, Government, etc. where data privacy and security are paramount.
What you will learn:
- Risks associated with data annotations and how to manage data privacy and data protection
- How to handle deployments and infrastructure to manage data security
- How to manage collaborative contributors for secure data labeling to balance scale, security, cost, and quality in data labeling
Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst... by Stratio
On November 6th, we got together at Google Campus to talk about Mesos and DC/OS.
Ignacio Mulas, Sparta & Spark Product Owner at Stratio, explained how to build an environment that can secure and govern its data for operational and analytical applications on top of the DC/OS platform. He showed that analytical and machine learning pipelines can be combined with operational processes while maintaining security and providing governance tools to manage our data. He focused on the architecture and tools needed to achieve an ecosystem like this and showed a demo of it. He also explained how we can develop our pipelines interactively with auto-discovered data catalogs and explore our results.
Find out more: https://www.stratio.com/events/discover-how-to-deploy-a-secure-big-data-pipeline-with-dcos/
ALTITUDE 2019 | Enabling Productivity with Agile Security by BetterCloud
It’s a fine line to give users control over settings and their data, but not too much control. In this session, you’ll learn how Chad Ponder, VP Technology Service and Support at United Capital, Mark Bowling, Remote Chief Information Security Officer at United Capital, and Colin McCarthy, VP of Global IT at Essence, are creating workflows that give their users the flexibility they need to do their jobs, without compromising security.
Slide deck for the DGIQ SIG on AI Ethics.
Are you concerned about data and AI ethics? Do you worry about how to make sure the algorithms and systems that affect our lives are fair, honest, responsible, and respectful of our rights and values? Do you have opinions about how to build an organizational culture that cares about these topics?
Join us for what will surely be a lively and interesting session where you are the speakers.
Special interest group (SIG) discussions are group conversations on topics that are new, or specific to an audience segment. The format is casual and without any formal presentation. The objective is to engage all participants in an exchange of ideas, questions, and advice, so please come with a willingness to participate in the conversation.
A Designer's Favourite Security and Privacy Features in SQL Server and Azure ... by Karen Lopez
SQL Server includes multiple features that focus on data security, privacy, and developer productivity. In this session, we will review the best features from a database designer’s and developer’s point of view.
– Always Encrypted
– Dynamic Data Masking
– Row Level Security
– Data Classification
– Assessments
– Defender for SQL Server
– Ledger Tables
…and more
We’ll look at new and older features, why you should consider them, where they work, where they don’t, who needs to be involved in using them, and what changes, if any, need to be made to applications or tools that you use with SQL Server.
You will learn:
– The pros and cons of implementing each feature
– How implementing these new features may impact existing applications
– 10 tips for enhancing SQL Server security and privacy protections
Similar to Designing for Data Security by Karen Lopez
Designer's Favorite New Features in SQLServer by Karen Lopez
A database designer's favourite features in SQL Server...with a bit of Azure SQL DB, too.
Always Encrypted
Row Level Security
Microsoft Purview
Azure Enabled SQL Server
Azure Defender for SQL
Azure Defender for Cloud
Dynamic Data Masking
Ledger Database and Tables
Data Privacy
Data Governance
Karen's Presentation to DAMA Chicago and other DAMA Chapters on 15 February 2023.
This presentation is less about data lakes than it is about Data Quality and how data professionals should think about designing and architecting systems that best meet the needs of how data works in the real world.
Expert Cloud Data Backup and Recovery Best Practice.pptx by Karen Lopez
We’ve been deploying backup solutions since the beginning of computing and the foundations of backup and recovery have stayed the same: make sure backups run consistently and set recovery objectives. Yet systems in 2022 don’t work or act the same way they did decades ago. Cloud data backups have helped us meet the need for offsite backups, as well as impacted how we budget for them. Ransomware has impacted how we store them. The laws of physics might be more of an issue than when we had tapes stored in a safe down the hall. Cost models have changed, too.
In this session, Karen Lopez covers best practices for modern data recovery…and she will share stories of worst practices just to keep it real.
Manage Your Time So It Doesn't Manage You by Karen Lopez
NASA Space Apps NYC Pre-Hackathon Symposium presentation by Karen Lopez, InfoAdvisors and NASA Datanaut. Karen presents on how to successfully manage your time and deliverables in the NASA Space Apps Challenge no matter where you are participating.
This one-hour presentation covers the tools and techniques for migrating SQL Server databases and data to Azure SQL DB or SQL Server on VM. Includes SSMA, DMA, DMS, and more.
Blockchain for the DBA and Data Professional by Karen Lopez
An overview of blockchain fundamentals, including examples of Oracle 20c Blockchain Tables. Includes concepts of trust, immutability, hashes, distributed nodes, and cryptography.
Blockchain for the DBA and Data Professional by Karen Lopez
With all the hype around blockchain, why should a DBA or other data professional care? In this session, we will cover the basics of blockchain as it applies to data and database processes:
Immutability
Verification
Distribution
Cryptography
Transactions
Trust
We will look at current offerings for blockchain features in Azure and in database and data stores. Finally, we'll help you identify the types of business requirements that need blockchain technologies.
You will learn:
Understand the valid uses of blockchain approaches in databases
How current technologies support blockchain approaches
Understand the costs, benefits, and risks of blockchain
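As a rough illustration of how hashes and immutability fit together in the list above, the following is a minimal sketch of a hash-chained ledger in plain Python. It is not taken from the presentation and does not model any particular database product; the names `block_hash`, `append_block`, and `verify_chain` are hypothetical, and it assumes SHA-256 over a JSON encoding of each record.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder hash for the first block


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a record whose hash depends on the block before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append(
        {
            "index": index,
            "data": data,
            "prev_hash": prev_hash,
            "hash": block_hash(index, data, prev_hash),
        }
    )


def verify_chain(chain):
    """Recompute every hash; any edited block breaks verification."""
    prev_hash = GENESIS_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True


ledger = []
append_block(ledger, {"account": "A", "amount": 100})
append_block(ledger, {"account": "B", "amount": -40})
ok_before = verify_chain(ledger)

# Tamper with an earlier record without recomputing its hash.
ledger[0]["data"]["amount"] = 1000
ok_after = verify_chain(ledger)
```

The sketch covers only tamper detection on a single node; distribution and trust between multiple nodes are outside its scope.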
Data Security and Protection in DevOps by Karen Lopez
Presentation to London #WinOps event Sept 2019. Focusing on data security, privacy, and protection in DevOps efforts. Includes data masking, dev and test data, Always Encrypted, and more.
Many data modeling and database design terms and jargon use the word "key." Do you know the difference between a surrogate key and a primary key? A super key and a candidate key? Could you explain them to a technical audience? A business user or an auditor?
In this presentation, Karen Lopez covers the concepts of primary keys, foreign keys, candidate keys, surrogate keys, and more.
How to Survive as a Data Architect in a Polyglot Database World by Karen Lopez
Karen Lopez talks to data architects and data modelers about how they can best deliver value on modern data-driven projects beyond relational database technologies. She covers NoSQL databases and datastores, which data stories they best fit and which ones they don't. She ends with 10 tips for adding more value to polyschematic database solutions.
Karen's Favourite Features of SQL Server 2016 by Karen Lopez
Slides from a one-hour webinar on Karen Lopez's favorite features from a database designer's point of view. Topics include Always Encrypted, Data Masking, Row Level Security, Foreign Keys, JSON and more.
Notice an error? Let me know. I welcome this sort of feedback.
In the spirit of the book 7 Databases in 7 Weeks, Lara Rubbelke and Karen Lopez cover ~seven databases and datastores in the SQL and NoSQL world, when to use them, and how they are SQL-like.
From SQLBitsXV
Notice an error? Let me know. I welcome this sort of feedback.
Karen Lopez 10 Physical Data Modeling Blunders by Karen Lopez
Karen Lopez's presentation about 10 Physical Data Modeling/Database Design blunders, based on her work in helping organizations get the most value out of their models and data.
Notice an error? Let me know. I welcome this sort of feedback.
NoSQL and Data Modeling for Data Modelers by Karen Lopez
Karen Lopez's presentation for data modelers and data architects. Why data modeling is still relevant for big data and NoSQL projects.
Plus 10 tips for data modelers for working on NoSQL projects.
Adjusting primitives for graph : SHORT REPORT / NOTES | Subhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
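As an illustration of what "bfloat16 as the storage type" can mean in the sum experiments listed above, here is a small Python sketch. It assumes bfloat16 is stored as the top 16 bits of the IEEE float32 encoding, that conversion truncates rather than rounds, and that accumulation happens at full precision. The report's actual conversion and accumulation choices are not stated in this summary, and the function names are hypothetical.

```python
import struct

def to_bfloat16_bits(x):
    """Keep the top 16 bits of the float32 encoding (truncation, no rounding)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def from_bfloat16_bits(b):
    """Widen 16 stored bits back to a float by zero-filling the low 16 bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

def sum_from_bfloat16(values):
    """Store each element as bfloat16, then accumulate the widened values."""
    stored = [to_bfloat16_bits(v) for v in values]
    return sum(from_bfloat16_bits(b) for b in stored)
```

The storage halves relative to float32, at the cost of keeping only 7 explicit mantissa bits per element, so each stored value can be off by a relative error of just under 2^-7 with truncation.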
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... | Subhajit Sahu
Abstract: Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It does, however, require that the input graph contain no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by the submission of a large number of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
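The idea in the abstract can be sketched in a few lines of Python. This is a simplified illustration, not the report's implementation: it processes one strongly connected component at a time in topological order (the report groups components into levels), uses Tarjan's algorithm for the decomposition, and all function names and the example graph are hypothetical. The graph is a dict mapping each vertex to its out-neighbours, and every vertex must have at least one out-link.

```python
from collections import defaultdict

def in_links(graph):
    """Map each vertex to the list of vertices that link to it."""
    preds = defaultdict(list)
    for u, outs in graph.items():
        for v in outs:
            preds[v].append(u)
    return preds

def strongly_connected_components(graph):
    """Tarjan's algorithm; components come out in reverse topological order."""
    index, low, stack, on_stack, sccs = {}, {}, [], set(), []
    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(comp)
    for v in graph:
        if v not in index:
            visit(v)
    return sccs

def iterate(graph, preds, ranks, vertices, d, tol, max_iter):
    """Update ranks of `vertices` only, until their total change is below tol."""
    n = len(graph)
    for _ in range(max_iter):
        new = {v: (1 - d) / n + d * sum(ranks[u] / len(graph[u]) for u in preds[v])
               for v in vertices}
        err = sum(abs(new[v] - ranks[v]) for v in vertices)
        ranks.update(new)
        if err < tol:
            break

def monolithic_pagerank(graph, d=0.85, tol=1e-10, max_iter=1000):
    ranks = {v: 1 / len(graph) for v in graph}
    iterate(graph, in_links(graph), ranks, list(graph), d, tol, max_iter)
    return ranks

def levelwise_pagerank(graph, d=0.85, tol=1e-10, max_iter=1000):
    ranks = {v: 1 / len(graph) for v in graph}
    preds = in_links(graph)
    # Reversing Tarjan's output visits upstream components first, so each
    # component only reads ranks that have already converged.
    for comp in reversed(strongly_connected_components(graph)):
        iterate(graph, preds, ranks, comp, d, tol, max_iter)
    return ranks

# Vertex 4 carries a self-loop so that it is not a dead end.
EXAMPLE = {0: [1], 1: [0, 2], 2: [3], 3: [2, 4], 4: [4]}
```

On `EXAMPLE`, both functions should agree to within the tolerance, since each component's ranks depend only on itself and on components earlier in the order.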
Opendatabay - Open Data Marketplace.pptx | Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. The marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to reduce the work per iteration, and the other is to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices (vertices with the same in-links) helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
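One of the ideas above, skipping in-identical vertices, is easy to show in code. The Python sketch below groups vertices that share exactly the same set of in-links. Because a vertex's rank in the standard formulation depends only on its in-neighbours, every vertex in a group ends up with the same rank, so it only needs to be computed once per group. The function name and the example graph are hypothetical, and this is not the STICD implementation.

```python
from collections import defaultdict

def in_identical_groups(graph):
    """Return groups (size > 1) of vertices that have identical in-link sets."""
    preds = defaultdict(set)
    for u, outs in graph.items():
        for v in outs:
            preds[v].add(u)
    groups = defaultdict(list)
    for v in graph:
        # The frozen set of in-neighbours acts as the grouping signature.
        groups[frozenset(preds[v])].append(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]

# Hypothetical graph: "c" and "d" are both linked from exactly {"a", "b"}.
EXAMPLE = {"a": ["c", "d"], "b": ["c", "d"], "c": ["a"], "d": ["a", "b"]}
```

For `EXAMPLE`, the only group is `c` and `d`. In a rank iteration, one representative per group would be updated and its value copied to the other members.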
8. Day one
Collaborative
Responsible
Compliant
Required
Governed
Security & Privacy by Design
9. Ready for 25 May?
How can we get started?
Can you help us get certified?
Do you have software for this?
Do you have a couple of weeks to help us get this done?
10. No Methodology
No Models
Misfocused Management
No Measurement
Too Much Madness
How Does this happen?
12. Security at the data level
Models capture security & privacy requirements
Management reports of reviews
Measurement
In other words, Governance
Methodology?
22. Why would a DB Designer love it?
Always Encrypted, yup
Allows designers to not only specify which columns need to be protected, but how
Parameters are encrypted as well
Built in to the engine, easier for Devs
24. Privacy - Dynamic Data Masking
CREATE TABLE Membership(
    MemberID int IDENTITY PRIMARY KEY,
    FirstName varchar(100) MASKED WITH (FUNCTION = 'partial(1,"XXXXXXX",0)') NULL,
    LastName varchar(100) NOT NULL,
    Phone# varchar(12) MASKED WITH (FUNCTION = 'default()') NULL,
    Email varchar(100) MASKED WITH (FUNCTION = 'email()') NULL);
INSERT Membership (FirstName, LastName, Phone#, Email) VALUES
    ('Roberto', 'Tamburello', '555.123.4567', 'RTamburello@contoso.com'),
    ('Janice', 'Galvin', '555.123.4568', 'JGalvin@contoso.com.co'),
    ('Zheng', 'Mu', '555.123.4569', 'ZMu@contoso.net');
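To show roughly what a user without unmask permission would see for the first row, here is a plain Python emulation of the three mask functions used in the table definition. It is only an approximation for illustration: the helper names are hypothetical, edge cases (short strings, non-string types) are ignored, and the exact placeholder text returned by the engine should be checked against the SQL Server documentation.

```python
def mask_partial(value, prefix, padding, suffix):
    """Rough emulation of partial(prefix, padding, suffix) for strings."""
    if value is None:
        return None
    tail = value[len(value) - suffix:] if suffix else ""
    return value[:prefix] + padding + tail

def mask_default_string(value):
    """Rough emulation of default() on a string column: hide everything."""
    return None if value is None else "xxxx"

def mask_email(value):
    """Rough emulation of email(): first letter plus a constant suffix."""
    return None if value is None else value[:1] + "XXX@XXXX.com"

def masked_row(first, last, phone, email):
    # LastName carries no mask in the table definition, so it passes through.
    return (mask_partial(first, 1, "XXXXXXX", 0), last,
            mask_default_string(phone), mask_email(email))
```

With this emulation, `masked_row('Roberto', 'Tamburello', '555.123.4567', 'RTamburello@contoso.com')` yields `RXXXXXXX`, `Tamburello`, `xxxx`, and `RXXX@XXXX.com`. In the database, the stored values stay unchanged; the mask is applied to query results.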
25. Why would a Data Designer love it?
Allows central, reusable design for standard masking
Offers more reliable masking and more usable masking
Removes whining about “we can do that later”
27. Why would a Data Designer love it?
Allows a designer to do this sort of data protection IN THE DATABASE, not just rely on code.
Many, many pieces of code.
33. What should we STOP doing?
Nobody ever talks about this….
35. SQL Injection
WE ARE STILL DOING THIS!
IT’S STILL THE #1 (but unsecured storage is getting more popular)
TEST. TEST SOME MORE
Automated Testing
Governance is important
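For readers who want to see the problem and the standard fix side by side, here is a minimal Python sketch using the standard library's sqlite3 module. SQLite stands in for whatever database is in use, and the table, data, and function names are hypothetical. The same principle (pass values as parameters, never concatenate them into SQL text) applies to other engines and drivers.

```python
import sqlite3

def make_db():
    """Create a small in-memory table to query against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE member (id INTEGER PRIMARY KEY, last_name TEXT)")
    conn.executemany("INSERT INTO member (last_name) VALUES (?)",
                     [("Tamburello",), ("Galvin",), ("Mu",)])
    return conn

def find_unsafe(conn, last_name):
    # Vulnerable: user input becomes part of the SQL statement itself.
    sql = "SELECT id FROM member WHERE last_name = '" + last_name + "'"
    return conn.execute(sql).fetchall()

def find_safe(conn, last_name):
    # Parameterized: the value is bound separately and is never parsed as SQL.
    sql = "SELECT id FROM member WHERE last_name = ?"
    return conn.execute(sql, (last_name,)).fetchall()

# A classic injection string that turns the WHERE clause into a tautology.
ATTACK = "x' OR '1'='1"
```

With `ATTACK` as input, `find_unsafe` returns every row in the table, while `find_safe` returns none, because the whole string is compared as a literal last name. A pair of calls like this is also a natural starting point for the automated testing the slide asks for.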
37. Test Data
Bad
Restoring Production to Development
Restoring Production, with Masking
Restoring Production, with Randomizing
Restoring Production…anywhere
Better
Design Test Data
Lorem Ipsum for Data
Really, Design Test Data
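As one possible reading of "Design Test Data", the Python sketch below generates member rows from scratch rather than deriving them from production. Everything in it is an assumption for illustration: the column names mirror the Membership example, the name lists are made up, phone numbers use the 555-01xx range reserved for fictional use, and email addresses use the reserved example.com domain. A fixed seed keeps the output repeatable between test runs.

```python
import random

# Made-up name pools; nothing here comes from a real data set.
FIRST_NAMES = ["Ada", "Grace", "Alan", "Edsger", "Barbara"]
LAST_NAMES = ["Example", "Sample", "Testcase", "Placeholder"]

def design_test_members(count, seed=0):
    """Build `count` synthetic member rows, repeatable for a given seed."""
    rng = random.Random(seed)
    rows = []
    for member_id in range(1, count + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        rows.append({
            "MemberID": member_id,
            "FirstName": first,
            "LastName": last,
            # 555-0100 to 555-0199 is set aside for fictional numbers.
            "Phone": f"212.555.01{rng.randrange(100):02d}",
            "Email": f"{first}.{last}{member_id}@example.com".lower(),
        })
    return rows
```

Designed data like this can also include deliberate edge cases (maximum-length names, missing phone numbers) that a production restore may never contain.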
38. Only Generalists
No other profession uses this approach. The Body of Knowledge and the required skillsets in IT and IS are too broad and change too rapidly.
39. Trusting good people
Good people don’t always stay that way
People mess up
Monitoring
Checking
Automatic alerting
41. What Skills Do Data Professionals Need for Data Protection?
No one ever talks about this….
43. Data Protection and Security
Level: Active Skills
Security Requirements
Security Techniques
Where to apply them
Whose Job is it?
Security Testing & Validation
Security By Design
Data Governance
44. Big Data and Analytics
Level: Literacy and Hands On
Why: These new technologies and techniques are making it mainstream in most shops, whether they are installed or software as a service. Plus, we need to use them on our own data
Who: All IT roles, especially data stewarding ones.
45. Literacy with Deep Learning, AI, Machine Learning
Level: Literacy +++
How are they used?
What are the real life uses today?
Future uses
Privacy and Security requirements
Compliance trade-offs
Employee Monitoring
46. Data Quality & Reliability
Level: Active Skills
Is the data right?
Is it current?
Should it be there at all?
Do we know where it came from?
Do we know it was calculated correctly?
Are there any known anomalies?
47. How can we do all this?
Cloud Services are a fantastic way to learn and get hands-on skills.
Online Tutorials are often free and self-guided
Learn from Experts & Case Studies
Deprioritize tasks that are really just being done for tradition
Hire help
Automate away some tasks to make more time
49. One more time…
Every Design Decision must be based on Cost, Benefit and Risk
www.datamodel.com