Karen Lopez's presentation for data modelers and data architects on why data modeling is still relevant for big data and NoSQL projects.
Plus, 10 tips for data modelers working on NoSQL projects.
Watch the companion webinar for this presentation at http://embt.co/KLopez826. In this webinar, Karen Lopez of InfoAdvisors will cover 10 tips for the modern data architect and resources for coming up to speed on these new approaches. She will share how modern data modeling approaches address both SQL (relational) and NoSQL technologies. We'll look at the role of a data modeler, and how models, processes, and data governance can add value to enterprise big data and NoSQL development projects.
10 Physical Data Modeling Blunders – Karen Lopez
Karen Lopez's presentation about 10 Physical Data Modeling/Database Design blunders, based on her work in helping organizations get the most value out of their models and data.
Notice an error? Let me know. I welcome this sort of feedback.
During this Big Data Warehousing Meetup, Caserta Concepts and Databricks addressed the number one operational and analytic goal of nearly every organization today – to have a complete view of every customer. Customer Data Integration (CDI) must be implemented to cleanse and match customer identities within and across various data systems. CDI has been a long-standing data engineering challenge, not just one of logic and complexity but also of performance and scalability.
The speakers brought together best practice techniques with Apache Spark to achieve complete CDI.
Speakers:
Joe Caserta, President, Caserta Concepts
Kevin Rasmussen, Big Data Engineer, Caserta Concepts
Vida Ha, Lead Solutions Engineer, Databricks
The sessions covered a series of problems that are adequately solved with Apache Spark, as well as those that require additional technologies to implement correctly. Topics included:
· Building an end-to-end CDI pipeline in Apache Spark
· What works, what doesn’t, and how our use of Spark evolves
· Innovation with Spark including methods for customer matching from statistical patterns, geolocation, and behavior
· Using PySpark and Python’s rich module ecosystem for data cleansing, standardization, and matching (a minimal sketch follows this list)
· Using GraphX for matching and scalable clustering
· Analyzing large data files with Spark
· Using Spark for ETL on large datasets
· Applying Machine Learning & Data Science to large datasets
· Connecting BI/Visualization tools to Apache Spark to analyze large datasets internally
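For concreteness, here is a minimal PySpark sketch of the kind of cleansing-and-matching step listed above. It is an illustration only, not the presenters' pipeline: the column names, sample records, and the email-based blocking rule are all assumptions.

```python
# Illustrative CDI sketch (not the presenters' code): standardize customer
# records, then block candidate duplicates on a normalized email key.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cdi-sketch").getOrCreate()

customers = spark.createDataFrame(
    [(1, " Jane  Doe ", "jane.doe@EXAMPLE.com"),
     (2, "Jane Doe", "jane.doe@example.com"),
     (3, "John Smith", "jsmith@example.org")],
    ["id", "name", "email"])

# Cleansing: trim names, collapse repeated whitespace, lowercase emails.
clean = customers.select(
    "id",
    F.regexp_replace(F.trim("name"), r"\s+", " ").alias("name"),
    F.lower(F.trim("email")).alias("email"))

# Matching (naive blocking): records sharing an email are candidate duplicates.
candidates = (clean.groupBy("email")
    .agg(F.collect_list("id").alias("candidate_ids"))
    .filter(F.size("candidate_ids") > 1))
candidates.show(truncate=False)
```

A real pipeline would follow this blocking step with fuzzy comparison of names and addresses and a survivorship rule for merging matched records.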
The speakers also touched on data governance, on-boarding new data rapidly, and how to balance agility and time to market with critical decision support and customer interaction. They also shared examples of problems that Apache Spark is not optimized for.
For more information on the services offered by Caserta Concepts, visit our website: http://casertaconcepts.com/
In this Strata+Hadoop World 2015 presentation, Ron Bodkin, President of Think Big, a Teradata company, explains changes for data modeling on big data systems and five important new analytic patterns becoming more commonplace as companies grow their data-driven capabilities.
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendees learned how a global leader in the test, measurement, and control systems market reduced their big data implementations from 18 months to just a few months.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focused on how to extend and optimize Hadoop-based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability, and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft - Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
When it comes to creating an enterprise AI strategy: if your company isn’t good at analytics, it’s not ready for AI. Succeeding in AI requires being good at data engineering AND analytics. Unfortunately, management teams often assume they can leapfrog best practices for basic data analytics by directly adopting advanced technologies such as ML/AI – setting themselves up for failure from the get-go. This presentation explains how to get basic data engineering and the right technology in place to create and maintain data pipelines so that you can solve problems with AI successfully.
Meaning making – separating signal from noise. How do we transform the customer's next input into an action that creates a positive customer experience? We make the data more intelligent, so that it is able to guide our actions. The Data Lake builds on Big Data strengths by automating many of the manual development tasks, providing several self-service features to end-users, and an intelligent management layer to organize it all. This results in lower cost to create solutions, "smart" analytics, and faster time to business value.
Joe Caserta was a featured speaker, along with MIT Sloan School faculty and other industry thought-leaders. His session 'You're the New CDO, Now What?' discussed how new CDOs can accomplish their strategic objectives and overcome tactical challenges in this emerging executive leadership role.
In its tenth year, the MIT CDOIQ Symposium 2016 continues to explore the developing role of the Chief Data Officer.
For more information, visit http://casertaconcepts.com/
To succeed in the world’s rapidly evolving ecosystem, companies (no matter what their industry or size) must use data to continuously develop more innovative operations, processes, and products. This means embracing the shift to Enterprise AI, using the power of machine learning to enhance - not replace - humans.
Dataiku is the centralized data platform that moves businesses along their data journey from analytics at scale to Enterprise AI, powering self-service analytics while also ensuring the operationalization of machine learning models in production.
The Data Lake and Getting Businesses the Big Data Insights They Need – Dunn Solutions Group
Do terms like "Data Lake" confuse you? You’re not alone. With all of the technology buzzwords flying around today, it can become a task to keep up with and clearly understand each of them. However, a data lake is definitely something worth taking the time to understand. Leveraging data lake technology, companies are finally able to keep all of their disparate information and streams of data in one secure location, ready for consumption at any time – this includes structured, unstructured, and semi-structured data. For more information on our Big Data Consulting Services, don’t hesitate to visit us online at: http://bit.ly/2fvV5rR
Moving Past Infrastructure Limitations Presented by MediaMath
This presentation was given at a Big Data Warehousing Meetup with Caserta Concepts, MediaMath and Qubole. You can learn more about the event here: http://www.meetup.com/Big-Data-Warehousing/events/228372516/
Event description:
At Caserta Concepts, we are firm believers in big data thriving on the cloud. The instant-on, nearly unlimited storage and computing capabilities of AWS have made it the de facto solution for a full spectrum of organizations needing to process large amounts of data.
What's more, an ecosystem of value-added platforms has emerged to further ease and democratize the implementation of cloud based solutions. Qubole has developed a great platform for easily deploying and managing ephemeral and long-lived Hadoop and Spark clusters on AWS.
Moving Past Infrastructure Limitations: Data Warehousing at MediaMath
Over the past year and a half, MediaMath has undertaken a “data liberation” effort in an attempt to leave their big-box, monolithic data warehouse behind. In this talk, Rory Sawyer, Software Engineer at MediaMath, will describe how this effort transformed MediaMath’s legacy architecture and legacy mindset, which imposed harsh inefficiencies on data sharing and utilization. The current mindset removes these inefficiencies and allows them to say “yes” to more projects and ideas.
Rory will also demo how MediaMath uses Amazon Web Services and Qubole so that infrastructure is no longer a limiting factor on what and how users query. This combination allows them to scale their resources up and down as needed while bridging different data sources and execution engines. Using and extending MediaMath’s data warehousing is no longer a privileged activity but an ability that every employee and client has.
NoSQL Simplified: Schema vs. Schema-less – InfiniteGraph
A look at the many facets of schema-less approaches vs a rich schema approach, ranging from performance and query support to heterogeneity and code/data migration issues. Presented by Leon Guzenda, Founder, Objectivity
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An... – Benjamin Nussbaum
We live in an era where the world is more connected than ever before and the trajectory is such that data relationships will only continue to increase with no signs of slowing down.
Connected data is the key to your business succeeding and growing in today’s connected world.
Leading enterprises will be the ones that utilize relationship-centric technologies to leverage connections from their internal operations and supply chain to their customer and user interactions. This ability to utilize connected data to understand all the nuanced relationships within their organization will propel them forward as they act on more holistic insights.
Every organization needs a knowledge graph because connected data is an essential foundation to advancing business. Knowledge graphs provide:
- Increased visibility between internal groups
- Efficiency gains
- Cross-functional data collaboration
- More complete and reliable business insights
- Better customer engagement
The live presentation and discussion can be found here: https://youtu.be/7vBdlXzhs_4
Additional reading on why connected data is beneficial: https://www.graphgrid.com/why-connected-data-is-more-useful/
Connected data solutions are available from Benjamin and his team via GraphGrid and AtomRain: https://www.graphgrid.com and https://www.atomrain.com
Joe Caserta, President at Caserta Concepts, presented "Setting Up the Data Lake" at a DAMA Philadelphia Chapter Meeting.
For more information on the services offered by Caserta Concepts, visit our website at http://casertaconcepts.com/.
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016 – Caserta
Caserta Concepts Founder and President, Joe Caserta, gave this presentation at Strata + Hadoop World 2016 in New York, NY. His session covers path-to-purchase analytics using a data lake and Spark.
For more information, visit http://casertaconcepts.com/
The 20th annual Enterprise Data World (EDW) Conference took place in San Diego, April 17-21. It is recognized as the most comprehensive educational conference on data management in the world.
Joe Caserta was a featured presenter. His session “Evolving from the Data Warehouse to Big Data Analytics - the Emerging Role of the Data Lake” highlighted the challenges and steps needed to become a data-driven organization.
Joe also participated in two panel discussions during the show:
• "Data Lake or Data Warehouse?"
• "Big Data Investments Have Been Made, But What's Next
For more information on Caserta Concepts, visit our website at http://casertaconcepts.com/.
Joe Caserta's 2016 Data Summit Workshop "Introduction to Data Science with Hadoop" on May 9 expanded on his Intro to Data Science Workshop held at last year's Summit. Again, Joe presented to a standing-room-only audience with a focus on the data lake, governance, and the role of the data scientist.
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
Big Data Expo 2015 - Barnsten: Why Data Modelling is Essential – BigDataExpo
Learn tips and tricks for handling Data Modeling in your Big Data environment. Mark will show how modeling will add value to the business and how to make your Big Data landscape transparent across the organization.
You will see the latest modeling techniques for Big Data and different types of modeling notations. You will also learn how to integrate Data Modeling into your BI environment.
Graph Databases - Where Do We Do the Modeling Part? – DATAVERSITY
Graph processing and graph databases have been with us for a while. However, since their physical implementations are the same for every database in production (node connected to node, or triples), there's a perception that data modeling (and data modelers) have no role on projects where graph databases are used (a small sketch of this physical shape follows below).
This month we'll talk about where graph databases are a best fit in a modern data architecture and where data models add value.
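To make the "node connected to node, or triples" point concrete, here is a small Python sketch (our illustration, not from the webinar) of the same fact stored in the two common physical shapes:

```python
# Illustrative only: one relationship in two common graph shapes.

# 1. Triple form (subject, predicate, object), as in RDF-style stores.
triples = [
    ("alice", "WORKS_FOR", "acme"),
    ("acme", "LOCATED_IN", "nyc"),
]

# 2. Property-graph form: nodes carry properties; edges carry a type.
nodes = {
    "alice": {"label": "Person", "name": "Alice"},
    "acme": {"label": "Company", "name": "Acme Corp"},
}
edges = [("alice", "WORKS_FOR", "acme")]

# Either shape answers "who works for acme?"; a graph database indexes
# these adjacencies for fast traversal instead of scanning.
print([s for (s, p, o) in triples if p == "WORKS_FOR" and o == "acme"])
```

The physical shape is fixed either way, which is exactly why the modeling work shifts to choosing meaningful node labels, relationship types, and properties.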
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote – Caserta
The “Big Data era” has ushered in an avalanche of new technologies and approaches for delivering information and insights to business users. What is the role of the cloud in your analytical environment? How can you make your migration as seamless as possible? This closing keynote, delivered by Joe Caserta, a prominent consultant who has helped many global enterprises adopt Big Data, provided the audience with the inside scoop needed to supplement data warehousing environments with data intelligence—the amalgamation of Big Data and business intelligence.
This presentation was given as the closing keynote at DBTA's annual Data Summit in NYC.
Overview of the SlamData open source project for modern data analytics. SlamData allows users to run ordinary SQL queries on modern NoSQL Data like JSON. Currently we support MongoDB, but plan to support other NoSQL datastores including Cassandra, Hadoop and others. Our project opens up modern NoSQL data to anyone with basic SQL skills.
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost – AtScale
Being able to analyze sales at the most granular level with up-to-date data provides a competitive advantage for unlocking additional revenue -- especially for e-commerce and retail companies heading into the holiday season.
Defining and Applying Data Governance in Today’s Business Environment – Caserta
Caserta Concepts President Joe Caserta was featured at the Data Governance Winter 2014 Conference with a session on the basic and necessary steps needed for data quality and data governance success.
For more information on the event and presentation: http://ow.ly/G3N9N
For more information on the services and solutions offered by Caserta Concepts, visit http://casertaconcepts.com/.
In the spirit of the book 7 Databases in 7 Weeks, Lara Rubbelke and Karen Lopez cover ~seven databases and datastores in the SQL and NoSQL world, when to use them, and how they are SQL-like.
From SQLBitsXV
Notice an error? Let me know. I welcome this sort of feedback.
Information technology has led us into an era where the production, sharing, and use of information are part of everyday life, often without our even being aware of it: it is now almost impossible not to leave a digital trail of many of the actions we perform every day, for example through digital content such as photos, videos, blog posts, and everything that revolves around social networks (Facebook and Twitter in particular). Added to this is the "Internet of Things": watches, bracelets, thermostats, and many other devices that connect to the network and therefore generate large data streams. This explosion of data gave rise to the term Big Data: data produced in large volumes, at remarkable speed, and in varied formats, which requires processing technologies and resources that go far beyond conventional data management and storage systems. It is immediately clear that 1) data storage models based on the relational model, and 2) processing systems based on stored procedures and grid computing, are not applicable in these contexts.

Regarding point 1, RDBMSs, widely used for a great variety of applications, run into problems when the amount of data grows beyond certain limits. Scalability and implementation cost are only part of the disadvantages: very often, when facing big data, the variability of the data – the lack of a fixed structure – is also a significant problem. This has given a boost to the development of NoSQL databases. The NoSQL Databases website defines them as "Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open source and horizontally scalable." These databases are distributed, open source, horizontally scalable, without a predetermined schema (key-value, column-oriented, document-based, and graph-based), easily replicable, free of ACID guarantees, and able to handle large amounts of data.

These databases are typically integrated with processing tools based on the MapReduce paradigm proposed by Google in 2004. MapReduce, together with the open source Hadoop framework, represents the new model for distributed processing of large amounts of data, supplanting techniques based on stored procedures and computational grids (point 2). The relational model taught in introductory database design courses has many limitations compared to the demands posed by new applications that use Big Data, NoSQL databases to store data, and MapReduce to process large amounts of data.
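Since the discussion above hinges on the MapReduce paradigm, a minimal word-count sketch in plain Python may help. It illustrates only the programming model (map, shuffle, reduce), not Hadoop's distributed implementation:

```python
# Minimal illustration of the MapReduce programming model (not Hadoop):
# map emits (key, value) pairs, the framework groups them by key, and
# reduce folds each group. Hadoop runs these same phases on a cluster.
from collections import defaultdict

def map_phase(document):
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    return (word, sum(counts))

documents = ["Big Data is big", "NoSQL stores big data"]

# Shuffle: group intermediate pairs by key (done by the framework in Hadoop).
groups = defaultdict(list)
for doc in documents:
    for word, count in map_phase(doc):
        groups[word].append(count)

result = [reduce_phase(word, counts) for word, counts in groups.items()]
print(sorted(result))  # [('big', 3), ('data', 2), ('is', 1), ...]
```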
Course Website http://pbdmng.datatoknowledge.it/
Contact me for more information and to download the slides.
The recent focus on Big Data in the data management community brings with it a paradigm shift—from the more traditional top-down, “design then build” approach to data warehousing and business intelligence, to the more bottom up, “discover and analyze” approach to analytics with Big Data. Where does data modeling fit in this new world of Big Data? Does it go away, or can it evolve to meet the emerging needs of these exciting new technologies? Join this webinar to discuss:
Big Data –A Technical & Cultural Paradigm Shift
Big Data in the Larger Information Management Landscape
Modeling & Technology Considerations
Organizational Considerations
The Role of the Data Architect in the World of Big Data
Automated Schema Design for NoSQL Databases – Michael Mior
Selecting appropriate indices and materialized views is critical for high performance in relational databases. By example, we show that the problem of schema optimization is also highly relevant for NoSQL databases. We explore the problem of schema design in NoSQL databases with a goal of optimizing query performance while minimizing storage overhead. Our suggested approach uses the cost of executing a given workload for a given schema to guide the mapping from the application data model to a physical schema. We propose a cost-driven approach for optimization and discuss its usefulness as part of an automated schema design tool.
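As a toy illustration of that cost-driven idea (our sketch, not the authors' tool): enumerate candidate physical schemas, score each against the workload plus a storage penalty, and keep the cheapest. All candidates and numbers below are invented.

```python
# Toy cost-driven schema selection (all numbers are invented).
# Each candidate schema has an estimated cost per query type and a
# storage overhead; we score candidates against the workload mix.
candidates = {
    "denormalized_by_user": {"query_costs": {"by_user": 1, "by_item": 50}, "storage": 3.0},
    "denormalized_by_item": {"query_costs": {"by_user": 50, "by_item": 1}, "storage": 3.0},
    "materialize_both":     {"query_costs": {"by_user": 1,  "by_item": 1}, "storage": 6.0},
}

workload = {"by_user": 0.8, "by_item": 0.2}  # relative query frequencies
storage_weight = 0.5                         # penalty for duplicated data

def score(schema):
    query_cost = sum(freq * schema["query_costs"][q] for q, freq in workload.items())
    return query_cost + storage_weight * schema["storage"]

best = min(candidates, key=lambda name: score(candidates[name]))
print(best, round(score(candidates[best]), 2))  # materialize_both 4.0
```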
A couple of major players in the internet space, in particular Amazon, LinkedIn and Google, opened the eyes of the corporate world to the coming onslaught of a NoSQL workload. As with every new market opportunity, some young guns quickly jumped in to capitalize on the need and confusion, but things are starting to settle and NoSQL is maturing as Enterprise-ready solutions break away with long-sought-after features. In this webcast, learn about NoSQL convergence from Oracle, the leader in data management, and hear why some flavors of NoSQL are here to stay.
Just a few years ago all software systems were designed to be monoliths running on a single big and powerful machine. But nowadays most companies prefer to scale out instead of scaling up, because it is much easier to buy or rent a large cluster of commodity hardware than to get a single machine that is powerful enough. In the database area, scaling out is realized by combining polyglot persistence with sharding of data. On the application level, scaling out is realized by microservices. In this talk I will briefly introduce the concepts and ideas of microservices and discuss their benefits and drawbacks. Afterwards I will focus on the point of intersection of a microservice-based application talking to one or many NoSQL databases. We will try to find answers to these questions: What are the differences from a monolithic application? How do we scale the whole system properly? What about polyglot persistence? Is there a data-centric way to split microservices?
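To ground the sharding idea mentioned above, here is a minimal hash-based shard router in Python (our own illustration, not from the talk):

```python
# Minimal hash-based sharding: route each record key to one of N shards.
# Note that adding a shard changes len(SHARDS) and so remaps most keys,
# which is the weakness consistent hashing addresses.
import hashlib

SHARDS = ["db-0", "db-1", "db-2"]

def shard_for(key: str) -> str:
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return SHARDS[digest % len(SHARDS)]

for customer_id in ["alice", "bob", "carol"]:
    print(customer_id, "->", shard_for(customer_id))
```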
NoSE: Schema Design for NoSQL Applications – Michael Mior
Database design is critical for high performance in relational databases and many tools exist to aid application designers in selecting an appropriate schema. While the problem of schema optimization is also highly relevant for NoSQL databases, existing tools for relational databases are inadequate for this setting. Application designers wishing to use a NoSQL database instead rely on rules of thumb to select an appropriate schema. We present a system for recommending database schemas for NoSQL applications. Our cost-based approach uses a novel binary integer programming formulation to guide the mapping from the application's conceptual data model to a database schema.
We implemented a prototype of this approach for the Cassandra extensible record store. Our prototype, the NoSQL Schema Evaluator (NoSE), is able to capture rules of thumb used by expert designers without explicitly encoding the rules. Automating the design process allows NoSE to produce efficient schemas and to examine more alternatives than would be possible with a manual rule-based approach.
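For intuition only, here is a toy sketch of the flavor of binary integer program described above; it is not NoSE's actual formulation. The candidate structures, costs, and query coverage sets are invented, and it assumes the open source PuLP library is installed (pip install pulp):

```python
# Toy binary integer program for schema selection (invented numbers).
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, LpStatus

structures = ["by_user", "by_item", "by_date"]     # candidate column families
cost = {"by_user": 4, "by_item": 3, "by_date": 5}  # workload cost if chosen
size = {"by_user": 2, "by_item": 2, "by_date": 1}  # storage units
queries = {"q1": ["by_user", "by_date"],           # structures answering q1
           "q2": ["by_item"]}                      # structures answering q2

prob = LpProblem("schema_selection", LpMinimize)
x = {s: LpVariable(f"use_{s}", cat=LpBinary) for s in structures}

# Objective: minimize workload cost plus storage cost of chosen structures.
prob += lpSum((cost[s] + size[s]) * x[s] for s in structures)

# Coverage: every query must be answerable by at least one chosen structure.
for q, options in queries.items():
    prob += lpSum(x[s] for s in options) >= 1

prob.solve()
print(LpStatus[prob.status], [s for s in structures if x[s].value() == 1])
```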
Oracle Database Administration Training - Part 3 – Faradars
Oracle Database is undoubtedly one of the most powerful software products for managing very large volumes of data. The goal of this training is to teach the complex concepts of database architecture and the challenges of database administration, helping you learn the material quickly and get closer to your goals.
Topics covered in this training include:
Oracle database architecture
Preparing the database environment
Creating an Oracle database
Managing Oracle memory structures
Configuring the Oracle network environment
...
For more details and to obtain this training, please visit the link below:
http://faradars.org/courses/fvorc9408
Operational Analytics Using Spark and NoSQL Data Stores – DATAVERSITY
NoSQL data stores have emerged for scalable capture and real-time analysis of data. Apache Spark and Hadoop provide additional scalable analytics processing. This session looks at these technologies and how they can be used to support operational analytics to improve operational effectiveness. It also looks at an example of how operational analytics can be implemented in NoSQL environments using the Basho Data Platform with Apache Spark (a minimal PySpark sketch follows the list below):
• The emergence of NoSQL, Hadoop and Apache Spark
• NoSQL Use Cases
• The need for operational analytics
• Types of operational analysis
• Key requirements for operational analytics
• Operational analytics using the Basho Data Platform with Apache Spark.
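For a flavor of such operational analytics, here is a minimal PySpark aggregation over event data. It is a generic sketch: it reads from an in-memory DataFrame rather than the Basho Data Platform, and the column names are invented.

```python
# Illustrative operational-analytics sketch in PySpark (generic source,
# invented columns): per-action volume and latency in 5-minute windows.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ops-analytics-sketch").getOrCreate()

events = spark.createDataFrame(
    [("2016-05-01 10:00:00", "checkout", 120),
     ("2016-05-01 10:03:00", "checkout", 95),
     ("2016-05-01 10:04:00", "search", 10)],
    ["ts", "action", "latency_ms"])

summary = (events
    .withColumn("ts", F.to_timestamp("ts"))
    .groupBy(F.window("ts", "5 minutes"), "action")
    .agg(F.count("*").alias("events"),
         F.avg("latency_ms").alias("avg_latency_ms")))
summary.show(truncate=False)
```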
A brief overview of currently popular & available key/value, column-oriented & document-oriented databases, along with implementation suggestions for the CakePHP web application framework.
Data Modeling for Integration of NoSQL with a Data Warehouse – Daniel Upton
Learn to model data to be visible and accessible between NoSQL Big Data repositories and your RDBMS Data Warehouse. Learn how specific RDBMS Data Warehouse data modeling approaches establish flexible integration with NoSQL data sets that do not play by E.F. Codd’s rules.
Persistence Smoothie: Blending SQL and NoSQL (RubyNation Edition) – Michael Bleigh
Persistence Smoothie is a talk given at RubyNation 2010 about when, how, and why to use combinations of persistence engines (including both SQL and NoSQL options) with a live example. The code is available at http://github.com/mbleigh/persistence-smoothie
Big Challenges in Data Modeling: NoSQL and Data Modeling – DATAVERSITY
Big Data and NoSQL have led to big changes in the data environment, but are they all in the best interest of data? Are they technologies that "free us from the harsh limitations of relational databases"?
In this month's webinar, we will be answering questions like these, plus:
Have we managed to free organizations from having to do Data Modeling?
Is there a need for a Data Modeler on NoSQL projects?
If we build Data Models, which types will work?
If we build Data Models, how will they be used?
If we build Data Models, when will they be used?
Who will use Data Models?
Where does Data Quality happen?
Finally, we will wrap with 10 tips for data modelers in organizations incorporating NoSQL in their modern Data Architectures.
In this lecture we analyze key-value databases. First, we introduce key-value characteristics, advantages, and disadvantages.
Then we analyze the major key-value data stores, and finally we discuss DynamoDB.
In particular, we consider how DynamoDB is implemented (a small consistent-hashing sketch follows this list):
1. Motivation Background
2. Partitioning: Consistent Hashing
3. High Availability for writes: Vector Clocks
4. Handling temporary failures: Sloppy Quorum
5. Recovering from failures: Merkle Trees
6. Membership and failure detection: Gossip Protocol
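Here is the promised consistent-hashing sketch: a minimal Python illustration of the partitioning technique in point 2. Real systems add virtual nodes and replication, which this sketch omits.

```python
# Minimal consistent-hash ring, as used by Dynamo for partitioning.
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    """Each key is owned by the first node clockwise from its position."""
    def __init__(self, nodes):
        self._points = sorted((_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        positions = [p for p, _ in self._points]
        i = bisect.bisect(positions, _hash(key)) % len(self._points)
        return self._points[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.node_for("customer:42"))
# Adding a node remaps only the keys between it and its predecessor,
# unlike modulo sharding, which remaps almost every key.
```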
Using Drupal 8 + D3 + Arduino to Create Real World Solutions – Kevin Wehmueller
Miss our Drupal GovCon session? That's OK! We're glad you're interested in hearing what we had to say! This slide deck covers the basic elements of our work with DAI to create the Hidrosonico (a working title), a proof-of-concept device that leverages Arduino, Drupal 8, and D3.js to broadcast and display water levels. Hopefully, this prototype will lead to a solution that can preempt flooding disasters, improve evacuation response times, and save lives.
Want to know more? Drop us a line at hello@taoti.com.
The Heart of Data Modeling: 7 Ways Your Agile Project is Managing Data Wrong – DATAVERSITY
Is your organization using agile approaches to systems development projects? Have you found that there are conflicting opinions about what should be done, when it should be done, and who should do it? Is there even a suggestion that data modeling isn’t needed on an Agile project? Are your data architects stuck in a waterfall world? Are you asking for “no more changes” to the data model? Do your developers think that “just the right documentation” means no modeling allowed? Does anyone even know where the reference data for the application is located? Or how it is updated?
In this month’s webinar, Karen will show you how data modeling and Agile approaches CAN work together to deliver quality information systems and solutions, with fewer dysfunctions and fewer tears.
2015 is knocking on the door and will be an exciting and surprising year for the BI industry. However, not everything will be a surprise for Panorama as we are always on top of the latest trends influencing the Business Intelligence community.
• What will the future hold for the industry?
• What are our BI experts’ thoughts, predictions and internal assessments on what new directions the Business Intelligence community will see in the coming year?
• Countdown of the most important trends in the industry
Watch the companion webinar: http://embt.co/1BIRvPw
Business users and analysts are often trying to solve a very specific data-related problem, and when researching it, may wonder why certain items don’t correlate. Maybe you need to reconcile old data and new data, and eliminate erroneous entries. How do you find what the various terms mean and where the relevant data resides? Business stakeholders need visibility to the organization’s models and metadata, but at the right level of detail for their use. Join this session to learn about business data access challenges, including:
+ What issues exist with current methods
+ What information business users really need
+ How to find that information
Karen Lopez will share tips and insights on working through the data challenges for business analysts and Josh Buckner will share a solution to address those concerns.
Information is at the heart of all architecture disciplines & why Conceptual ... – Christopher Bradley
Information is at the heart of all of the architecture disciplines, such as Business Architecture and Applications Architecture, and Conceptual Data Modelling supports them all.
Data modelling, which helps inform this, has been wrongly taught in many universities as being just for database design.
chris.bradley@dmadvisors.co.uk
Top Business Intelligence Trends for 2016 – Panorama Software
10 top BI trends for 2016 – by Panorama
- It's all about the insight
- Visual perception rules
- The learning suggestive system - AI gets real
- The data product chain becomes democratized
- Cloud (finally)
- “Mobile”
- Automated data integration
- Internet of Things data accelerating into reality
- Hadoop accelerators are the last chance for Hadoop
- Fading of the centralized on-premise DWH
Data has been increasing at an exponential rate, and organizations are either struggling to cope or rushing to take advantage by analyzing it. Hadoop is an excellent open source framework that addresses this big data problem.
I have used Hadoop within the financial sector for the last few years but could not find any resource or book that explains the usage of Hadoop for finance use cases. The best books I could find were, again, about Hadoop, Hive, or MapReduce patterns, with examples of counting words or Twitter messages in all possible ways.
I have written this book with the objective of explaining the basic usage of Hadoop and other products to tackle big data for finance use cases. I have touched on the majority of use cases, taking a very practical approach.
The book is available on:
http://www.amazon.co.uk/381/dp/B00X3TVGJY/ref=tmm_kin_swatch_0?_encoding=UTF8&sr=&qid=
http://www.amazon.com/381/dp/B00X3TVGJY/ref=tmm_kin_swatch_0?_encoding=UTF8&sr=&qid=
http://www.amazon.in/381/dp/B00X3TVGJY/ref=tmm_kin_swatch_0?_encoding=UTF8&sr=&qid=
This was the first part of the presentation on "Road Map for Careers in Big Data," given in conjunction with Hortonworks/Aengus Rooney on 17th August 2016 in London, for those contemplating a move to Big Data from an often relational background.
Course 8 : How to start your big data project by Eric Rodriguez Betacowork
For more info about our Big Data courses, check out our website ➡️ https://www.betacowork.com/big-data/
---------
"Data is the new oil" - Many companies and professionals do not know how to use their data or are not aware of the added value they could gain from it.
It is in response to these problems that the project “Brussels: The Beating Heart of Big Data” was born.
This project, financed by the Region of Brussels Capital and organised by Betacowork, offers 3 training cycles of 10 courses on big data, at both beginner and advanced levels. These 3 cycles will be followed by a Hackathon weekend.
No prerequisites are required to start these courses. The aim of these courses is to familiarize participants with the principles of Big Data.
------
For more info about our Big Data courses, check out our website ➡️ https://www.betacowork.com/big-data/
Data is the fuel that launches firms ahead of the pack – often far ahead. Those who can collect, analyze, and quickly make use of data will have an ever-increasing competitive advantage. That's the promise of Big Data.
Hadoop is regarded as a key capability for implementing Big Data initiatives in the enterprise, but organizations have yet to realize its full business benefits. In this webinar, Pivotal and guest Forrester Research, Inc. identify the use cases driving Hadoop adoption, and explore what is needed to transform initial investments into results.
Learn about:
Challenges Hadoop introduces, and how the right tools and platforms can help address them
Shifts in the industry with regards to SQL and NoSQL systems and their implications to Big Data analytics
Applying in-memory technologies for data management systems, data analytics, transactional processing and operational databases
Watch the on-demand webinar here:
http://www.pivotal.io/big-data/pivotal-forrester-operationalizing-data-analytics-webinar
Learn how to maximize business value from all of your data here: http://www.pivotal.io/big-data/pivotal-hd
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Science! – Sarah Aerni
Slides from the Pivotal Open Source Hub Meetup
"Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Science!"
As the need for data science as a key differentiator grows in all industries, from large corporations to startups, the need to get to results quickly is enabled by sharing ideas and methods in the community. The data science team at Pivotal leverages and contributes to this community of publicly available and open source technologies as part of their practice. We will share the resources we use by highlighting specific toolkits for building models (e.g. MADlib, R) and visualization (e.g. Gephi and Circos) along with their benefits and limitations by sharing examples from Pivotal's data science engagements. At the end of this session we hope to have answered the questions: Where can I get started with Data Science? Which toolkit is most appropriate for building a model with my dataset? How can I visualize my results to have the greatest impact?
Bio: Sarah Aerni is a member of the Pivotal Data Science team with a focus on healthcare and life science. She has a background in the field of Bioinformatics, developing tools to help biomedical researchers understand their data. She holds a B.S. in Biology with a specialization in Bioinformatics and a minor in French Literature from UCSD, and an M.S. and Ph.D. in Biomedical Informatics from Stanford University. During her time as a researcher she focused on the interface between machine learning and biology, building computational models enabling research for a broad range of fields in biomedicine. She also co-founded a start-up providing informatics services to researchers and small companies. At Pivotal she works with customers in life science and healthcare building models to derive insight and business value from their data.
Data Integration is a key part of many of today’s data management challenges: from data warehousing, to MDM, to mergers & acquisitions. Issues can arise not only in trying to align technical formats from various databases and legacy systems, but in trying to achieve common business definitions and rules.
Join this webinar to see how a data model can help with both of these challenges – from ‘bottom-up’ technical integration to ‘top-down’ business alignment.
Agile & Data Modeling – How Can They Work Together?DATAVERSITY
A tenet of the Agile Manifesto is ‘Working software over comprehensive documentation’, and many have interpreted that to mean that data models are not necessary in the agile development environment. Others have seen the value of data models for achieving the other core tenets of ‘Customer Collaboration’ and ‘Responding to Change’.
This webinar will discuss how data models are being effectively used in today’s Agile development environment and the benefits that are being achieved from this approach.
We recently presented our technology solution for metadata discovery to the Boulder Business Intelligence Brains Trust in Colorado. (www.bbbt.us)
The whole session was also recorded on video, and there is a link to the recording at the end of the presentation.
Geek Sync | Avoid the Seven Mistakes Data Modelers Make in Aiding Data Govern...IDERA Software
You can watch the replay for this Geek Sync webcast, Avoid the Seven Mistakes Data Modelers Make in Aiding Data Governance, in the IDERA Resource Center, http://ow.ly/nCrq50A4q8G.
Data privacy, protection, and compliance legislation is becoming ever more important. In that context, organizations have been looking towards their data governance teams to make sure that they understand their data, know how it is classified, and where it resides.
In this session, join Karen Lopez in discussing the mistakes that data modelers make in supporting data governance programs — and that you should avoid! These mistakes include collaboration errors, data model security fails, data stewarding missteps, data model integrity harms, and more.
Newer compliance regulations can make these mistakes costly and difficult to recover from. Karen wants you to love your data — and your data model!
Speaker: Karen Lopez has more than 20 years of database design experience. She specializes in the practical application of design approaches, balancing development time frames with the need to deliver solutions that will support business agility and data quality needs. She’s known for her fun and engaging speaking and teaching style. She tweets about data, space exploration and her travel experiences at @datachick. Karen blogs at www.datamodel.com.
Panorama Necto uncovers the hidden insights in your data and presents them in beautiful dashboards powered by KPI alerts, all managed by a secure, centralized, state-of-the-art business intelligence platform.
Similar to NoSQL and Data Modeling for Data Modelers
Slide deck for the DGIQ SIG on AI Ethics.
Are you concerned about data and AI ethics? Do you worry about how to make sure the algorithms and systems that affect our lives are fair, honest, responsible, and respectful of our rights and values? Do you have opinions about how to build an organizational culture that cares about these topics?
Join us for what will surely be a lively and interesting session where you are the speakers.
Special interest group (SIG) discussions are group conversations on topics that are new, or specific to an audience segment. The format is casual and without any formal presentation. The objective is to engage all participants in an exchange of ideas, questions, and advice, so please come with a willingness to participate in the conversation.
A Designer's Favourite Security and Privacy Features in SQL Server and Azure ...Karen Lopez
SQL Server includes multiple features that focus on data security, privacy, and developer productivity. In this session, we will review the best features from a database designer’s and developer’s point of view.
– Always Encrypted
– Dynamic Data Masking
– Row Level Security
– Data Classification
– Assessments
– Defender for SQL Server
– Ledger Tables
…and more
We’ll look at new and older features, why you should consider them, where they work, where they don’t, who needs to be involved in using them, and what changes, if any, need to be made to applications or tools that you use with SQL Server.
You will learn:
– The pros and cons of implementing each feature
– How implementing these new features may impact existing applications
– 10 tips for enhancing SQL Server security and privacy protections
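To give a flavour of what adopting one of these features involves, here is a hypothetical sketch that enables Dynamic Data Masking on an assumed dbo.Customer table from Python via pyodbc; the server, database, table, column, and role names are all illustrative.

# Hypothetical sketch: apply Dynamic Data Masking to an assumed table.
# Connection details, dbo.Customer, Email, and SupportManagers are illustrative.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# Mask the email address for non-privileged users.
cursor.execute(
    "ALTER TABLE dbo.Customer "
    "ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');"
)

# Roles that must see real values can be granted UNMASKED.
cursor.execute("GRANT UNMASKED TO SupportManagers;")
conn.commit()

Unprivileged queries then see masked values such as aXXX@XXXX.com while the stored data is unchanged, which is the appeal of masking over encryption for display-level protection.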
Designer's Favorite New Features in SQLServerKaren Lopez
A database designer's favourite features in SQL Server...with a bit of Azure SQL DB, too.
Always Encrypted
Row Level Security
Microsoft Purview
Azure Enabled SQL Server
Azure Defender for SQL
Azure Defender for Cloud
Dynamic Data Masking
Ledger Database and Tables
Data Privacy
Data Governance
Karen's Presentation to DAMA Chicago and other DAMA Chapters on 15 February 2023.
This presentation is less about data lakes than it is about Data Quality, and about how data professionals should think about designing and architecting systems that best reflect how data works in the real world.
Expert Cloud Data Backup and Recovery Best Practice.pptxKaren Lopez
We’ve been deploying backup solutions since the beginning of computing, and the foundations of backup and recovery have stayed the same: make sure backups run consistently and set recovery objectives. Yet systems in 2022 don’t work or act the same way they did decades ago. Cloud data backups have helped us meet the need for offsite backups and have changed how we budget for them. Ransomware has changed how we store them. The laws of physics might be more of an issue than when we had tapes stored in a safe down the hall. Cost models have changed, too.
In this session, Karen Lopez covers best practices for modern data recovery…and she will share stories of worst practices just to keep it real.
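As a small, hypothetical illustration of "set recovery objectives," here is a Python sketch that flags a backup as violating a recovery point objective (RPO); the 24-hour RPO and the timestamps are invented for the example.

# Hypothetical sketch: check a last-backup timestamp against an RPO.
# The 24-hour RPO and the timestamps below are illustrative.
from datetime import datetime, timedelta, timezone

RPO = timedelta(hours=24)

def rpo_violated(last_backup, now):
    # True when more time has passed since the last backup than the RPO allows.
    return now - last_backup > RPO

last = datetime(2022, 6, 1, 3, 0, tzinfo=timezone.utc)
now = datetime(2022, 6, 2, 5, 0, tzinfo=timezone.utc)
print(rpo_violated(last, now))  # True: 26 hours exceeds the 24-hour RPO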
Manage Your Time So It Doesn't Manage YouKaren Lopez
NASA Space Apps NYC Pre-Hackathon Symposium presentation by Karen Lopez, InfoAdvisors and NASA Datanaut. Karen presents on how to successfully manage your time and deliverables in the NASA Space Apps Challenge no matter where you are participating.
This one-hour presentation covers the tools and techniques for migrating SQL Server databases and data to Azure SQL DB or SQL Server on VM. Includes SSMA, DMA, DMS, and more.
Blockchain for the DBA and Data ProfessionalKaren Lopez
An overview of blockchain fundamentals, including examples of Oracle 20c Blockchain Tables. Includes concepts of trust, immutability, hashes, distributed nodes, and cryptography.
Blockchain for the DBA and Data ProfessionalKaren Lopez
With all the hype around blockchain, why should a DBA or other data professional care? In this session, we will cover the basics of blockchain as it applies to data and database processes:
Immutability
Verification
Distribution
Cryptography
Transactions
Trust
We will look at current offerings for blockchain features in Azure and in databases and data stores. Finally, we'll help you identify the types of business requirements that need blockchain technologies.
You will learn:
The valid uses of blockchain approaches in databases
How current technologies support blockchain approaches
The costs, benefits, and risks of blockchain
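For readers new to these concepts, here is a minimal sketch of the hash-chaining idea that underpins immutability and verification; it is a teaching toy, not any vendor's implementation.

# Minimal sketch of hash chaining: each block commits to the previous
# block's hash, so tampering anywhere breaks verification downstream.
import hashlib
import json

def block_hash(data, prev):
    payload = json.dumps({"data": data, "prev": prev}, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain, data):
    prev = chain[-1]["hash"] if chain else "0" * 64
    chain.append({"data": data, "prev": prev, "hash": block_hash(data, prev)})

def verify(chain):
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else "0" * 64
        if block["prev"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["data"], block["prev"]):
            return False
    return True

chain = []
append_block(chain, "credit 100")
append_block(chain, "debit 25")
print(verify(chain))               # True
chain[0]["data"] = "credit 1000"   # tamper with history
print(verify(chain))               # False: the hash links no longer check out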
Data Security and Protection in DevOps Karen Lopez
Presentation to the London #WinOps event, Sept 2019, focusing on data security, privacy, and protection in DevOps efforts. Includes data masking, dev and test data, Always Encrypted, and more.
Data Modeling for Security, Privacy and Data ProtectionKaren Lopez
Karen Lopez (@datachick/InfoAdvisors) 90-minute presentation on Data Security, Data Privacy, Compliance and how data modelers should discover, assess, and monitor these important data management responsibilities.
Designing for Data Security by Karen LopezKaren Lopez
As security and compliance become more important for organizations, especially in the age of GDPR, data breach notification, and other legislation, Karen covers the types of features data architects and designers should consider when building modern, protected, and defensive systems.
There is a lot of data modeling and database design terminology and jargon that uses the word "key." Do you know the difference between a surrogate key and a primary key? A super key and a candidate key? Could you explain them to a technical audience? A business user or an auditor?
In this presentation, Karen Lopez covers the concepts of primary keys, foreign keys, candidate keys, surrogate keys, and more.
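As a quick refresher on the distinctions the talk covers, here is an illustrative sketch in Python using SQLite; the table and column names are invented, and the same concepts apply to any relational DBMS.

# Illustrative sketch of key types using SQLite; names are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON;")  # SQLite enforces FKs only when enabled

conn.execute("""
    CREATE TABLE Customer (
        CustomerId INTEGER PRIMARY KEY,   -- surrogate key: system-generated, no business meaning
        Email      TEXT NOT NULL UNIQUE,  -- candidate (natural) key: unique business identifier
        Name       TEXT NOT NULL
    );
""")
conn.execute("""
    CREATE TABLE CustomerOrder (
        OrderId    INTEGER PRIMARY KEY,
        CustomerId INTEGER NOT NULL REFERENCES Customer (CustomerId)  -- foreign key
    );
""")

conn.execute("INSERT INTO Customer VALUES (1, 'pat@example.com', 'Pat')")
conn.execute("INSERT INTO CustomerOrder VALUES (10, 1)")  # OK: parent row exists
# Inserting CustomerOrder (11, 99) would fail: no Customer row with key 99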
How to Survive as a Data Architect in a Polyglot Database WorldKaren Lopez
Karen Lopez talks to data architects and data modelers about how they can best deliver value on modern data-driven projects beyond relational database technologies. She covers NoSQL databases and datastores, which scenarios they best fit and which ones they don't. She ends with 10 tips for adding more value to polyschematic database solutions.
Karen's Favourite Features of SQL Server 2016Karen Lopez
Slides from a one hour webinar on Karen Lopez's favorite features from database designer's point of view. Topics include Always Encrypted, Data Masking, Row Level Security, Foreign Keys, JSON and more.
Notice an error? Let me know. I welcome this sort of feedback.
StarCompliance is a leading firm specializing in the recovery of stolen cryptocurrency. Our comprehensive services are designed to assist individuals and organizations in navigating the complex process of fraud reporting, investigation, and fund recovery. We combine cutting-edge technology with expert legal support to provide a robust solution for victims of crypto theft.
Our Services Include:
Reporting to Tracking Authorities:
We immediately notify all relevant centralized exchanges (CEX), decentralized exchanges (DEX), and wallet providers about the stolen cryptocurrency. This ensures that the stolen assets are flagged as scam-linked transactions, making it far more difficult for the thief to use them.
Assistance with Filing Police Reports:
We guide you through the process of filing a valid police report. Our support team provides detailed instructions on which police department to contact and helps you complete the necessary paperwork within the critical 72-hour window.
Launching the Refund Process:
Our team of experienced lawyers can initiate lawsuits on your behalf and represent you in various jurisdictions around the world. They work diligently to recover your stolen funds and ensure that justice is served.
At StarCompliance, we understand the urgency and stress involved in dealing with cryptocurrency theft. Our dedicated team works quickly and efficiently to provide you with the support and expertise needed to recover your assets. Trust us to be your partner in navigating the complexities of the crypto world and safeguarding your investments.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It comes, however, with a precondition: the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads and is expected to be a non-issue when the computation is performed on massive graphs.
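For orientation, here is a minimal sketch of the standard ("monolithic") power-iteration PageRank that the report uses as its baseline, including a uniform-teleport treatment of dead ends; it is a toy illustration, not the report's code.

# Minimal sketch of monolithic PageRank with dead-end (dangling) handling.
def pagerank(graph, damping=0.85, iters=50):
    # graph: dict mapping each vertex to a list of its out-neighbours
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iters):
        nxt = {v: (1.0 - damping) / n for v in graph}
        for v, outs in graph.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for u in outs:           # distribute rank along out-edges
                    nxt[u] += share
            else:
                for u in graph:          # dead end: spread rank uniformly
                    nxt[u] += damping * rank[v] / n
        rank = nxt
    return rank

print(pagerank({"a": ["b"], "b": ["a", "c"], "c": []}))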
As Europe's leading economic powerhouse and the fourth-largest #economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like #Russia and #China, #Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in #cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to #AdvancedPersistentThreats (#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Compressed Sparse Row (CSR) is an adjacency-list-based graph representation commonly used by graph algorithms such as PageRank.
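For concreteness, here is a toy CSR layout in Python for the three-vertex graph 0→1, 0→2, 1→2; this illustrates the representation itself, not the notes' C++/CUDA code.

# Toy CSR layout: edge targets are concatenated, offsets index into them.
offsets = [0, 2, 3, 3]   # offsets[v] .. offsets[v+1] span vertex v's out-edges
targets = [1, 2, 2]      # concatenated destination vertices

def out_neighbours(v):
    return targets[offsets[v]:offsets[v + 1]]

for v in range(len(offsets) - 1):
    print(v, "->", out_neighbours(v))   # 0 -> [1, 2], 1 -> [2], 2 -> []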
Multiply with different modes (map)
1. Performance of sequential vs. OpenMP-based vector multiply.
2. Comparing various launch configs for CUDA-based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs. bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential vs. OpenMP-based vector element sum.
2. Performance of memcpy vs. in-place CUDA-based vector element sum.
3. Comparing various launch configs for CUDA-based vector element sum (memcpy).
4. Comparing various launch configs for CUDA-based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA-based vector element sum (in-place).
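As a loose Python analogue of these map/reduce micro-benchmarks (the originals are OpenMP/CUDA), here is a sketch that times an element-wise multiply and an element sum; vector sizes and repeat counts are arbitrary.

# Loose analogue of the micro-benchmarks: time a map (multiply) and a
# reduce (sum) over a vector. Sizes and repeat counts are arbitrary.
import timeit

xs = list(range(100_000))
ys = list(range(100_000))

map_time = timeit.timeit(lambda: [a * b for a, b in zip(xs, ys)], number=10)
reduce_time = timeit.timeit(lambda: sum(xs), number=10)
print(f"multiply: {map_time:.3f}s  sum: {reduce_time:.3f}s")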