This document provides an overview of McGladrey, a large accounting firm, and outlines the agenda for a presentation on SharePoint migrations. The presentation covers the elements of a migration, important pre-migration steps like analysis and validation, testing the migration, and post-migration steps. It emphasizes the importance of thorough planning, documentation, and testing to prevent issues during and after the migration.
Data is an increasingly common term used on the assumption that its meaning is commonly understood. This presentation seeks to drill down into the very specifics of what data is all about.
Securing Enterprise Healthcare Big Data by the Combination of Knox/F5, Ranger... – DataWorks Summit
Data security is critical to the success of large enterprises such as Mayo Clinic (MC), and healthcare data stored on enterprise Big Data platforms is no exception. At MC, healthcare Big Data ingestion, storage, processing, and analytics all take place in enterprise-secured environments, including the Sandbox, Dev, Int/Test, and Prod Hadoop clusters. Primary data security in these enterprise-secured Hadoop clusters is achieved through the combination of the Knox Gateway/F5 balancer, Ranger authorization and auditing, two-factor local authentication (TFA), and Kerberos authentication, all coupled to MC Active Directory and LDAP. In other words, any major HDFS, HBase, or Hive healthcare data operation at MC has to go through the dedicated Knox Gateway or the F5 balancer (for Knox HA) via REST API, which interacts with Ranger and the other primary security components involved. Data security on the Big Data platforms at MC is being further strengthened by ongoing network segmentation and SSL enablement on the related Hadoop ecosystem components. These approaches have significantly improved data security for the success of the MC Big Data program, although the data do require skilled clients or applications to use them.
Understanding Linked Data via EAV Model based Structured Descriptions – Kingsley Uyi Idehen
A multi-part series of presentations aimed at demystifying Linked Data by:
1. Introducing the Entity-Attribute-Value data model
2. Exploring how we describe things
3. Examining the Referents, Identifiers, and Descriptors trinity.
Using Tibco SpotFire (via Virtuoso ODBC) as Linked Data Front-end – Kingsley Uyi Idehen
A detailed guide covering the configuration of a Virtuoso ODBC Data Source Name (DSN) into the Web of Linked Data, en route to utilization via Tibco's SpotFire BI tool.
Basically, SpotFire as a Linked (Open) Data front-end via ODBC.
SharePoint Conference 2011 was the only official Microsoft Conference in North America this year to feature experts from Microsoft and around the world.
This fall a panel of C/D/H consultants, just back from SPC, hit the highlights and answered questions about important topics, like cloud services, best practices, adoption, SharePoint for internet sites, and of course, news about upcoming features and releases!
Whether you missed SPC or attended and just want to find out more, view our slide deck.
And for more on this and other topics, visit our blog at www.cdhtalkstech.com.
Implementing Security on a Large Multi-Tenant Cluster the Right Way – DataWorks Summit
Raise your hands if you are deploying Kerberos and other Hadoop security components after deploying Hadoop to the enterprise. We will present the best practices and challenges of implementing security on a large multi-tenant Hadoop cluster spanning multiple data centers. Additionally, we will outline our authentication & authorization security architecture, how we reduced complexity through planning, and how we worked with multiple teams and organizations to implement security the right way the first time. We will share lessons learned and takeaways for implementing security at your company.
We will walk through the implementation and its impacts on the user, development, support, and security communities, and will highlight the pitfalls that we navigated to achieve success. Protecting your customers and information assets is critical to success. If you are planning to introduce Hadoop security to your ecosystem, don't miss this in-depth discussion of a very important and necessary component of enterprise big data.
Making the Conceptual Layer Real via HTTP based Linked Data – Kingsley Uyi Idehen
A presentation that addresses the pros and cons of approaches to making conceptual models concrete. It covers HTTP-based Linked Data and the RDF data model as a new mechanism for conceptual-model-oriented data access and integration.
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera – Caserta
In our recent Big Data Warehousing Meetup, we discussed Data Governance, Compliance and Security in Hadoop.
As the Big Data paradigm becomes more commonplace, we must apply enterprise-grade governance capabilities to critical data that is highly regulated and must adhere to stringent compliance requirements. Caserta and Cloudera shared techniques and tools that enable data governance, compliance, and security on Big Data.
For more information, visit www.casertaconcepts.com
How Lucene Powers the LinkedIn Segmentation and Targeting Platform – lucenerevolution
Presented by Hien Luu, Technical Lead, LinkedIn
Rajasekaran Rangaswamy, LinkedIn
For internet companies, marketing campaigns play an important role in acquiring new customers, retaining and engaging existing customers, and promoting new products. The LinkedIn segmentation and targeting platform helps marketing teams easily and quickly create member segments based on member attributes, using nested predicate expressions ranging from simple to complex. Once segments are created, the qualified members are targeted with marketing campaigns.
Lucene is a key piece of technology in this platform. This session will cover how we leverage Hadoop to efficiently build Lucene indexes for a large and growing member attribute data set of 225 million members, and how Lucene is used to create segments based on complex nested predicate expressions. This presentation will also share some of the lessons we learned and challenges we encountered from using Lucene to search over large data sets.
In the real world, "find-ability" is just as important as "put-ability" when building a well-structured ERMS. This session explores effective strategies for defining and capturing the critical metadata needed to drive RM-specific search scenarios.
The W3C Linked Data Platform (LDP) specification describes a set of best practices and a simple approach for a read-write Linked Data architecture, based on HTTP access to web resources that describe their state using the RDF data model. This presentation provides a set of simple examples that illustrate how an LDP client can interact with an LDP server in the context of a read-write Linked Data application, i.e., how to use the LDP protocol for retrieving, updating, creating, and deleting Linked Data resources.
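As a concrete illustration of the read-write interactions described above (not taken from the presentation itself), here is a minimal sketch in Python using the requests library against a hypothetical LDP container URL; real servers may additionally require authentication, and many expect an If-Match/ETag header on updates.

```python
import requests

BASE = "http://example.org/container/"  # hypothetical LDP container

# Retrieve: GET the container, asking for a Turtle representation
resp = requests.get(BASE, headers={"Accept": "text/turtle"})
print(resp.status_code, resp.text)

# Create: POST a new RDF source to the container; the Slug header
# is a hint for the name of the resource to be created
turtle = '<> <http://purl.org/dc/terms/title> "My resource" .'
resp = requests.post(
    BASE,
    data=turtle,
    headers={"Content-Type": "text/turtle", "Slug": "my-resource"},
)
created = resp.headers["Location"]  # URL of the newly created resource

# Update: PUT a complete replacement representation of the resource
updated = '<> <http://purl.org/dc/terms/title> "Renamed resource" .'
requests.put(created, data=updated, headers={"Content-Type": "text/turtle"})

# Delete: remove the resource from the container
requests.delete(created)
```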
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users – DataWorks Summit
Apache Knox Gateway is a proxy for interacting with Apache Hadoop clusters in a secure way, providing authentication, service-level authorization, and many other extensions to secure any HTTP interactions in your cluster. One main feature of Apache Knox Gateway is the ability to extend the reach of your REST APIs to the internet while still securing your cluster and working with Kerberos. Recent contributions to the Apache Knox community have added support for Single Sign On (SSO) based on Pac4j 1.8.9, a very powerful security engine that provides SSO support through SAML2, OAuth, OpenID, and CAS. In addition, through recent community contributions, Apache Ambari and Apache Ranger can now also provide SSO authentication through Knox. This paper will discuss the architecture of Knox SSO, explain how enterprise users can benefit from this feature, and present enterprise use cases for Knox SSO and its integration with open source Shibboleth, ADFS Windows Server IdP support, and the Okta cloud IdP.
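To make the proxying concrete, here is a hedged sketch (not from the talk) of calling WebHDFS through a Knox gateway from Python; the gateway host, the sample "default" topology, the credentials, and the CA bundle path are all assumptions about a particular deployment.

```python
import requests

# Knox fronts the Hadoop REST APIs, authenticates the caller (e.g.
# against LDAP), and handles Kerberos inside the cluster, so the
# client never needs a Kerberos ticket of its own.
KNOX = "https://knox.example.com:8443/gateway/default"  # hypothetical

resp = requests.get(
    f"{KNOX}/webhdfs/v1/tmp",
    params={"op": "LISTSTATUS"},
    auth=("alice", "alice-password"),   # credentials verified by Knox
    verify="/path/to/gateway-ca.pem",   # Knox endpoints are TLS-protected
)
for entry in resp.json()["FileStatuses"]["FileStatus"]:
    print(entry["pathSuffix"], entry["type"])
```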
Security Updates: More Seamless Access Controls with Apache Spark and Apache ... – DataWorks Summit
Security has always been a fundamental requirement for enterprise adoption. For example, in a company, the billing, data science, and regional marketing teams may all have the required access privileges to view customer data, while sensitive data like credit card numbers should be accessible only to the finance team. Previously, Apache Hive™ with Apache Ranger™ policies was used to manage such scenarios. In this talk, we show that Apache Spark™ SQL is aware of the existing Apache Ranger policies defined for Apache Hive. In other words, for SQL users, access to databases, tables, rows, and columns is controlled in a fine-grained manner, irrespective of whether the data is analyzed using Apache Spark SQL or Hive. If a policy is updated, both Apache Spark and Apache Hive users get consistent results. In addition, all fine-grained access via Apache Spark SQL can be monitored and searched through a centralized interface via Apache Ranger.
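A small illustrative sketch of the experience this describes, assuming a Ranger-aware Spark SQL build (the Hive-policy enforcement is a distribution-level integration rather than stock Apache Spark, and the database, table, and column names below are hypothetical):

```python
from pyspark.sql import SparkSession

# The same SQL an analyst would run in Hive. With Ranger-aware Spark
# SQL, the existing Hive policies decide which databases, tables,
# rows, and columns this user may see.
spark = (SparkSession.builder
         .appName("ranger-policy-demo")
         .enableHiveSupport()
         .getOrCreate())

# A finance user sees credit_card_number in the clear; a marketing
# user gets the column masked or the query denied, depending on the
# Ranger policy -- the query text itself never changes.
spark.sql("""
    SELECT customer_id, region, credit_card_number
    FROM sales.customers
    WHERE region = 'EMEA'
""").show()
```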
For decades developers and DBAs have battled over who controls the world. With each new development paradigm the battle flares again as developers push DBAs to adopt and support new data structures (JSON), new APIs (REST services), new technologies (In-Memory) and new platforms (Cloud). In this session, Gerald Venzl takes on the role of lead developer on a project to deploy a RESTful web-based application for a new coffeeshop chain, while Maria Colgan takes on the role of the DBA. Through the use of live demos, they learn to work together to find a solution that will allow them to embrace a more agile development approach, as well as the latest technology trends without exposing the business to painful availability or security vulnerabilities.
LDP4j: A framework for the development of interoperable read-write Linked Da... – Nandana Mihindukulasooriya
This presentation introduces LDP4j, an open source Java-based framework for the development of read-write Linked Data applications based on the W3C Linked Data Platform 1.0 (LDP) specification and available under the Apache 2.0 license. This was presented at the ISWC 2014 Developer Workshop.
http://www.ldp4j.org/
The convergence of reporting and interactive BI on Hadoop – DataWorks Summit
Since the early days of Hive, SQL on Hadoop has evolved from being a SQL wrapper on top of MapReduce to a viable replacement for the traditional EDW. In the meantime, while SQL-on-Hadoop vendors were busy adding enterprise capabilities and comparing their TPC-DS prowess against Hive, a niche industry emerged on the side for OLAP (a.k.a. “Interactive BI”) on Hadoop data. Unlike general-purpose SQL-on-Hadoop engines, which deal with the multiple aspects of warehousing, including reporting, OLAP-on-Hadoop engines focus almost exclusively on answering OLAP queries fast by using implementation techniques that had not been part of the SQL-on-Hadoop toolbox so far.
But SQL-on-Hadoop engines are not standing still. After having made huge progress in catching up to traditional EDWs for reporting workloads, SQL-on-Hadoop engines are now setting their sights on interactive BI. This is great news for enterprises. As the line between reporting and OLAP gets blurred, enterprises can now start considering using a single engine for both reporting and Interactive BI on their Hadoop data, as opposed to having to host, manage, and license two separate products.
Can a single engine satisfy both your reporting and Interactive BI needs? This may be a hard question to answer. Vendors use inconsistent terminology to describe their products and make ambitious and sometimes conflicting claims. This makes it very hard for enterprises to compare products, let alone decide which is the product that best matches their needs.
In this presentation, we’ll provide an overview of the different approaches to OLAP on Hadoop, and explain the key technologies behind each of them. We’ll use consistent terminology to describe what you get from multiple proprietary and open source products and outline advantages and disadvantages. You’ll come out equipped with the knowledge you need to read past marketing and sales pitches. You’ll be able to compare products and make an informed decision on whether a single engine for both reporting and Interactive BI on Hadoop is right for you.
Speaker
Gustavo Arocena, Big Data Architect, IBM
A set of slides that provides a high-level overview of the W3C Linked Data Platform specification presented at the 4th Linked Data in Architecture and Construction Workshop.
For more detailed and technical version of the presentation, please refer to
http://www.slideshare.net/nandana/learning-w3c-linked-data-platform-with-examples
LDAC 2016 programme
http://smartcity.linkeddata.es/LDAC2016/#programme
Enterprise & Web based Federated Identity Management & Data Access Controls – Kingsley Uyi Idehen
This presentation breaks down the issues associated with federated identity management and protected-resource access controls (policies). Specifically, it uses Virtuoso and RDF to demonstrate how this longstanding issue has been addressed using the combination of RDF-based entity relationship semantics and Linked Open Data.
GDPR Community Showcase for Apache Ranger and Apache Atlas – DataWorks Summit
The communities for Apache Atlas and Apache Ranger, which are foundational components for Security and Governance across the Hadoop stack, have spawned a robust industry ecosystem of tools and platforms. Such industry solutions build upon the extensibility offered via open and robust APIs and integration patterns to provide innovative “better together” capabilities. In this talk, we will showcase how the ecosystem of solutions being built by different vendors provide value-added capabilities to address the key aspects of securing and governing your data lakes based on Apache Ranger and Apache Atlas frameworks. The talk will showcase multiple ecosystem demonstrations that will include how to identify, map, and classify personal data, harvest and maintain metadata, track and map the movement of data through your enterprise, and enforce appropriate controls to monitor access and usage of personal data.
Come hear from community partners:
-Balaji Ganesan from Privacera will showcase how Privacera integrates with and leverages Apache Ranger and Apache Atlas features to help with GDPR compliance
-Greg Goldsmith and Jordan Martz from Attunity will showcase how Attunity’s solutions integrate into Apache Atlas to provide robust chain of custody and classifications required for GDPR
-Somil Kulkarni from IBM will demonstrate how IBM Information Governance Catalog integrates with Apache Atlas to exchange metadata and build a connected solution for GDPR compliance that harnesses both open source community enhancements and IBM's innovations in the governance space.
Speakers
Ali Bajwa, Principal Solutions Engineer, Hortonworks
Srikanth Venkat, Senior Director Product Management, Hortonworks
MongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep Dive – MongoDB
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
Building the Enterprise Data Lake: A look at architecture – mark madsen
The topic is building an Enterprise Data Lake, discussing high level data and technology architecture. We will describe the architecture of a data warehouse, how a data lake needs to differ, and show a high level functional and data architecture for a data lake. This webinar will cover:
Why dumping data into Hadoop and letting users get it out doesn't work
The difference between a Hadoop application and a Data Lake
Why new ideas about data architecture are a key element
An Enterprise Data Lake reference architecture to frame what must be built
From XA Secure to Hortonworks, to founding Privacera: the story of Balaji Ganesan and Don Bosco Durai, how Apache Ranger™ came to be, and how it has become the foundation for the industry's first SaaS-based data governance and security platform, PrivaceraCloud.
Contexti / Oracle - Big Data: From Pilot to Production – Contexti
Big Data is moving from hype to reality for many organisations. The value proposition is clear and sponsorship is high, but how do organisations execute?
Join Oracle and Contexti to discuss the typical journey of a big data project from concept to pilot to production.
• Discuss our experience with a regional Telco
• Common Use Cases across key verticals
• Defining and prioritising use cases
• The challenge of moving from Pilot to Production
• Common Operating Models for Big Data
• Funding a Big Data Capability going forward
• Pilots - common mistakes; challenges; success criteria
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle) – Rittman Analytics
A set of product roadmap and capabilities slides from Oracle Data Integration Product Management, plus thoughts on data integration in big data implementations from Mark Rittman (independent analyst).
Managed File Transfer (MFT) is a one-stop solution to all your file transfer problems, and it is becoming the choice of the digital enterprise.
Let's look at the drawbacks of traditional FTP and SFTP for file transfers and understand how Software AG's Active Transfer solution for MFT can help simplify and reduce file transfer chaos while improving performance, security, reliability, and visibility.
Tame Big Data with Oracle Data Integration – Michael Rainey
In this session, Oracle Product Management covers how Oracle Data Integrator and Oracle GoldenGate are vital to big data initiatives across the enterprise, providing the movement, translation, and transformation of information and data not only heterogeneously but also in big data environments. Through a metadata-focused approach for cataloging, defining, and reusing big data technologies such as Hive, Hadoop Distributed File System (HDFS), HBase, Sqoop, Pig, Oracle Loader for Hadoop, Oracle SQL Connector for Hadoop Distributed File System, and additional big data projects, Oracle Data Integrator bridges the gap in the ability to unify data across these systems and helps deliver timely and trusted data to analytic and decision support platforms.
Co-presented with Alex Kotopoulis at Oracle OpenWorld 2014.
Data Virtualization Journey: How to Grow from Single Project and to Enterpris... – Denodo
In this presentation, Intel presents their journey, starting small and growing Data Virtualization into an enterprise IT capability, enabling use cases such as samples management, cloud, and big data for sales and marketing.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/jiYOHw.
Architect's Open-Source Guide for a Data Mesh Architecture – Databricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted for architects, decision-makers, data-engineers, and system designers.
Hortonworks Oracle Big Data Integration – Hortonworks
Slides from joint Hortonworks and Oracle webinar on November 11, 2014. Covers the Modern Data Architecture with Apache Hadoop and Oracle Data Integration products.
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group – Scott Mitchell
This presentation was given at the July 8th, 2014 user group meeting for BI Reporting for Bay Area Start-Ups.
Content creation: Infocepts/DWApplications
Presented by: Scott Mitchell - DWApplications
Join Richard Harbridge for this fast-paced, in-depth view of what's coming in SharePoint Server 2016, why those new features matter for IT Pros, and what you can do to get ready for SharePoint Server 2016.
CON6619 - OpenWorld Presentation. Oracle data integration, big data, data governance, and cloud integration. Replication, ETL, Data Quality, Streaming Big Data, and Data Preparation
PHP Frameworks: I want to break free (IPC Berlin 2024) – Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
A tale of scale & speed: How the US Navy is enabling software delivery from l... – sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Elevating Tactical DDD Patterns Through Object Calisthenics – Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 – Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf – 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Removing Uninteresting Bytes in Software Fuzzing – Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Epistemic Interaction - tuning interfaces to provide information for AI support – Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
DevOps and Testing slides at DASA Connect – Kari Kakkonen
Slides by me and Rik Marselis from the 30.5.2024 DASA Connect conference. We discuss what testing is, then what agile testing is, and finally what Testing in DevOps is. We ended with a lovely workshop in which the participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf – Peter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... – SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 – Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features trade security for convenience and capability. This best practices guide outlines steps users can take to better protect their personal devices and information.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... – DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well (a minimal, hypothetical example of the binding follows this list).
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
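For reference, the Python binding referred to above is published as pypowsybl. The sketch below is not part of the webinar material; it is a guess at a minimal session, and the API names (create_ieee14, run_ac, get_buses) are assumptions based on the public package.

```python
import pypowsybl as pp

# Load a bundled example grid (IEEE 14-bus) and run an AC power flow,
# the two steps the workshop notebook walks through.
network = pp.network.create_ieee14()
results = pp.loadflow.run_ac(network)

print(results[0].status)           # did the load flow converge?
print(network.get_buses().head())  # bus results as a pandas DataFrame
```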
Was made possible by the generous support of the following sponsors… And by your participation… Thank you!
Join us for the raffle & SharePint following the last session. Be sure to fill out your eval form & turn it in at the end of the day for a ticket to the BIG raffle!
Our vision is to deliver global capabilities with the local touch that brings world class assurance, tax and consulting experience to our clients through enduring relationships built on genuine understanding and trust. The following slides give you a brief overview of our firm and how we serve our clients.
Each industry presents unique challenges, and we have the experience and the knowledge to meet your every need. Here are some of the industries we serve.
Wherever you are, we have the people to partner with you. With more than 75 offices across the country, you can be confident in our ability to provide the experience you need to deliver real business value.
Recently, a 7,500-person IT firm based in Detroit, MI divested many of its assets.
Many of the divisions went public or spun off.
They needed to migrate their corporate SharePoint 2007, 2010, and 365 content, plus non-SharePoint locations, onto new Office 365 tenancies.
As a result, we spent this summer working on these migrations.
We migrated 7,000 Exchange accounts and 1 TB of content, using a variety of methods, to each company's new O365 domain.
We have had Dell support, ShareGate support, and MetaVis support calls each week (sometimes together) and have open tickets in their respective ticketing systems and communities.
Clockwork Software’s solution is strongly dependent on the files being stored on a file server and the paths being stored in SQL.
If you do NOT run any pre-migration analysis, you WILL fail.
Plan what you and your Validation Team will be validating.
Metadata? Broken Links? Missing features.
Run activity reporting to identify the most active users.
Communicate with those people regarding any new features that will be made available to them and their teams.
This “group” of leaders could be used in migration testing.
Take this migration effort as an opportunity to re-architect any Site Collections that have sprawled beyond their intended scope.
Identify all Webs in the Site Collection that could be moved to another Site Collection.
You may have a DEV or TEMP site that will no longer be of use.
You can run a Storage Report to help identify these.
Save all of the InfoPath XSN files you created on your server; if you don't have them, regenerate the XSN files.
List nonstandard SharePoint features
Users: Employees who are no longer with the company could still be in the permission schema in SharePoint. Depending on how User Profile Sync was configured, they may already have been removed.
For a script to remove SharePoint users, see: http://weblogs.asp.net/bsimser/powershell-tools-removing-orphaned-users-from-sharepoint
For the SP2010 MySite clean-up job, see http://www.harbar.net/archive/2011/02/10/account-deletion-and-sharepoint-2010-user-profile-synchronization.aspx
Orphans: When a new Site Collection is created, there are default SharePoint groups (Owners, Members, Visitors). Often these are abandoned in favor of inherited permissions. If there are any empty groups, ensure these also get deleted.
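A rough sketch of how such an empty-group scan might be scripted against the SharePoint REST API (2013 and later, or SharePoint Online); the site URL is hypothetical and authentication is omitted, so supply whatever your farm or tenant requires.

```python
import requests

SITE = "https://intranet.example.com/sites/teamsite"  # hypothetical

resp = requests.get(
    f"{SITE}/_api/web/sitegroups?$expand=Users",
    headers={"Accept": "application/json;odata=nometadata"},
)
resp.raise_for_status()
for group in resp.json()["value"]:
    # Default Owners/Members/Visitors groups often end up abandoned
    if not group["Users"]:
        print("Empty group, candidate for deletion:", group["Title"])
```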
Empty the Recycle Bin. All of them. We all forget. Duh.
Large Lists: Pre-migration reports will surely identify lists with a large number of items, or a large number of items in views. Your migration will benefit from applying best practices to these views and lists now.
Duplicates: These may NOT migrate and need to be documented prior to migration and reapplied on the destination. For a SP2010/2007 PowerShell approach, see this post: http://www.pointbeyond.com/2011/08/24/finding-duplicate-documents-in-sharepoint-using-powershell
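The linked post covers the PowerShell route; as a generic, tool-agnostic illustration of the same idea, a content-hash scan over an exported file share might look like the following (the share path is hypothetical).

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(root: str) -> dict:
    """Group files by SHA-256 digest; keep only digests seen twice or more."""
    by_digest = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}

# Document each duplicate set before migrating; re-check on the target.
for digest, paths in find_duplicates(r"\\fileserver\sp-export").items():
    print(digest[:12], *paths, sep="\n  ")
```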
Export the farm configuration with PowerShell if on-premises.
See "Document farm configuration settings in SharePoint 2013":
http://technet.microsoft.com/en-us/library/ff645391(v=office.15).aspx
Site and List Templates used in the source environment must be available in the target if the tool is going to create new sites or lists during the operation.
To ensure all in-use templates are available, this option will scan each site and list to identify the template type.
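One way such a template scan could be scripted outside a migration tool is via the SharePoint REST API, reading each list's BaseTemplate ID; a hedged sketch with a hypothetical site URL and authentication omitted:

```python
import requests

SITE = "https://intranet.example.com/sites/teamsite"  # hypothetical

resp = requests.get(
    f"{SITE}/_api/web/lists?$select=Title,BaseTemplate",
    headers={"Accept": "application/json;odata=nometadata"},
)
resp.raise_for_status()
# e.g. 100 = Custom List, 101 = Document Library
for lst in sorted(resp.json()["value"], key=lambda l: l["BaseTemplate"]):
    print(f'{lst["BaseTemplate"]:>5}  {lst["Title"]}')
```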
The default list view threshold for optimal SharePoint performance is 5,000 items.
This option will scan for and highlight any list that exceeds the total number of items defined in this parameter (default value is 5,000). This is particularly important when migrating into SharePoint Online, as this value is not user-configurable in online tenants.
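The same REST endpoint can drive the large-list scan, comparing each list's ItemCount against the threshold; again a sketch under the same assumptions (hypothetical site URL, authentication omitted).

```python
import requests

SITE = "https://intranet.example.com/sites/teamsite"  # hypothetical
THRESHOLD = 5000  # default list view threshold

resp = requests.get(
    f"{SITE}/_api/web/lists?$select=Title,ItemCount",
    headers={"Accept": "application/json;odata=nometadata"},
)
resp.raise_for_status()
for lst in resp.json()["value"]:
    if lst["ItemCount"] > THRESHOLD:
        print(f'{lst["ItemCount"]:>8}  {lst["Title"]}  exceeds the view threshold')
```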
Each tool gives a different depiction.
The Metalogix tool is shown here.
Dell and MetaVis both produce spreadsheets.
Either one shows total file counts, etc.
Discuss how long migrations may take.
Tools refresh the cached content and settings, so constant refreshes are necessary.
For each piece of content to migrate, the tool has to look at every feature associated with it.
To validate, build an item inventory with either your migration tool or SharePoint's "Export to Excel" feature, then run a lookup against it: the lookup returns the first column of any item that matches the index, and four parameters are specified.
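The four-parameter lookup described above reads like Excel's VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup]); that interpretation is an assumption, as is everything in the sketch below. The same source-versus-target check can also be scripted, for example with pandas over two exported CSV inventories (the file and column names are hypothetical).

```python
import pandas as pd

# Compare a source inventory against the target: the scripted
# equivalent of a VLOOKUP from one worksheet into the other.
source = pd.read_csv("source_inventory.csv")
target = pd.read_csv("target_inventory.csv")

merged = source.merge(target, on="Item URL", how="left",
                      suffixes=("_src", "_dst"), indicator=True)

missing = merged[merged["_merge"] == "left_only"]        # never arrived
mismatched = merged[(merged["_merge"] == "both") &
                    (merged["Size_src"] != merged["Size_dst"])]
print(f"{len(missing)} items missing, {len(mismatched)} size mismatches")
```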