Cortana Analytics Workshop: Big Data @ Microsoft - MSAdvAnalytics
Raghu Ramakrishnan. Trends and challenges in big data analytics, and an outline of the Microsoft big data stack. Go to https://channel9.msdn.com/ to find the recording of this session.
1 Introduction to Microsoft data platform analytics for release - Jen Stirrup
Part 1 of a conference workshop. This forms the morning session, which looks at moving from Business Intelligence to Analytics.
Topics Covered: Azure Data Explorer, Azure Data Factory, Azure Synapse Analytics, Event Hubs, HDInsight, Big Data
Data Saturday Malta - ADX Azure Data Explorer overview - Riccardo Zamana
This is a step-by-step approach to the entire ecosystem of features driven by Azure Data Explorer. You can find many examples using the Kusto dialect to acquire data, process it, and build complete web interfaces using only one service: ADX.
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO... - The Hive
Until recently, data was gathered for well-defined objectives such as auditing, forensics, reporting and line-of-business operations; now, exploratory and predictive analysis is becoming ubiquitous, and the default increasingly is to capture and store any and all data, in anticipation of potential future strategic value. These differences in data heterogeneity, scale and usage are leading to a new generation of data management and analytic systems, where the emphasis is on supporting a wide range of very large datasets that are stored uniformly and analyzed seamlessly using whatever techniques are most appropriate, including traditional tools like SQL and BI and newer tools, e.g., for machine learning and stream analytics. These new systems are necessarily based on scale-out architectures for both storage and computation.
Hadoop has become a key building block in the new generation of scale-out systems. On the storage side, HDFS has provided a cost-effective and scalable substrate for storing large heterogeneous datasets. However, as key customer and systems touch points are instrumented to log data, and Internet of Things applications become common, data in the enterprise is growing at a staggering pace, and the need to leverage different storage tiers (ranging from tape to main memory) is posing new challenges, leading to caching technologies, such as Spark. On the analytics side, the emergence of resource managers such as YARN has opened the door for analytics tools to bypass the Map-Reduce layer and directly exploit shared system resources while computing close to data copies. This trend is especially significant for iterative computations such as graph analytics and machine learning, for which Map-Reduce is widely recognized to be a poor fit.
While Hadoop is widely recognized and used externally, Microsoft has long been at the forefront of Big Data analytics, with Cosmos and Scope supporting all internal customers. These internal services are a key part of our strategy going forward, and are enabling new state of the art external-facing services such as Azure Data Lake and more. I will examine these trends, and ground the talk by discussing the Microsoft Big Data stack.
Azure Data Platform Services
HDInsight Clusters in Azure
Data Storage: Apache Hive, Apache HBase, Azure Data Catalog
Data Transformations: Apache Storm, Apache Spark, Azure Data Factory
Healthcare / Life Sciences Use Cases
Cortana Analytics Workshop: Azure Data Lake - MSAdvAnalytics
Rajesh Dadhia. This session introduces the newest services in the Cortana Analytics family. Azure Data Lake is a hyper-scale data repository designed for big data analytics workloads. It provides a single place to store any type of data in its native format. In this session, we will show how the HDFS compatibility of Azure Data Lake as a Hadoop File System enables all Hadoop workloads including Azure HDInsight, Hortonworks and Cloudera. Further, we will focus on the key capabilities of the Azure Data Lake that make it an ideal choice for storing, accessing and sharing data for a wide range of analytics applications. Go to https://channel9.msdn.com/ to find the recording of this session.
Azure Databricks—Apache Spark as a Service with Sascha Dittmann - Databricks
Databricks Inc. (the driving force behind Apache Spark) and Microsoft have designed a joint service to quickly and easily create Big Data and Advanced Analytics solutions. The combination of the comprehensive Databricks Unified Analytics Platform and the powerful capabilities of Microsoft Azure makes it easy to analyse data streams or large amounts of data, as well as to train AI models. Sascha Dittmann shows in this session how the new Azure service can be set up and used in various real-world scenarios. He also shows how to connect the various Azure services to the Azure Databricks service.
With Azure Data Lake Store, analyze all of your data in one place with no artificial constraints. Data Lake Store can store trillions of files.
Azure Data Lake Analytics: Easily develop and run massively parallel data transformation and processing programs in U-SQL, R, Python, and .NET over petabytes of data. With no infrastructure to manage, you can process data on demand, scale instantly, and only pay per job.
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ... - Data Con LA
Learn how to benefit from IoT (internet of things) to reduce costs and spur transformation for your company and clients. Attendees will learn about building blocks to create an IoT solution, and walk through real life architectural decisions in building a solution.
Data Con LA 2020
Description
Data warehouses are not enough. Data lakes are the backbone of a modern data environment. Data Lakes are best built leveraging unique services of the cloud provider to reduce operations complexity. This session will explain why everyone's talking about data lakes, break down the best services in Azure to build a Data Lake, and walk through code for querying and loading with Azure Databricks and Event Hubs for Kafka. Attendees will leave the session with a firm grasp of why we build data lakes and how Azure Databricks fits in for ETL and querying.
Speaker
Dustin Vannoy, Dustin Vannoy Consulting, Principal Data Engineer
Cloudian HyperStore Storage System is a peer-to-peer, software-defined storage platform, providing an enterprise-grade S3-compliant object storage system on low-cost commodity servers. Its multi-tenanted and multi-interface design can support many applications on the same platform.
How to boost your data management with Dremio? - Vincent Terrasi
Works with any source: relational, non-relational, third-party apps. Five years ago nobody was using Hadoop or MongoDB, and five years from now there will be new products. You need a solution that is future-proof.
Works with any BI tool. In every company multiple tools are in use. Each department has their favorite. We need to work with all of them.
No ETL, data warehouses, or cubes. It would need to offer a genuinely good alternative to these options.
Makes data self-service, collaborative. Probably most important of all, we need to change the dynamic between the business and IT. We need to make it so business users can get the data they want, in the shape they want it, without waiting on IT.
Makes Big Data feel small. It needs to make billions of rows feel like a spreadsheet on your desktop.
Open source. It’s 2017, so we think this has to be open source.
The Synapse IoT Stack: Technology Trends in IoT and Big Data - InMobi Technology
This is the presentation from Big Data November Bangalore Meetup 2014.
http://technology.inmobi.com/events/bigdata-meetup
Talk Outline:
- What does THE HIVE provide?
- Goals of Synapse Tech Stack
- THE HIVE Startups
- Demystifying IoT Market
- Synapse Stack for IoT
- Big Data Challenge
- Synapse Lambda Architecture
- Synapse Components
- Synapse Internals
- AKILI – Synapse Machine Learning
First introduced with the Analytics Platform System (APS), PolyBase simplifies management and querying of both relational and non-relational data using T-SQL. It is now available in both Azure SQL Data Warehouse and SQL Server 2016. The major features of PolyBase include the ability to do ad-hoc queries on Hadoop data and the ability to import data from Hadoop and Azure blob storage to SQL Server for persistent storage. A major part of the presentation will be a demo on querying and creating data on HDFS (using Azure Blobs). Come see why PolyBase is the “glue” for creating federated data warehouse solutions where you can query data as it sits instead of having to move it all to one data platform.
Juliet Hougland, Data Scientist, Cloudera at MLconf NYC - MLconf
Matrix Decomposition at Scale: Matrix decomposition is an incredibly common task in machine learning, appearing everywhere, including recommendation algorithms (SVD++), dimensionality reduction (PCA), and natural language processing (Latent Semantic Analysis). Many well-known existing libraries can compute matrix decompositions when matrices fit in memory on a single machine. When the matrix no longer fits in memory and distributed computation is required, the computation becomes more complex and the details of the implementation become much more important. In this talk I will focus on the three major open source implementations of distributed eigen/singular value decomposition: LanczosSolver and StochasticSVD in Mahout, and the SVD implementation in Spark MLlib. I will discuss the tradeoffs of these implementations from the perspective of real-world performance (beyond big-O notation for flops) and accuracy. I will conclude with some guidelines for choosing which implementation to use based on accuracy, performance, and scale requirements.
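The single-machine case that these distributed solvers are contrasted with can be sketched with NumPy (a local stand-in for the Mahout and Spark MLlib implementations discussed in the talk; the matrix size and rank below are purely illustrative):

```python
import numpy as np

# Toy 6x4 matrix standing in for, e.g., a user-item ratings matrix.
rng = np.random.default_rng(0)
A = rng.random((6, 4))

# Thin SVD: A = U @ diag(s) @ Vt, singular values sorted descending.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Best rank-k approximation (Eckart-Young): keep the top k singular triplets.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The Frobenius reconstruction error equals the energy of the dropped
# singular values -- the quantity a truncated solver trades for speed.
err = np.linalg.norm(A - A_k, "fro")
print(err, np.sqrt(np.sum(s[k:] ** 2)))  # these two agree
```

Distributed implementations such as Mahout's StochasticSVD approximate the same truncated factorization without ever materializing the full matrix on a single node, which is where the accuracy/performance tradeoffs the talk covers come in.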
Microsoft Technologies for Data Science 201612 - Mark Tabladillo
Delivered to SQL Saturday BI Edition -- Atlanta, GA
Microsoft provides several technologies in and around Azure which can be used for casual to serious data science. This presentation provides an overview of the major Microsoft options for on-premises, cloud-based, and hybrid data science. These technologies have been used by the presenter in various companies and industries, both as a Microsoft consultant and, previously, as an independent consultant. The speaker also provides insights into data science careers, information which suggests where the business will likely be for consultants and partners.
Review of the book 'The Lean Startup' by Eric Ries. Entrepreneurs can use these concepts in a variety of businesses both new and old. Great group conversation to explore how a variety of people can use the methodology.
Intro to Data Science for Non-Data Scientists - Sri Ambati
Erin LeDell and Chen Huang's presentations from the Intro to Data Science for Non-Data Scientists Meetup at H2O HQ on 08.20.15
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
MongoDB IoT City Tour STUTTGART: The Microsoft Azure Platform for IoT - MongoDB
Presented by Dr Christian Geuer-Pollmann, Senior Technology Evangelist at Microsoft.
The presentation gives a solid overview of the Microsoft Azure platform, with a special emphasis on scenarios for IoT workloads. First, Christian provides an introduction to Microsoft Azure’s IaaS compute and networking infrastructure (i.e. virtual machines, virtual networks, load balancers and HA concepts). The second part of the presentation focuses on higher-order services in Azure, such as relational databases, machine learning, search, and NoSQL offerings. Last, Christian explains how the Azure Service Bus and the Intelligent Systems Services fit into the overall IoT landscape.
The breadth and depth of Azure products that fall under the AI and ML umbrella can be difficult to follow. In this presentation I’ll first define exactly what AI, ML, and deep learning are, and then go over the various Microsoft AI and ML products and their use cases.
To the Cloud and beyond (Nantes, Rebuild 2018) - Alex Danvy
A talk in two parts:
1) To all those who would see in the Cloud a divisive universe, pitting the new world against the old, developers against operators, startups against large enterprises: I demonstrate here that, on the contrary, it is a very strong vector of convergence.
2) The Cloud and its evolutions to come, from serverless to Quantum Computing.
Differentiate Big Data vs Data Warehouse use cases for a cloud solution - James Serra
It can be quite challenging keeping up with the frequent updates to the Microsoft products and understanding all their use cases and how all the products fit together. In this session we will differentiate the use cases for each of the Microsoft services, explaining and demonstrating what is good and what isn't, in order for you to position, design and deliver the proper adoption use cases for each with your customers. We will cover a wide range of products such as Databricks, SQL Data Warehouse, HDInsight, Azure Data Lake Analytics, Azure Data Lake Store, Blob storage, and AAS as well as high-level concepts such as when to use a data lake. We will also review the most common reference architectures (“patterns”) witnessed in customer adoption.
Azure Data Explorer deep dive - review 04.2020 - Riccardo Zamana
A full review (April 2020) of the Azure Data Explorer service. The slide deck is a review of Kusto in terms of usage, ingestion techniques, querying and exporting data, and using anomaly detection and clustering methods.
Think of big data as all data, no matter what the volume, velocity, or variety. The simple truth is a traditional on-prem data warehouse will not handle big data. So what is Microsoft’s strategy for building a big data solution? And why is it best to have this solution in the cloud? That is what this presentation will cover. Be prepared to discover all the various Microsoft technologies and products, from collecting data, transforming it, and storing it, to visualizing it. My goal is to help you not only understand each product but understand how they all fit together, so you can be the hero who builds your company’s big data solution.
Slides from a presentation by Unidev CEO Greg Alexander discussing Cloud Computing, including information on Software as a Service, Platform as a Service, and Infrastructure as a Service.
Join us for a deep dive into Windows Azure. We’ll start with a developer-focused overview of this brave new platform and the cloud computing services that can be used either together or independently to build amazing applications. As the day unfolds, we’ll explore data storage, SQL Azure™, and the basics of deployment with Windows Azure. Register today for these free, live sessions in your local area.
Introduces Microsoft’s Data Platform for on-premises and cloud, and the challenges businesses are facing with data and sources of data. Understand the evolution of database systems in the modern world, what businesses are doing with their data, and what their new needs are with respect to changing industry landscapes.
Dive into the Opportunities available for businesses and industry verticals: the ones which are identified already and the ones which are not explored yet.
Understand Microsoft’s Cloud vision and what Microsoft’s Azure platform offers, as Infrastructure as a Service or Platform as a Service, for you to build your own offerings.
Introduces and demos some of the real-world scenarios/case studies where businesses have used the Cloud/Azure to create new and innovative solutions that unlock this potential.
The cloud is all the rage. Does it live up to its hype? What are the benefits of the cloud? Join me as I discuss the reasons so many companies are moving to the cloud and demo how to get up and running with a VM (IaaS) and a database (PaaS) in Azure. See why the ability to scale easily, the speed with which you can create a VM, and the built-in redundancy are just some of the reasons that make moving to the cloud a “no brainer”. And if you have an on-prem datacenter, learn how to get out of the air-conditioning business!
Conf 2018 Track 1 - Tessl / revolutionising the house moving process - TechExeter
by Jonathan Brook
Tessl - using technology and data to revolutionise the house moving process
Presented at the 2018 TechExeter Conference https://techexeter.uk
Conf 2018 Track 3 - Microservices - What I've learned after a year building s... - TechExeter
by Nathan Gloyn
This presentation covers what I've learned about using microservices over the last year: the things you want to be doing and the problems you can run into.
Security for Position Navigation and Timing Systems
Guy Buesnel speaking at the TechExeter meetup August 2018
https://www.meetup.com/techexeter/events/249663175/
Why Isn't My Query Using an Index?: An Introduction to SQL Performance - TechExeter
by Chris Saxon, Oracle.
“Why isn’t my query using an index?” is a common question people have when tuning SQL.
This talk explores the factors that influence the optimizer’s decision behind this question. It introduces the concepts of blocks and the clustering factor. It discusses how these affect the optimizer's calculations. It goes on to show how these concepts work in practice using real SQL queries.
This session is intended for developers who want to learn how the optimizer works and how to make their SQL run quickly!
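The optimizer behaviour the talk examines can be reproduced in miniature with SQLite's EXPLAIN QUERY PLAN (SQLite here stands in for Oracle, whose optimizer the session actually covers; the table and index names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

query = "SELECT * FROM orders WHERE customer_id = 42"

# Without an index on customer_id, the only option is a full table scan.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan_before)  # the detail column reports a SCAN of orders

# With an index, the optimizer can SEARCH instead of SCAN.
conn.execute("CREATE INDEX idx_customer ON orders (customer_id)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan_after)   # the detail column reports SEARCH ... USING INDEX idx_customer
```

Whether a production optimizer actually chooses the index then comes down to the factors the talk covers, such as how many blocks the matching rows are spread across (the clustering factor).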
From the FreshTech 2017 conference by TechExeter
www.techexeter.uk
by Dave Longman, Headforwards.
Modern software release cycles are getting shorter and shorter. Modern development languages and frameworks enable developers to produce new features faster than ever. With the trend of shorter sprints and a general move towards continuous delivery it is becoming more and more difficult to get everything ready to release without testing becoming a bottleneck.
Existing testing processes cannot keep up with the rapid release pace demanded by more and more companies. So what can we do about this? One approach is to turn your development team into testers: get them to think more like a tester, thereby reducing the number of issues that get past the developer's IDE. But does this work, and how do you go about doing it?
In this session I will explain what we have done to help our developers become testers. I'll talk about the challenges we faced as well as the benefits that it brought for our projects. We'll also look at what impact this had on the developers and more crucially on the testers.
by John Blackmore, Upad.
Remote working roles are on the rise, offering flexibility to employers and employees, opening up roles to workers that would normally not be available due to location or other factors. Based on real-world experience over the last 18 months, I would like to share my tips and tricks on working within and managing a remote team.
I show the pros and cons of remote working, great ways to set up your space for productive working, and how to avoid common procrastination pitfalls.
I have been working as a team lead for a fully remote team of developers and would like to share our story of how we organise work, communicate, and collaborate in ways that are focussed and productive without the distractions of the modern open-plan office.
by Dermot Kilroy, GoCompare.
The Agile Manifesto captured the mindset of 17 software delivery thought leaders in how they wanted to deliver software. Since then the agile landscape has exploded with all sorts of different tools, techniques and practices.
In my experience the adoption of agile focuses heavily on implementing the processes, tools and techniques. But, true agility is achieved by the people within the organisation adopting the agile mindset.
This talk is all about the agile journey GoCompare has taken and, more importantly, contains an experience report of developing an agile mindset at all levels of the organisation.
by Andy Wood, Ideaflip.
Writing software has been compared to many other professions such as science, engineering, architecture, craft and art. However, while these analogies can be useful, nearly all of them assume that the goal is a finished product. One that might require the odd bit of maintenance and occasional bit of redecoration perhaps, but fundamentally, a more or less static, completed artefact.
Today's networked software ecosystems are complex, dynamic environments. Security updates, changing cloud APIs, new web technologies and mobile operating systems all contribute to an ever-evolving context that developers have to contend with while creating apps and services. We need a fresh analogy to draw inspiration from.
In this session I propose that writing software should be treated more like gardening and look at the ways this analogy can help when thinking—and perhaps more importantly, talking—about the design, development and maintenance of today's systems.
The trials and tribulations of providing engineering infrastructure - TechExeter
by Olly Stephens, ARM.
This talk is a reflection on the things I’ve learnt having spent the last 17 years (and counting) providing infrastructure to the engineering communities at ARM Ltd.
ARM engineering engages in a wide variety of engineering disciplines to produce, enable and support its products. This, in turn, creates varied demand on the internal infrastructure required to enable it: from large HPC clusters that have been used in pretty much the same way for 20+ years, through weird and wacky custom pieces of hardware, to the modern infrastructure required for efficient software development.
The talk will discuss some of the challenges of providing and evolving the internal infrastructure needed for ARM to function, and reflect on changes resulting from more recent enablers such as cloud computing and home working.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint: a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers, without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chains and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and making things work, along with a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
3. Data Science Definition
“Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields such as statistics, machine learning, data mining, and predictive analytics”
https://en.wikipedia.org/wiki/Data_science
5. The Cloud
Why does the Cloud matter for Data Science?
High capacity and cost effective data storage
Flexible, elastic compute capacity
Ready to use technologies
Choice of Infrastructure or Platform
Enables Agile & DevOps
Operational reliability and security
Pay as you go
6. Microsoft Azure Cloud Platform
Wide range of services covering Compute, Web & Mobile, Data & Storage, Analytics, Internet of Things & Intelligence plus many more, see http://azureplatform.azurewebsites.net/en-us/
Easy to get started, free to try for 30 days (with a spend limit), plus free credits with an MSDN licence, see https://azure.microsoft.com/en-gb/free/
Comprehensive documentation and examples
Global presence with many recognisable brands fully committed
Huge investment and growing rapidly
9. NYC taxis
2013 NYC taxi trips and fares – open but non-trivial dataset
24 CSV files - 12 trip, 12 fare, 1 for each month
~20GB compressed, ~50GB uncompressed, 170+ million records
medallion – vehicle identifier
hack license – driver identifier
passenger count
pickup & dropoff – datetime, longitude, latitude
trip – time and distance
fare - payment type, fare amount, surcharge, mta tax, tip amount, tolls amount, total amount
http://www.andresmh.com/nyctaxitrips/
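The trip and fare files share identifying columns, so the two record types can be joined into one view per ride. A minimal pandas sketch of that join, using made-up sample rows that follow the field list above (the exact column spellings in the real CSVs may differ):

```python
# Hypothetical sketch: joining 2013 NYC taxi trip and fare records.
# Sample rows are invented; column names follow the dataset description.
import pandas as pd

trips = pd.DataFrame({
    "medallion": ["A1", "A1", "B2"],
    "hack_license": ["H1", "H1", "H2"],
    "pickup_datetime": ["2013-01-01 09:00", "2013-01-01 10:30", "2013-01-01 09:15"],
    "passenger_count": [1, 2, 1],
    "trip_distance": [2.5, 0.8, 5.1],
})

fares = pd.DataFrame({
    "medallion": ["A1", "A1", "B2"],
    "hack_license": ["H1", "H1", "H2"],
    "pickup_datetime": ["2013-01-01 09:00", "2013-01-01 10:30", "2013-01-01 09:15"],
    "payment_type": ["CRD", "CSH", "CRD"],
    "fare_amount": [9.5, 4.0, 17.0],
    "tip_amount": [2.0, 0.0, 3.5],
})

# A ride is identified by vehicle, driver and pickup time, so join on all three.
rides = trips.merge(fares, on=["medallion", "hack_license", "pickup_datetime"])
print(rides[["medallion", "trip_distance", "tip_amount"]])
```

At full scale this join runs in Hive rather than in memory, but the key choice is the same.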
10. Predictions
Predict whether a specific journey will result in a tip – binary classification
Predict which class of tip a specific journey will yield – multiclass classification
Predict how much the tip will be for a specific journey – regression
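All three framings can be derived from the same `tip_amount` field. A small sketch of the target encodings (the class boundaries here are illustrative, not the talk's actual ranges):

```python
# Derive the three prediction targets from a raw tip amount.
# The band edges ("low" up to $2) are invented for illustration.
def targets(tip_amount):
    tipped = 1 if tip_amount > 0 else 0      # binary classification target
    if tip_amount == 0:
        tip_class = "none"                   # multiclass target, illustrative bands
    elif tip_amount <= 2:
        tip_class = "low"
    else:
        tip_class = "high"
    return tipped, tip_class, tip_amount     # regression keeps the raw value

print(targets(0.0))   # (0, 'none', 0.0)
print(targets(3.5))   # (1, 'high', 3.5)
```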
12. Data Science Virtual Machine
Create Linux and Windows virtual machines in minutes
Wide range of configurations - CPU cores, memory, disks, network speeds
Scale to what you need
Pay only for what you use
Enhance security and compliance
Preloaded with a full set of tools and utilities from the Azure Marketplace, e.g. SQL Server 2016 Developer edition, Azure SDK, Python, R, Jupyter, etc.
13. Storage Accounts
Massively scalable cloud storage for your applications
Security-enhanced, durable, and highly available across the globe
Industry-leading performance with exabytes of capacity
Pay only for what you use
Open, multi-platform support
14. HDInsight
A managed Apache Hadoop, Spark, R, HBase, and Storm cloud service made easy
Scale to petabytes on demand
Crunch all data—structured, semi-structured, unstructured
Skip buying and maintaining hardware
Spin up Apache Hadoop, Spark, and R clusters in the cloud
Use Excel or your favourite BI tool to visualize Hadoop data
Connect on-premises Hadoop clusters with the cloud
15. Azure Machine Learning
A fully managed cloud service that enables you to easily build, deploy, and share predictive analytics solutions.
Powerful cloud-based analytics, now part of the Cortana Intelligence Suite
Azure Machine Learning Studio includes hundreds of built-in packages and support for custom code
Share your solution with the world in the Gallery or on the Azure Marketplace
17. Preparation & Exploration
Copy data using AzCopy and decompress
Inspect files and load into RStudio
Create external Hive tables and load
Query over full dataset for further exploration
Remove erroneous data e.g. passenger numbers, lat/long
Engineer features using Hive
Distance from start to finish using Haversine calculation
Binary indicator for tips
Tip level based on ranges for multiclass classification
Downsample dataset and save as internal table for Machine Learning
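The Haversine distance feature mentioned above is computed in Hive in the talk; the same great-circle calculation looks like this in Python (function name and kilometre output are my choices):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points."""
    r = 6371.0  # mean Earth radius in kilometres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Roughly Times Square to JFK airport - a plausible taxi trip.
print(round(haversine_km(40.7580, -73.9855, 40.6413, -73.7781), 1))
```

Applied to the pickup and dropoff coordinates, this gives a straight-line distance feature to compare against the metered trip distance.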
18. Machine Learning & Deployment
Import Data using Hive Query
Build Training Experiments
Evaluate model performance
Create Predictive Experiments
Publish Web Service
Test Web Service
Call from Excel
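Besides Excel, the published web service can be called from any HTTP client. A sketch of building such a request with the standard library, following the classic Azure ML Studio request shape; the endpoint URL, API key, and input columns below are all placeholders, and the real values come from the service's API help page:

```python
# Hypothetical sketch: calling a published Azure ML (classic) web service.
# API_URL, API_KEY, and the input schema are placeholders, not real values.
import json
import urllib.request

API_URL = "https://example.services.azureml.net/execute?api-version=2.0"  # placeholder
API_KEY = "your-api-key"  # placeholder

payload = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["passenger_count", "trip_distance", "payment_type"],
            "Values": [[1, 2.5, "CRD"]],
        }
    },
    "GlobalParameters": {},
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer " + API_KEY,
    },
)
# response = urllib.request.urlopen(request)  # uncomment with a real URL and key
print(json.loads(request.data)["Inputs"]["input1"]["Values"])
```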
19. Next Steps
To build a fully fledged enterprise solution with regular data ingestion and model execution, consider the following:
Data Catalog
Data Factory
Event Hubs & Stream Analytics
Power BI
Cognitive Services
21. Summary
Microsoft Azure provides a wide range of technologies for Data Science activities
Platform services reduce the management overhead
No capacity limitations and flexible provisioning – pay as you go
Choice of Open Source and Microsoft – use the best tool for the task
The tools are well integrated
Azure Machine Learning makes it trivial to deploy your models
It’s quick and easy to get started
22. Getting Started
Sign up for free
https://azure.microsoft.com/en-gb/free/
Create a Data Science VM
https://azure.microsoft.com/en-us/marketplace/partners/microsoft-ads/standard-data-science-vm/
Visit Cortana Intelligence Gallery
https://gallery.cortanaintelligence.com/
24. Thank You
Martin Thornalley
Data Solution Architect, Microsoft
@mthornal
martin.thornalley@microsoft.com
https://www.linkedin.com/in/martinthornalley