O'Reilly Where 2.0 2011
As a result of cheap storage and computing power, society is measuring and storing increasing amounts of information.
It is now possible to efficiently crunch Petabytes of data with tools like Hadoop.
In this O'Reilly Where 2.0 tutorial, Pete Skomoroch, Sr. Data Scientist at LinkedIn, gives an overview of spatial analytics and how you can use tools like Hadoop, Python, and Mechanical Turk to process location data and derive insights about cities and people.
Topics:
* Data Science & Geo Analytics
* Useful Geo tools and Datasets
* Hadoop, Pig, and Big Data
* Cleaning Location Data with Mechanical Turk
* Spatial Tweet Analytics with Hadoop & Python
* Using Social Data to Understand Cities
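The "Spatial Tweet Analytics" topic above boils down to a map/reduce-style aggregation of geotagged points. A minimal sketch in plain Python, with made-up coordinates (a real job would run the same per-record logic as Hadoop map and reduce steps over billions of rows):

```python
from collections import Counter

def grid_cell(lat, lon, cell_size=0.1):
    """Snap a coordinate to a fixed-size lat/lon grid cell."""
    return (round(lat // cell_size * cell_size, 4),
            round(lon // cell_size * cell_size, 4))

def count_tweets_by_cell(tweets, cell_size=0.1):
    """Aggregate geotagged tweets into per-cell counts.

    `tweets` is an iterable of (lat, lon) pairs; the hot cells then feed
    downstream analysis of activity by neighborhood or city.
    """
    return Counter(grid_cell(lat, lon, cell_size) for lat, lon in tweets)

# Toy sample: three points near downtown SF, one near Oakland.
sample = [(37.77, -122.42), (37.78, -122.41), (37.775, -122.419),
          (37.80, -122.27)]
counts = count_tweets_by_cell(sample)
```

In a real pipeline the counting would be sharded across mappers with the reducer summing partial counts per cell; the cell function is the only geo-specific piece.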
Conversational Artificial Intelligence with Ben Tomlinson and Wayne Thompson - Databricks
Communicating information to each other is at the heart of the human experience. Data, and the analysis of it, often drives this communication in a business setting. This session aims to give you an understanding of how advances in Artificial Intelligence, specifically Natural Language Interaction (NLI) and Natural Language Generation (NLG) coupled with Deep Learning, can create new and exciting opportunities for building analytics-based chatbots.
We will talk about how to design and train an NLI system that maps requests to deep learning pipelines to derive insights. You will also learn how to apply NLG templates to facilitate improved understanding of, and interaction with, the chatbot.
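A minimal sketch of the two halves described above, with hypothetical intent names and templates (a production NLI system would use a trained intent classifier rather than this keyword lookup):

```python
# (1) NLI: map a free-text request to an analysis pipeline.
# (2) NLG: render the pipeline's result back as a sentence from a template.

INTENT_KEYWORDS = {
    "trend": "trend_pipeline",
    "forecast": "forecast_pipeline",
    "average": "aggregate_pipeline",
}

NLG_TEMPLATES = {
    "aggregate_pipeline": "The average {metric} was {value:.1f} over the period.",
}

def route_request(text):
    """Pick a pipeline by keyword match; a trained intent model would
    replace this lookup in a real system."""
    for keyword, pipeline in INTENT_KEYWORDS.items():
        if keyword in text.lower():
            return pipeline
    return "fallback_pipeline"

def render(pipeline, **slots):
    """Fill the NLG template for the chosen pipeline with result slots."""
    return NLG_TEMPLATES[pipeline].format(**slots)

pipeline = route_request("What was the average revenue last quarter?")
reply = render(pipeline, metric="revenue", value=42.0)
```

The design point is the separation: routing and rendering stay simple and testable while the deep learning pipeline in the middle can evolve independently.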
Everyone is paying for fraud detection, but without enough technical knowledge they don't realize that the fraud detection doesn't work or is easily tricked by the bad guys. What's worse is that the people paying for fraud detection have a false sense of security and take their eyes off the obvious fraud that still gets through.
Data Pitfalls - Brighton SEO - Katie Swann.pptx - KatieSwann5
We always hope to see our digital PR campaigns take off, but sometimes they can take off in the worst way.
From inaccurate data to misleading headlines, there are plenty of ways that we can fall into data-related pitfalls.
I explore some of the most common data dangers and how to avoid them so your campaigns don’t end up on the wrong side of digital PR Twitter, keeping you and your clients happy!
Google Analytics for Beginners - Training - Ruben Vezzoli
I used this presentation for an internal training about Google Analytics and Web Analytics.
Google Analytics Training for Beginners.
Google Analytics Tutorial
Google Analytics for Dummies
Google Analytics Guide
FouAnalytics - site analytics and media analytics for practitioners to detect fraud and take action themselves - on-site tags and in-ad tags measure sites and ad impressions, respectively
Intro to Data Science for Enterprise Big Data - Paco Nathan
If you need a different format (PDF, PPT) instead of Keynote, please email me: pnathan AT concurrentinc DOT com
An overview of Data Science for Enterprise Big Data. In other words, how to combine structured and unstructured data, leveraging the tools of automation and mathematics, for highly scalable businesses. We discuss management strategy for building Data Science teams, basic requirements of the "science" in Data Science, and typical data access patterns for working with Big Data. We review some great algorithms, tools, and truisms for building a Data Science practice, plus some great references to read for further study.
Presented initially at the Enterprise Big Data meetup at Tata Consultancy Services, Santa Clara, 2012-08-20 http://www.meetup.com/Enterprise-Big-Data/events/77635202/
FouAnalytics is an alternative to Google Analytics, but with fraud and bot detection baked in. Marketers can use FouAnalytics to look at their own campaigns, find the domains and apps that are eating up their budgets fraudulently, and turn them off while the campaign is still running. How does that compare to your black-box fraud detection that just gives you a percent IVT number?
Data science skills are increasingly important for research and industry projects. With complex data science projects, however, come complex needs for understanding and communicating analysis processes and results. The rise of data science has accompanied a comparable rise in business intelligence and the demand for visualizations and dashboards that can explain models, summarize results, assist with decision making, and even predict outcomes. Ultimately, an analyst’s data science toolbox is incomplete without visualization skills. This talk will explore the landscape of visualization for data science – using visualization for data exploration and communication, reproducible approaches to visualization, and how to develop better instincts for visualization choice and graphic design.
Companies are finding that data can be a powerful differentiator and are investing heavily in infrastructure, tools, and personnel to ingest and curate raw data to make it analyzable. This process of data curation is called "data wrangling".
This task can be very cumbersome and requires trained personnel. However, with advances in open source and commercial tooling, the process has become much easier and the technical expertise required to do it effectively has dropped several notches.
In this tutorial, we will get a feel for what data wranglers do and use R, RStudio, Trifacta Wrangler, Open Refine tools with some hands-on exercises available at http://akuntamukkala.blogspot.com/2016/05/data-wrangling-examples.html
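For a taste of the kind of cleanup involved, here is the same flavor of wrangling in plain Python (the tutorial itself uses R, Trifacta Wrangler, and OpenRefine; the rows below are invented): normalizing messy category labels and coercing numeric strings are two of the most common steps.

```python
raw_rows = [
    {"city": " new york ", "revenue": "1,200"},
    {"city": "New York",   "revenue": "950"},
    {"city": "NEW YORK",   "revenue": "n/a"},
]

def clean_row(row):
    # Normalize label: trim whitespace, unify casing -> "New York".
    city = row["city"].strip().title()
    # Coerce "1,200" -> 1200.0; leave non-numeric markers as None.
    rev = row["revenue"].replace(",", "")
    revenue = float(rev) if rev.replace(".", "", 1).isdigit() else None
    return {"city": city, "revenue": revenue}

cleaned = [clean_row(r) for r in raw_rows]
```

Tools like OpenRefine apply exactly these transformations interactively (clustering similar labels, type coercion) without requiring code.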
A two-hour lecture I gave at the Jyväskylä Summer School. The purpose of the talk is to give a quick, non-technical overview of concepts and methodologies in data science. Topics include a wide overview of both pattern mining and machine learning.
See also Part 2 of the lecture: Industrial Data Science. You can find it in my profile (click the face)
Social Media Monitoring Tools - An Overview - Stefan Betzold
This presentation provides an overview of free and paid tools for social media monitoring.
It was shown at the Social Media Barcamp in Berlin, June 2009.
SEO Case Study - Hangikredi.com From 12 March to 24 September Core Update - Koray Tugberk GUBUR
Start Summary:
"131% Organic Session Increase in 5 Months
62% Impression Increase in 5 Months
144% Clicks Increase in 5 Months"
This SEO case study is about Google Core Updates and their impact on the biggest financial institution website in Turkey.
I started working at Hangikredi.com on 26 March 2019, but the company's website had been hit very hard by the 12 March Google Core Update.
I started working here while a crisis was unfolding.
I examined the website and figured out that the real problems were crawl budget, authority signals, and the relevancy-entity connection. I activated social media and Google My Business accounts, and joined financial forums and every other alternative channel. I created a news publisher network about us. I cleaned up misleading status codes and HTML and CSS mistakes, optimised meta tags, fixed redirect chains, applied image compression, deleted lots of unnecessary URLs and their contents, and rebuilt the internal link structure from scratch.
By the 5 June Google Core Update, we were winners again.
We had regained all of our lost traffic. We were fine until the 1 August server attack; then, in one day, everything went wrong.
I started from zero again...
I optimised the website's off-page signals to regain the trust of Google's AI, and I supported this strategy with on-page elements.
After the 24 September Google Core Update, there was another success. We broke the site's records for crawl load/rate, average position, CTR, impressions, and clicks.
In this case study, you will find the details of an SEO success story, with graphics and also some funny censored images from my life.
End Summary:
"The 12 March, 5 June, and 24 September Google Core Updates, together with the 1 August server attack, are the milestones of this SEO case study. You will find all the details from our point of view. I hope you like it."
Identifying the basic purposes and scope of M&E. Describing the functions of an M&E plan. Identifying and understanding the main components of an M&E plan
Presentation held during the NKUA postgraduate course “DATA BASES MANAGEMENT SYSTEMS” on 6th of December 2016 at National and Kapodistrian University of Athens.
Nikolas Laskaris, UoA
Giota Koltsida, UoA
http://bit.ly/2hEkn3G
The Power of Geo Analytics (and maps) to Improve Predictive Analytics in Heal... - Health Catalyst
As far back as the 1840s, clinicians have been using maps to inform them about population health trends. Today, the geo-analytics industry is well developed in almost every application, with the exception of healthcare and medicine. There is potential to use mapping technologies to show patient disease burden in geographic form and to map the locations of health care facilities, and a plethora of accountable care population health initiatives would benefit from geo-analysis. Health Catalyst is working to integrate inputs into analysis, like maps that can show geographic care boundaries, population health demographics, and more.
Leveraging Geo-Spatial (Big) Data for Financial Services Solutions - Capgemini
For effective decision making, Big Data needs to be delivered at the right level of granularity at the right time. Capgemini’s FS BIM Innovation Practice, working through our Mastermind and Greenhouse processes to ensure a focus on real-world client issues, has developed a Reference Architecture (RA) based upon HP HAVEn to achieve these goals.
While Geo-Spatial Data has traditionally been applied to non-FS domains, effective application of this data has the potential to improve decision-making in FS, including in the areas of underwriting and pricing, claims, and bank and credit card fraud.
Presented at HP Discover Barcelona 2014 by:
Guillaume Runser - WW Solutions Marketing, HP
Ernest Martinez - Global Head - FS BIM Banking, Capgemini
Stephen Williams - BIM Innovation Practice Head, Capgemini
The logistics and supply industry is facing multiple challenges as the 'world goes green' and industry progressively reduces its waste by adopting material printing, programming, and electronic distribution. For people, materials, components, and things, NFC and LBS will be vital in reducing unnecessary handling, transport irregularities, and waste. Maximum returns on Reuse, Repurpose, and Recycle can only be achieved through tagging, tracking, and monitoring of use and movement. In this brief presentation we examine the possibilities and necessities and the relationship to the internet, the Cloud, and Big Data.
Cisco IBSG shares areas where Big Data will provide significant opportunities for retailers over the next several years. This discussion focuses on the in-store experience. For more info: http://cs.co/ibsg-bigdataretail
SXSW Keynote - The Game Layer On Top Of The World - Seth Priebatsch
Seth Priebatsch's SXSW Keynote on the game layer on top of the world.
Seth Priebatsch is Chief Ninja of SCVNGR (www.scvngr.com) and their new pilot LevelUp (TheLevelUp.com)
An overview of traditional spatial analysis tools, an intro to hadoop and other tools for analyzing terabytes or more of data, and then a primer with examples on combining the two with data pulled from the Twitter streaming API. Given at the O'Reilly Where 2.0 conference in March 2010.
Hadoop, Pig, and Twitter (NoSQL East 2009) - Kevin Weil
A talk on the use of Hadoop and Pig inside Twitter, focusing on the flexibility and simplicity of Pig, and the benefits of that for solving real-world big data problems.
Big Data Analytics with Hadoop with @techmilind - EMC
Hadoop has rapidly emerged as the preferred solution for big data analytics across unstructured data and companies are seeking competitive advantage by finding effective ways of analyzing new sources of unstructured and machine-generated data. This session reviews the practices of performing analytics using unstructured data with Hadoop.
Concurrency in Distributed Systems: Leslie Lamport papers - Subhajit Sahu
In computer science, concurrency is the ability of different parts or units of a program, algorithm, or problem to be executed out-of-order or in partial order, without affecting the final outcome. This allows for parallel execution of the concurrent units, which can significantly improve overall speed of the execution in multi-processor and multi-core systems. In more technical terms, concurrency refers to the decomposability property of a program, algorithm, or problem into order-independent or partially-ordered components or units.[1]
A number of mathematical models have been developed for general concurrent computation including Petri nets, process calculi, the parallel random-access machine model, the actor model and the Reo Coordination Language.
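One of the best-known ideas from these papers is Lamport's logical clock (from his 1978 "Time, Clocks, and the Ordering of Events" paper), which imposes a partial "happened-before" order on events without any shared physical clock. A minimal sketch:

```python
class Process:
    """One process in a distributed system, carrying a Lamport clock."""

    def __init__(self):
        self.clock = 0

    def local_event(self):
        self.clock += 1          # every local event ticks the clock
        return self.clock

    def send(self):
        self.clock += 1
        return self.clock        # this timestamp travels with the message

    def receive(self, msg_ts):
        # Receiver's clock jumps past the message timestamp, so the
        # receive event is ordered after the send event.
        self.clock = max(self.clock, msg_ts) + 1
        return self.clock

p, q = Process(), Process()
p.local_event()                  # p's clock: 1
ts = p.send()                    # p's clock: 2; message carries 2
q_ts = q.receive(ts)             # q's clock: max(0, 2) + 1 = 3
```

The guarantee is one-directional: if event a happened-before event b, then clock(a) < clock(b); the converse does not hold, which is exactly the partial-order property discussed above.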
Online learning, Vowpal Wabbit and Hadoop - Héloïse Nonne
Online learning has recently caught a lot of attention following some competitions, especially after Criteo released an 11 GB training set for a Kaggle contest.
Online learning makes it possible to process massive data: the learner consumes data sequentially, using a small amount of memory and limited CPU resources. It is also particularly suited to handling time-evolving data.
Vowpal Wabbit has become quite popular: it is a handy, light, and efficient command-line tool for doing online learning on gigabytes of data, even on a standard laptop with standard memory. After a reminder of online learning principles, we present how to run Vowpal Wabbit on Hadoop in a distributed fashion.
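The core online-learning loop is simple enough to sketch: one pass over the examples, constant memory, one stochastic gradient update per example. This toy logistic learner on invented data illustrates the principle; Vowpal Wabbit adds feature hashing, adaptive learning rates, and much more on top of it.

```python
import math

def sgd_logistic(stream, n_features, lr=0.5):
    """Online logistic regression: update weights one example at a time."""
    w = [0.0] * n_features
    for x, y in stream:                          # y in {0, 1}
        z = sum(wi * xi for wi, xi in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-z))           # predicted probability
        g = p - y                                # gradient of log-loss wrt z
        for i in range(n_features):
            w[i] -= lr * g * x[i]
        yield w                                  # current model after update

# Toy stream: the label is 1 exactly when the first feature is on.
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 50
for w in sgd_logistic(data, 2):
    pass                                         # w holds the final weights
```

Memory is O(n_features) regardless of how many examples stream past, which is why this style scales to datasets far larger than RAM.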
The growth of the amount of medical image data produced on a daily basis in modern hospitals forces the adaptation of traditional medical image analysis and indexing approaches towards scalable solutions. In this work, MapReduce is used to speed up and make possible three large-scale medical image processing use-cases: (i) parameter optimization for lung texture classification using support vector machines (SVM), (ii) content-based medical image indexing, and (iii) three-dimensional directional wavelet analysis for solid texture classification.
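The parameter-optimization use-case follows a simple map/reduce pattern: score each (C, gamma) candidate independently in the map phase, then reduce by keeping the best. A sketch with a stand-in scoring function (in the work above, each map task would train and cross-validate an SVM on lung-texture features):

```python
from itertools import product

def score(params):
    """Hypothetical stand-in for a cross-validation score, peaking at
    C=10, gamma=0.01. A real job would train an SVM here."""
    C, gamma = params
    return -((C - 10) ** 2) - 1000 * (gamma - 0.01) ** 2

# Candidate grid of SVM hyperparameters.
grid = list(product([1, 10, 100], [0.001, 0.01, 0.1]))

scored = map(lambda p: (score(p), p), grid)   # "map": score each candidate
best_score, best_params = max(scored)         # "reduce": keep the best
```

Because the candidates are scored independently, the map phase parallelizes trivially across a cluster, which is what makes MapReduce a good fit for grid search.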
Bridging the AI Gap: Building Stakeholder Support - Peter Skomoroch
This week’s CDx Connection Summit covers AI in the enterprise, providing practical, empirical, and farsighted advice for those working on AI in large organizations, from Pete Skomoroch and Tim O’Reilly.
Machine learning drove massive growth at consumer internet companies over the last decade, and this was enabled by open software, datasets, and AI research. For many problems, machine learning will produce better, faster, and more repeatable decisions at scale. Unfortunately, building and maintaining these systems is still extremely difficult and expensive. As more machine learning software moves to production, many of our traditional tools and best practices in software development will change.
Pete Skomoroch walks you through what you need to know as we shift from a world of deterministic programs to systems that give unpredictable results on ever-changing training data. To navigate this world powered by nondeterministic data-dependent programs, we’ll also need a new development stack to help us write, test, deploy, and monitor machine learning software.
Presented at OSCON Portland July 18, 2019
Companies that understand how to apply AI will scale and win their respective markets over the next decade. That said, delivering on this promise and managing machine learning projects is much harder than most people anticipate. Many organizations hire teams of PhDs and data scientists, then fail to ship products that move business metrics. The root cause is often a lack of product strategy for AI, or the failure to adapt their product development processes to the needs of machine learning systems. This talk will cover some of the common ways machine learning fails in practice, the tactical responsibilities of AI product managers, and how to approach product strategy for AI.
Peter Skomoroch, former Head of Data Products at Workday and LinkedIn, will describe how you can navigate these challenges to ship metric moving AI products that matter to your business.
Peter will provide practical advice on:
* The role of an AI Product Manager
* How to evaluate and prioritize your AI projects
* The ways AI product management differs from traditional product management
* Bridging the worlds of design and machine learning
* Making trade-offs between data quality and other business metrics
Executive Briefing: Why managing machines is harder than you think - Peter Skomoroch
Companies that understand how to apply machine intelligence will scale and win their respective markets over the next decade. That said, delivering on this promise is much harder than most executives realize. Without large amounts of labeled training data, solving most AI problems isn’t possible. The talent and leadership to bridge the worlds of product design, machine learning research, and user experience are scarce. Many organizations will tackle the wrong problems and fail to ship successful AI products that matter to their customers.
Pete Skomoroch explains how to navigate these challenges and build a business where every product interaction benefits from your investment in machine intelligence.
This talk was presented at the 2019 Strata Data Conference in London.
Topics include:
Who defines the data vision and roadmap in your organization?
Who is accountable for building and expanding your competitive moat?
Investing in foundational data infrastructure, training, logging, and tools
Fostering executive support for exploration and innovation, including user-facing data product and algorithm development
How to evaluate new machine intelligence projects and develop a portfolio that delivers
How AI product management differs from traditional product management
How to bridge the worlds of design and machine learning to get to product-market fit
Defining a framework for trading off investments in data quality, machine learning relevance, and other business objectives
Warren Buffett would often think of companies as castles with a competitive moat protecting the business. Products or companies that figure out how to build and leverage differentiated data assets will be best positioned to win their respective markets. This talk describes the properties of a good data moat, why it matters, and how to go about building one within your organization.
Talk from the first O'Reilly Strata, Feb 2011. Learn how to leverage data exhaust, the digital byproduct of our online activities, to solve problems and discover insights about the world around you. We will walk through a real world example which combines several datasets and statistical techniques to discover insights and make predictions about attendees at O'Reilly Strata.
Includes a preview of some of the technology behind LinkedIn Skills, which I launched in a Keynote with DJ Patil the following day.
Video: http://blip.tv/oreilly-promos/distilling-data-exhaust-4780870
Examples, techniques, and lessons learned building data products over the last 4 years at LinkedIn.
Pete Skomoroch is a Principal Data Scientist at LinkedIn where he leads a team focused on building data products leveraging LinkedIn's powerful identity and reputation data.
The talk describes some techniques and best practices applied to develop products like LinkedIn Skills & Endorsements.
This talk was presented at the SF Data Science Meetup on September 19th, 2013
This keynote presentation describes the critical role that search and Lucene has in building next generation products that understand reputation and relevance. We also describe how data science and machine learning have been applied at LinkedIn to collect, interpret, and index data around topical reputation.
Lucene Revolution is the biggest open source conference dedicated to Apache Lucene/Solr.
LinkedIn Endorsements: Reputation, Virality, and Social Tagging - Peter Skomoroch
Endorsements are a one-click system to recognize someone for their skills and expertise on LinkedIn, the largest professional online social network. This is one of the latest “data features” in LinkedIn’s portfolio, and the endorsement ecosystem generates a large graph of reputation signals and viral user activity.
In this talk, we’ll examine the practical aspects of building a data feature like Endorsements. We’ll talk about marrying product design and data, deep diving into several of the lessons we’ve learned along the way - all using skills & endorsements as an empirical case study. We’ll include technical detail on our approaches and how we combine crowdsourcing, machine learning, and large scale distributed systems to recommend topics to users.
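At its simplest, the co-occurrence signal behind topic recommendation can be sketched like this (toy profiles; the production system described in the talk combines crowdsourcing, machine learning, and large-scale distributed systems, and is far richer than this):

```python
from collections import Counter

# Hypothetical member profiles, each a set of skills.
profiles = [
    {"python", "hadoop", "machine learning"},
    {"python", "machine learning"},
    {"python", "hadoop"},
    {"design", "photoshop"},
]

def suggest(skill, profiles, k=2):
    """Suggest skills that most often co-occur with `skill` on profiles:
    candidates to recommend to members who have one but not the other."""
    co = Counter()
    for p in profiles:
        if skill in p:
            co.update(s for s in p if s != skill)
    return [s for s, _ in co.most_common(k)]

suggestions = suggest("python", profiles)
```

Real systems normalize these counts (e.g. by skill popularity) so that globally common skills don't dominate every suggestion list.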
Examples, techniques, and lessons learned building data products over the last 3 years at LinkedIn.
Pete Skomoroch is a Principal Data Scientist at LinkedIn where he leads a team focused on building data products leveraging LinkedIn's powerful identity and reputation data.
The talk describes some techniques and best practices applied to develop products like LinkedIn Skills & Endorsements.
This was the inaugural UberData Tech Talk, held in SF at Uber HQ.
Practical Problem Solving with Data - Onlab Data Conference, Tokyo - Peter Skomoroch
Practical problem solving with data involves more than just visualization or applying the latest machine learning techniques. Intuition, domain knowledge, and reasonable approximations can mean the difference between a successful model and a catastrophic failure. Good problem solvers deeply analyze available data, improvise solutions using their unique assets, anticipate outcomes and issues, and adapt their techniques over time.
Practical problem solving with data involves more than just visualization or applying the latest machine learning techniques. Intuition, domain knowledge, and reasonable approximations can mean the difference between a successful model and a catastrophic failure. We’ll dive into some best practices I’ve extracted from solving real world problems like computing trending topics, cleaning election data, and ranking experts on social networks.
New analysts or engineers are often lost when textbook approaches fail on real-world data. Drawing inspiration from problem-solving techniques in mathematics and physics, we will walk through examples that illustrate how to come up with creative solutions and solve problems with big data.
As large datasets come together exciting and unexpected things can happen. Human behavior is high dimensional, so combining many diverse datasets is critical to revealing actionable insights.
LinkedIn is the premier professional social network, with over 60 million users and a new user joining every second. One of LinkedIn's strategic advantages is its unique data. While most organizations consider data a service function, LinkedIn considers data a cornerstone of its product portfolio.
To rapidly develop these products LinkedIn leverages a number of technologies including open source, 3rd party solutions, and some we've had to invent along the way.
This LinkedIn talk at the NYC Hadoop Meetup held 3/18 at ContextWeb focused on best practices for quickly uncovering patterns, visualizing trends, and generating actionable insights from large datasets.
Prototyping Data Intensive Apps: TrendingTopics.org - Peter Skomoroch
Hadoop World 2009 talk on rapid prototyping of data intensive web applications with Hadoop, Hive, Amazon EC2, Python, and Ruby on Rails. Describes the process of building the open source trend tracking site trendingtopics.org
Amazon EC2 may offer the possibility of high performance computing to programmers on a budget. Instead of building and maintaining a permanent Beowulf cluster, we can launch a cluster on-demand using Python and EC2. This talk will cover the basics involved in getting your own cluster running using Python, demonstrate how to run some large parallel computations using Python MPI wrappers, and show some initial results on cluster performance.
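The scatter/compute/gather pattern that Python MPI wrappers expose on such a cluster can be sketched locally with a thread pool (a hypothetical sum-of-squares job standing in for a real parallel computation):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum_of_squares(chunk):
    """The per-worker computation, run on one slice of the input."""
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(values, n_workers=4):
    # Scatter: split the input into one chunk per worker.
    size = max(1, len(values) // n_workers)
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    # Compute: farm the chunks out to the pool.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    # Gather: combine the partial results.
    return sum(partials)

result = parallel_sum_of_squares(list(range(1000)))
```

On an EC2 cluster the same three phases run across nodes instead of threads (e.g. via mpi4py's scatter/gather), but the shape of the program is identical.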
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities, spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... - Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation takes real work: vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Securing your Kubernetes cluster: a step-by-step guide to success! - KatiaHIMEUR1
Today, after several years of existence, with an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been easier to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf - Peter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
UiPath Test Automation using UiPath Test Suite series, part 4 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series, part 4. In this session, we will cover a Test Manager overview along with the SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Geo Analytics Tutorial - Where 2.0 2011
1. Geo Analytics Tutorial
Pete Skomoroch
Sr. Data Scientist - LinkedIn (@peteskomoroch)
#geoanalytics
** Hadoop Intro slides from Kevin Weil, Twitter
2. Topics
‣ Data Science & Geo Analytics
‣ Useful Geo tools and Datasets
‣ Hadoop, Pig, and Big Data
‣ Cleaning Location Data with Mechanical Turk
‣ Spatial Tweet Analytics with Hadoop & Python
‣ Using Social Data to Understand Cities
‣ Q&A
3. Topics
‣ Data Science & Geo Analytics
‣ Useful Geo tools and Datasets
‣ Hadoop, Pig, and Big Data
‣ Cleaning Location Data with Mechanical Turk
‣ Spatial Tweet Analytics with Hadoop & Python
‣ Using Social Data to Understand Cities
‣ Q&A
13. Spatial Analysis
Map by Dr. John Snow of London,
showing clusters of cholera cases in
the 1854 Broad Street cholera
outbreak. This was one of the first
uses of map-based spatial analysis.
14. Spatial Analysis
• Spatial regression - estimate dependencies between variables
• Gravity models - estimate the flow of people, material, or information between locations
• Spatial interpolation - estimate variables at unobserved locations based on other measured values
• Simulation - use models and data to predict spatial phenomena
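To make the interpolation bullet concrete, here is a minimal sketch of inverse-distance weighting, one common spatial interpolation technique; the station coordinates and readings are invented for illustration:

```python
import math

def idw_interpolate(points, values, target, power=2):
    """Estimate a value at `target` from observed (x, y) points
    using inverse-distance weighting."""
    num, den = 0.0, 0.0
    for (x, y), v in zip(points, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0:
            return v  # exact hit on an observed point
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

# Hypothetical temperature readings at three stations
points = [(0, 0), (1, 0), (0, 1)]
values = [10.0, 20.0, 30.0]
print(idw_interpolate(points, values, (0.5, 0.5)))  # 20.0 (all stations equidistant)
```

Nearby observations dominate the estimate because weights fall off with distance; the `power` parameter controls how quickly.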
15. Life Span & Food by Zip Code
* http://zev.lacounty.gov/news/health/death-by-zip-code
* http://www.verysmallarray.com/?p=975
16. Where Americans Are Moving (IRS Data)
‣ (Jon Bruner) http://jebruner.com/2010/06/the-migration-map/
18. Topics
‣ Data Science & Geo Analytics
‣ Useful Geo tools and Datasets
‣ Hadoop, Pig, and Big Data
‣ Cleaning Location Data with Mechanical Turk
‣ Spatial Tweet Analytics with Hadoop & Python
‣ Using Social Data to Understand Cities
‣ Q&A
19. Useful Geo Tools
•R, Matlab, SciPy, Commercial Geo Software
•R Spatial Pkgs http://cran.r-project.org/web/views/Spatial.html
•Hadoop, Amazon EC2, Mechanical Turk
•Data Science Toolkit: http://www.datasciencetoolkit.org/
•80% of effort is often in cleaning and processing data
20. DataScienceToolkit.org
•Runs on VM or Amazon EC2
•Street Address to Coordinates
•Coordinates to Political Areas
•Geodict (text extraction)
•IP Address to Coordinates
•New UK release on Github
21. Resources for location data
• SimpleGeo
• Factual
• Geonames
• Infochimps
• Data.gov
• DataWrangling.com
22. Topics
‣ Data Science & Geo Analytics
‣ Useful Geo tools and Datasets
‣ Hadoop, Pig, and Big Data
‣ Cleaning Location Data with Mechanical Turk
‣ Spatial Tweet Analytics with Hadoop & Python
‣ Using Social Data to Understand Cities
‣ Q&A
23. Hadoop: Motivation
•We want to crunch 1 TB of Twitter stream data and understand spatial patterns in tweets
•Data collected from the Twitter “Garden Hose” API last spring
24. Data is Getting Big
‣ NYSE: 1 TB/day
‣ Facebook: 20+ TB compressed/day
‣ CERN/LHC: 40 TB/day (15 PB/year!)
‣ And growth is accelerating
‣ Need multiple machines, horizontal scalability
25. Hadoop
‣ Distributed file system (hard to store a PB)
‣ Fault-tolerant, handles replication, node failure, etc.
‣ MapReduce-based parallel computation (even harder to process a PB)
‣ Generic key-value based computation interface allows for wide applicability
‣ Open source, top-level Apache project
‣ Scalable: Y! has a 4,000-node cluster
‣ Powerful: sorted a TB of random integers in 62 seconds
26. MapReduce?
cat file | grep geo | sort | uniq -c > output
‣ Challenge: how many tweets per county, given tweets table?
‣ Input: key=row, value=tweet info
‣ Map: output key=county, value=1
‣ Shuffle: sort by county
‣ Reduce: for each county, sum
‣ Output: county, tweet count
‣ With 2x machines, runs close to 2x faster.
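The map, shuffle, and reduce steps listed on this slide can be simulated in a few lines of plain Python; the tweet rows below are invented for illustration:

```python
from collections import defaultdict

# Input: (row key, county) pairs standing in for the tweets table
tweets = [("row1", "san mateo"), ("row2", "cook"), ("row3", "san mateo")]

# Map: emit (county, 1) for each tweet row
mapped = [(county, 1) for _row, county in tweets]

# Shuffle: group emitted values by county key
groups = defaultdict(list)
for county, one in mapped:
    groups[county].append(one)

# Reduce: sum the values for each county
counts = {county: sum(ones) for county, ones in groups.items()}
print(counts)  # {'san mateo': 2, 'cook': 1}
```

Hadoop runs the same three phases, but distributes the map and reduce work across machines, which is why doubling the machines roughly halves the runtime.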
33. But...
‣ Analysis typically done in Java
‣ Single-input, two-stage data flow is rigid
‣ Projections, filters: custom code
‣ Joins: lengthy, error-prone
‣ n-stage jobs: hard to manage
‣ Prototyping/exploration requires compilation
‣ analytics in Eclipse? ur doin it wrong...
34. Enter Pig
‣ High level language
‣ Transformations on sets of records
‣ Process data one step at a time
‣ Easier than SQL?
35. Why Pig?
‣ Because I bet you can read the following script.
36. A Real Pig Script
‣ Now, just for fun... the same calculation in vanilla Hadoop MapReduce.
38. Pig Simplifies Analysis
‣ The Pig version is:
‣ 5% of the code, 5% of the time
‣ Within 50% of the execution time.
‣ Pig Geo:
‣ Programmable: fuzzy matching, custom filtering
‣ Easily link multiple datasets, regardless of size/structure
‣ Iterative, quick
39. A Real Example
‣ Fire up your Elastic MapReduce Cluster.
‣ ... or follow along at http://bit.ly/whereanalytics
‣ I used Twitter’s streaming API to store some tweets
‣ Simplest thing: group by location and count with Pig
‣ http://bit.ly/where20pig
‣ Here comes some code!
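The Pig script itself isn't captured in this transcript (it appears on the image-only slides that follow). As a rough stand-in, the same group-by-location count can be sketched in plain Python; the record schema and sample tweets here are assumptions, not the actual dataset:

```python
from collections import Counter

# Hypothetical records: (user, location, text)
tweets = [
    ("a", "brasil", "oi"),
    ("b", "london", "hi"),
    ("c", "brasil", "bom dia"),
]

# Equivalent of Pig's GROUP BY location ... COUNT(*), ordered by count
location_counts = Counter(loc.lower().strip() for _u, loc, _t in tweets)
for loc, n in location_counts.most_common():
    print(loc, n)
```

In Pig this is a two-line GROUP/FOREACH; the point of the deck is that either version beats hand-written Java MapReduce for quick iteration.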
49. hadoop@ip-10-160-113-142:~$ hadoop dfs -cat /global_location_counts/part* | head -30
brasil 37985
indonesia 33777
brazil 22432
london 17294
usa 14564
são paulo 14238
new york 13420
tokyo 10967
singapore 10225
rio de janeiro 10135
los angeles 9934
california 9386
chicago 9155
uk 9095
jakarta 9086
germany 8741
canada 8201
7696
7121
jakarta, indonesia 6480
nyc 6456
new york, ny 6331
50. Neat, but...
‣ Wow, that data is messy!
‣ brasil, brazil at #1 and #3
‣ new york, nyc, and new york ny all in the top 30
‣ Mechanical Turk to the rescue...
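Once Turk workers have mapped messy strings to canonical place names, applying the lookup table is straightforward. The mapping and counts below are a tiny invented sample, not the actual crowdsourced data:

```python
# Hypothetical Mechanical Turk output: messy string -> canonical place name
canonical = {
    "brasil": "brazil",
    "nyc": "new york, ny",
    "new york": "new york, ny",
}

def normalize(location):
    """Collapse spelling variants into one canonical key."""
    key = location.lower().strip()
    return canonical.get(key, key)

# Merge the raw counts under their canonical names
raw_counts = {"brasil": 37985, "brazil": 22432, "nyc": 6456, "new york": 13420}
merged = {}
for loc, n in raw_counts.items():
    canon = normalize(loc)
    merged[canon] = merged.get(canon, 0) + n

print(merged)  # {'brazil': 60417, 'new york, ny': 19876}
```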
51. Topics
‣ Data Science & Geo Analytics
‣ Useful Geo tools and Datasets
‣ Hadoop, Pig, and Big Data
‣ Cleaning Location Data with Mechanical Turk
‣ Spatial Tweet Analytics with Hadoop & Python
‣ Using Social Data to Understand Cities
‣ Q&A
65. Topics
‣ Data Science & Geo Analytics
‣ Useful Geo tools and Datasets
‣ Hadoop, Pig, and Big Data
‣ Cleaning Location Data with Mechanical Turk
‣ Spatial Tweet Analytics with Hadoop & Python
‣ Using Social Data to Understand Cities
‣ Q&A
66. Tokenizing and Cleaning Tweet Text
‣ Extract Tweet topics with Hadoop + Python + NLTK + Wikipedia
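The deck pairs Hadoop with Python and NLTK for this step; here is a minimal standard-library sketch of the tokenize-and-clean idea. The stopword list is abbreviated and the sample tweet is invented:

```python
import re

STOPWORDS = {"the", "a", "at", "in", "is", "i", "rt"}  # tiny sample list

def tokenize(tweet):
    """Lowercase, strip mentions and URLs, split on letters, drop stopwords."""
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    tokens = re.findall(r"[a-z']+", tweet)
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]

print(tokenize("RT @pete: Eating tacos at the Ferry Building http://t.co/x"))
# ['eating', 'tacos', 'ferry', 'building']
```

A production pipeline would swap in NLTK's tokenizers and full stopword corpus, and run this as the map step over the tweet corpus.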
77. Topics
‣ Data Science & Geo Analytics
‣ Useful Geo tools and Datasets
‣ Hadoop, Pig, and Big Data
‣ Cleaning Location Data with Mechanical Turk
‣ Spatial Tweet Analytics with Hadoop & Python
‣ Using Social Data to Understand Cities
‣ Q&A
88. Topics
‣ Data Science & Geo Analytics
‣ Useful Geo tools and Datasets
‣ Hadoop, Pig, and Big Data
‣ Cleaning Location Data with Mechanical Turk
‣ Spatial Tweet Analytics with Hadoop & Python
‣ Using Social Data to Understand Cities
‣ Q&A
89. Questions? Follow me at
twitter.com/peteskomoroch
datawrangling.com