ODSC UK 2016: How To Analyse Weather Data and Twitter Sentiment with Spark an... - Margriet Groenendijk
These slides show you how to extract, combine, and analyze Twitter and weather feeds in a Python notebook, using PySpark, Weather Company Data, and Insights for Twitter.
Data Science covers the complete workflow from defining a question, finding the most suitable data source, identifying the right tools and finally presenting the best possible answer in a clear, engaging manner. But it all starts with having access to the data. In these slides I will walk you through some examples of how to collect, store and access data in the Cloud with the use of different APIs.
Data Science covers the complete workflow from defining a question, finding the most suitable data source, identifying the right tools and finally presenting the best possible answer in a clear, engaging manner. But it all starts with having access to the data. This talk shows some examples of how to collect, store and access data in the Cloud with the use of different APIs.
Exploring Graph Use Cases with JanusGraph - Jason Plurad
Graph databases are relative newcomers in the NoSQL database landscape. What are some graph model and design considerations when choosing a graph database in your architecture? Let's take a tour of a couple of graph use cases that we've collaborated on recently with our clients to help you better understand how and why a graph database can be integrated to help solve problems found with connected data. Presented at DataWorks Summit San Jose - IBM Meetup on June 18, 2018.
https://www.meetup.com/BigDataDevelopers/events/251307524/
Originally presented at Collab Summit 2016, this talk covers the use of GHTorrent to gather and analyze public repo and community data from GitHub. We talk about using Azure Data Lake as well as how you can set up this infrastructure yourself.
Is it harder to find a taxi when it is raining? - Wilfried Hoge
Using open data to answer the question of whether it is harder to find a taxi when it is raining. Live demo of analyzing taxi data with DashDB, R, and Bluemix.
Presented at the data2day conference.
Presented at Open Camps (Database Camp) in New York City on November 19, 2017. http://www.db.camp/2017/presentations/graph-computing-with-apache-tinkerpop
Airline Reservations and Routing: A Graph Use Case - Jason Plurad
We've all been there before... you hear the announcement that your flight is canceled. Fellow passengers race to the gate agent to rebook on the next available flight. How do they quickly determine the best route from Berlin to San Francisco? Ultimately the flight route network is best solved as a graph problem. We will discuss our lessons learned from working with a major airline to solve this problem using JanusGraph database. JanusGraph is an open source graph database designed for massive scale. It is compatible with several pieces of the open source big data stack: Apache TinkerPop (graph computing framework), HBase, Cassandra, and Solr. We will go into depth about our approach to benchmarking graph performance and discuss the utilities we developed. We will share our comparison results for evaluating which storage backend to use with JanusGraph. Whether you are productizing a new database or you are a frustrated traveler, a fast resolution is needed to satisfy everybody involved. Presented at DataWorks Summit Berlin on April 18, 2018.
Presented at the Linked Data Benchmark Council (LDBC) Technical User Group (TUG) Meeting on June 8, 2018. http://www.ldbcouncil.org/blog/11th-tuc-meeting-university-texas-austin
Janus graph lookingbackwardreachingforward - Demai Ni
JanusGraph: Looking Backward and Reaching Forward - by Jason Plurad (@pluradj):
The JanusGraph project started at the Linux Foundation earlier this year, but it is not the new kid on the block. We'll start with a look at the origins and evolution of this open source graph database through the lens of a few IBM graph use cases. We'll discuss the new features in the latest release of JanusGraph, and then take a look at future directions to explore together with the open community. Presented on October 18, 2017 at the Graph Technologies Meetup in Santa Clara, CA. https://www.meetup.com/_CAIDI/events/243122187/
R, Spark, Tensorflow, H2O.ai Applied to Streaming Analytics - Kai Wähner
Slides from my talk at Codemotion Rome in March 2017. Development of analytic machine learning / deep learning models with R, Apache Spark ML, TensorFlow, H2O.ai, RapidMiner, KNIME and TIBCO Spotfire. Deployment to real-time event processing / stream processing / streaming analytics engines like Apache Spark Streaming, Apache Flink, Kafka Streams, TIBCO StreamBase.
One of the first problems a developer encounters when evaluating a graph database is how to construct a graph efficiently. Recognizing this need in 2014, TinkerPop's Stephen Mallette penned a series of blog posts titled "Powers of Ten" which addressed several bulkload techniques for Titan. Since then Titan has gone away, and the open source graph database landscape has evolved significantly. Do the same approaches stand the test of time? In this session, we will take a deep dive into strategies for loading data of various sizes into modern Apache TinkerPop graph systems. We will discuss bulkloading with JanusGraph, the scalable graph database forked from Titan, to better understand how its architecture can be optimized for ingestion. Presented at Data Day Texas on January 27, 2018.
Handle insane devices traffic using Google Cloud Platform - Andrea Ulisse - C... - Codemotion
In a world of connected devices it is really important to be prepared to receive and manage a huge number of messages. In this context, what makes the real difference is the backend, which has to be able to safely handle every request in real time. In this talk we will show how its broad spectrum of highly scalable services makes Google Cloud Platform the perfect habitat for such workloads.
Big Data - part 5/7 of "7 modern trends that every IT Pro should know about" - Ibrahim Muhammadi
Presented by Ibrahim Muhammadi. Founder - AppWorx.cc
Big Data is revolutionizing how businesses make decisions now. More and more decisions and strategies are now based on data.
Bethesda Data Science Meetup February 2019
Chris Conlan and Paulo Martinez give a brief overview of the software ecosystem for web-based data viz, then dive into their own portfolios (not in slides).
Data analytics took off in its infancy with the development of SQL. Yet, at web scale, even simple analytics queries can prove challenging within (distributed) stream processing environments. Two such examples are Count and Count Distinct. Because of the key-oriented nature of these queries, they would traditionally result in ever-increasing memory demand. Through approximation techniques with fixed-size memory consumption, these tasks become feasible and potentially more resource-efficient within streaming systems. This is demonstrated by integrating Yahoo Data Sketches on Apache Flink. The evaluation highlights the resource efficiency as well as the challenges of approximation techniques (e.g. varying accuracy) and the potential for tuning depending on the dataset. Furthermore, challenges in integrating the components within the existing streaming interfaces (e.g. Table API) and stateful processing are presented.
Discover what's new in the Neo4j community for the week of 13 January 2018, including projects around FOSDEM, Knowledge Graphs, and the Azure template.
Flink Forward Berlin 2018: Henri Heiskanen - "How to keep our flock happy wit... - Flink Forward
Data is at the very core of how Rovio builds and operates its games. What does data mean for Rovio: how is it processed, and how do we gain value from it? In this talk we take a deep dive into Rovio's analytics pipeline and its use cases. We will give you a brief history lesson on how a purely batch-based system has evolved into a hybrid streaming and batch system, and share how we operate our production pipeline in AWS.
Margriet Groenendijk - Open data is available from an incredible number of data sources that can be linked to your own datasets. This talk will present examples of how to visualise and combine data from very different sources such as weather and climate, and statistics collected by individual countries using Python notebooks in Analytics for Apache Spark.
IBM Watson overview presented by Mike Pointer, Watson Sr. Solution Architect, at Penn State's Nittany Watson Challenge Immersion event on January 19-20, 2017.
Process mining provides new ways to utilize the abundance of event data in our society. This emerging scientific discipline can be viewed as a bridge between data science and process science: It is both data-driven and process-centric. Process mining provides a novel set of tools to discover the real processes, to detect deviations from normative processes, and to analyze bottlenecks and waste. The Internet of Events (IoE) not only includes classical sources of information like webpages, information systems, and social media, but also incorporates the Internet of Things (IoT), wearables, mobile devices and Industry 4.0. Analogous to spreadsheets, process mining provides a generic domain-independent technology (starting from events rather than numbers). In his talk, Wil van der Aalst will argue that process mining should be an integral part of the toolbox of tomorrow's data scientist. He will introduce basic concepts and elaborate on his collaboration with industry. His research group at TU/e applied process mining in over 150 organizations, developed the open-source tool ProM, and influenced the 25+ commercial process mining tools available today.
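This material was presented at the "Innovative Enterprise Cloud Service Support Program Briefing" run by the Seongnam Industry Promotion Agency. It introduces the concept of the Fourth Industrial Revolution, a current hot topic, and the debate over how the term should be defined. It points out that what matters is not the term itself but the direction in which the world is changing, and that this direction is represented by Digital Transformation. ICBM is the means that makes digital transformation possible. The talk briefly covers market and company trends for each of its technologies, namely the Internet of Things (IoT), cloud, big data, and mobile, and closes by introducing business models available to companies that use these technologies.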
Deloitte's report and point of view on IBM's Watson. IBM Watson, AI, and cognitive computing are rapidly evolving technologies that can support and enhance enterprise solutions. Learn about the why and the how of IBM Watson.
Beyond Keyword Search with IBM Watson Explorer Webinar Deck - MC+A
IBM Watson Explorer provides flexible and powerful cognitive search and content analytics that can support a large variety of business use cases. In this Webinar, we discuss moving beyond keyword and federated search provided by products like the Google Search Appliance and getting ready for what’s next.
Discover what comes next for IBM Watson and the industries particularly suited for Watson solutions, such as healthcare, banking, and the financial sector, all of which deal with massive amounts of unstructured data coming from various sources. Find out how the advanced analytics used in Watson are being put to work in businesses around the world.
All Models Are Wrong, But Some Are Useful: 6 Lessons for Making Predictive An... - Brian Mac Namee
Introduces some key ideas for deploying machine-learning-based predictive analytics models effectively. Based on the book "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies" (www.machinelearningbook.com).
Open data is available from an incredible number of data sources that can be linked to your own datasets. This talk shows how to visualise and combine data with Python notebooks in the Cloud. Examples include data from different sources such as weather and climate, statistics collected by individual countries and Open Street Map data.
Agile and continuous delivery – How IBM Watson Workspace is built - Vincent Burckhardt
The journey and transformations we have undertaken at IBM to implement cloud-native applications. Covers culture, architecture, and pipeline changes. This presentation was given at IBM Connect 2017 in San Francisco in February 2017.
This presentation provides demonstrations of Watson API Services utilized in various Big Data and Analytic applications and was presented at Penn State's Nittany Watson Challenge Immersion event on January 19-20, 2017.
IBM Watson Question-Answering System and Cognitive Computing - Rakuten Group, Inc.
IBM's vision of cognitive computing has been steadily embraced across industries since IBM's Watson question-answering system made a sensational debut on the US television quiz show Jeopardy! in 2011. As a core member of the Watson project, I would like to share the excitement of the project and its progress over the last five and a half years toward cognitive business. In this talk, I will also give a technical overview of Watson, major use cases, and perspectives on the future of cognitive computing.
https://tech.rakuten.co.jp/
Smart Industry 4.0: IBM Watson IoT in Practice - IoT Academy
During the second IoT meetup of 2017, Ronald Teijken gave insight into how companies can make complex decisions more smartly thanks to IBM's Watson IoT Platform. Sensors, data, analytics, and cognitive computing were among the topics covered.
IBM Watson Ecosystem roadshow - Chicago 4-2-14 - cheribergeron
IBM Watson is powering a new generation of cognitive applications. Learn how IBM is partnering with visionaries and entrepreneurs to bring innovative cognitive applications to market through the IBM Watson Ecosystem.
IBM Watson Content Analytics: Discover Hidden Value in Your Unstructured Data - Perficient, Inc.
Healthcare organizations create a massive amount of digital data. Some is stored in structured fields within electronic medical records (EMR), claims or financial systems and is readily accessible with traditional analytics. Other information, such as physician notes, patient surveys, call center recordings and diagnosis reports, is often saved in a free-form text format and is rarely used for analytics. In fact, experts suggest that up to 80% of enterprise data exists in this unstructured format, which means a majority of critical data isn’t being considered or analyzed!
Our webinar demonstrated how to extract insights from unstructured data to increase the accuracy of healthcare decisions with IBM Watson Content Analytics. Leveraging years of experience from hundreds of physicians, IBM has developed tools and healthcare accelerators that allow you to quickly gain insights from this “new” data source and correlate it with the structured data to provide a more complete picture.
Artificial intelligence (AI) is everywhere, promising self-driving cars, medical breakthroughs, and new ways of working. But how do you separate hype from reality? How can your company apply AI to solve real business problems?
Here are the AI lessons your business should keep in mind for 2017.
Using Raspberry Pi + fluentd + GCP Cloud Logging and BigQuery for IoT data collection and analysis - Simon Su
This is a short training session that shows how to use fluentd on a Raspberry Pi to collect data, with Google Cloud Logging and BigQuery as the backend and Apps Script and Google Sheets as the presentation layer.
Building Web Mobile App that don’t suck - FITC Web Unleashed - 2014-09-18 - Frédéric Harper
Mobility is everywhere. It’s even more important to think about smaller devices like smartphones when building an application or game using web technology. Everybody wants to have a mobile application, but that's not sufficient: you need to create an awesome experience for your users, your customers. You need to be future-proof and think about different markets and users' needs: you never know when your application will be the next Flappy Bird. This presentation offers concrete, experience-based tips, tricks, and guidelines for web developers using HTML, CSS, and JavaScript that will help you make a success of your next exciting mobile application or game idea.
https://social.samsunginter.net/web/statuses/101091908485239453# #Cdl2018 : #WebThing using #WebThingIotJs on #TizenRT on #ARTIK05x connected to @MozillaIot featuring @The_Jst #JerryScript + #IotJs , video to be published by @CapitoleDuLibre
webthing-iotjs-tizenrt-cdl2018-20181117 - rzr
From localhost to the cloud: A Journey of Deployments - Tegar Imansyah
28 September 2019. Event from HIMATIFA Universitas Pembangunan Nasional “Veteran” Jawa Timur called "IT Festival 2019".
In this talk, I present basic knowledge about how the web works, how cloud computing differs from existing solutions, and how to deploy to a server.
Version Control in Machine Learning + AI (Stanford) - Anand Sampat
The talk starts by outlining the history of conventional version control, then explains QoDs (Quantitative Oriented Developers) and the unique problems their ML systems pose from an operations perspective (MLOps). The only status quo solutions are proprietary in-house pipelines (exclusive to Uber, Google, Facebook) and, for everyone else, manual tracking and fragile "glue" code.
Datmo works to solve this issue by empowering QoDs in two ways: making MLOps manageable and simple (rather than completely abstracted away), and reducing the amount of glue code to ensure more robust end-to-end pipelines.
This goes through a simple example of using Datmo with an Iris classification dataset. Later workshops will expand to show how Datmo can work with other data pipelining tools.
This talk covers how to use PostgreSQL together with the Golang (Go) programming language. I will describe what drivers and tools are available and which to use nowadays.
In this talk I will cover what design choices of Go can help you to build robust programs. But also, we will reveal some parts of the language and drivers that can cause obstacles and what routines to apply to avoid risks.
We will try to build the simplest cross-platform application in Go fully covered by tests and ready for CI/CD using GitHub Actions as an example.
GDG DevFest Romania - Architecting for the Google Cloud Platform - Márton Kodok
Learn about FaaS and PaaS architectural patterns that make use of Cloud Functions, Pub/Sub, Dataflow, and Kubernetes, and about platforms that hide the management of servers from the user and have changed how we develop and deploy future software.
We discuss the difference between an event-driven approach, in which you can trigger a function whenever something interesting happens within the cloud environment, and the simpler HTTP approach. We also cover quotas and per-invocation pricing, and the advantages and disadvantages of serverless systems.
Build "Privacy by design" Webthings
With IoT.js on TizenRT and more
#MozFest, Privacy and Security track
Ravensbourne University, London UK <2018-10-27>
MongoDB Europe 2016 - Warehousing MongoDB Data using Apache Beam and BigQuery - MongoDB
What happens when you need to combine data from MongoDB along with other systems into a cohesive view for business intelligence? How do you extract, transform, and load MongoDB data into a centralized data warehouse? In this session we’ll talk about Google BigQuery, a managed, petabyte-scale data warehouse, and the various ways to get MongoDB data into it. We’ll cover managed options like Apache Beam and Cloud Dataflow as well as other tools that can help make moving and using MongoDB data easy for business intelligence workloads.
Similar to Introduction to the IBM Watson Data Platform (20)
Learn about how bias can take root in machine learning algorithms and ways to overcome it. From the power of open source, to tools built to detect and remove bias in machine learning models, there is a vibrant ecosystem of contributors who are working to build a digital future that is inclusive and fair. Learn how to achieve AI fairness, robustness and explainability. You can become part of the solution.
Trusting machines with robust, unbiased and reproducible AI - Margriet Groenendijk
To trust a decision made by an algorithm, we need to know that it is reliable and fair, that it can be accounted for, and that it will cause no harm. We need assurance that it cannot be tampered with and that the system itself is secure. We need to understand the rationale behind the algorithmic assessment, recommendation or outcome, and be able to interact with it, probe it – even ask questions. And we need assurance that the values and norms of our societies are also reflected in those outcomes.
Learn about how bias can take root in machine learning algorithms and ways to overcome it. From the power of open source, to tools built to detect and remove bias in machine learning models, there is a vibrant ecosystem of contributors who are working to build a digital future that is inclusive and fair. Learn how to achieve AI fairness, robustness, explainability and accountability. You can become part of the solution.
Weather is part of our everyday lives. Who doesn’t check the rain radar before heading out, or the weather forecast when planning a weekend away? But where does this data come from, and what is it made of? The answer is a mix of measurements, models and statistics, meaning that the use of weather and climate data can get complex very quickly. This session provides a brief overview of the science behind weather and climate forecasts and provides you with the tools to get started with weather data - even if you aren't a meteorologist.
Data visualization is fun but can take up a lot of time, especially when you are exploring new data. The magic forest is much easier to navigate with PixieDust, a free open-source Python library that makes it quick and simple to explore data with any visualization library without writing code in a Jupyter notebook. Learn how PixieDust takes out some of the coding, how to contribute, and how to make and share visualizations in seconds.
http://www.callandcontactcentreexpo.co.uk/speakers/margriet-groenendijk/
In order to move past the hype and achieve the full potential of machine learning, data scientists and software developers need to work more closely together towards their common goal of delivering well-architected, data-driven applications. Every industry is in the process of being transformed by software and data. It is in the collaboration between data scientists and software developers where the real value can be found by creating integrated data workflows that benefit from the unique knowledge and skillsets of each discipline.
https://www.dncexpo.be/seminar/O105
Weather is part of our everyday lives. Who doesn’t check the rain radar before heading out, or the weather forecast when planning a weekend away? But where does this data come from, and what is it made of? The answer is a mix of measurements, models and statistics, meaning that the use of weather and climate data can get complex very quickly.
This session provides a brief overview of the science behind weather and climate forecasts and provides you with the tools to get started with weather data – even if you aren’t a meteorologist. Learn how to connect weather data to other data sources, how to visualize weather and climate data in an interactive weather dashboard embedded in a Python notebook, and other ways you can use weather data for yourself, from examples using weather APIs, maps, PixieDust and Machine Learning.
Data Science deals with the extraction of valuable insights from an incredible number of sources in an endless number of formats. This session will go through a typical workflow using practical tools and tricks. This will give you a basic understanding of Data Science in the Cloud. The examples will show the steps that are needed to build and deploy a model to predict traffic collisions with weather data.
ODSC Europe: Weather and Climate Data: Not Just for Meteorologists Margriet Groenendijk
These slides provide a brief overview of the science behind weather and climate forecasts and give you the tools to get started with weather data, even if you aren't a meteorologist. Learn how to connect weather data to other data sources, how to visualize weather and climate data in an interactive weather dashboard embedded in a Python notebook, and other ways you can use weather data for yourself, from examples using weather APIs, maps, PixieDust and Machine Learning.
Data Science Festival - Beginners Guide to Weather and Climate Data Margriet Groenendijk
Weather is part of our everyday lives. Who doesn’t check the rain radar before heading out, or the weather forecast when planning a weekend away? But where does this data come from, and what is it made of? The answer is a mix of measurements, models and statistics. This talk looks at the observations, predictions and forecast models, and at weather data as a variable to consider in machine learning models. Learn how it is done and how you can use weather and climate data, illustrated with several examples.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working with unstructured data. Speakers will present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Adjusting OpenMP PageRank : SHORT REPORT / NOTES Subhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take
advantage of a shared memory system with multiple CPUs, each with multiple cores, to
accelerate pagerank computation. If the NUMA architecture of the system is properly taken
into account with good vertex partitioning, the speedup can be significant. To take steps in
this direction, experiments are conducted to implement pagerank in OpenMP using two
different approaches, uniform and hybrid. The uniform approach runs all primitives required
for pagerank in OpenMP mode (with multiple threads). On the other hand, the hybrid
approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
1. IBM Watson Data Platform
and Open Data
27 February 2017
Margriet Groenendijk | Developer Advocate | IBM Watson Data Platform
@MargrietGr
https://medium.com/ibm-watson-data-lab
21. @MargrietGr
Cloudant is a database
id  firstname  lastname  dob
1   John       Smith     1970-01-01
2   Kate       Jones     1971-12-25
{
  "_id": "1",
  "firstname": "John",
  "lastname": "Smith",
  "dob": "1970-01-01"
}
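As a rough illustration of the slide above, the following sketch turns the two table rows into JSON documents of the shape shown. The helper name `row_to_document` is made up for this example, and nothing here talks to Cloudant itself; it only shows how a relational row might map to a document with a string `_id`.

```python
import json

# The two rows from the table on the slide, as plain dictionaries.
rows = [
    {"id": 1, "firstname": "John", "lastname": "Smith", "dob": "1970-01-01"},
    {"id": 2, "firstname": "Kate", "lastname": "Jones", "dob": "1971-12-25"},
]


def row_to_document(row):
    """Hypothetical helper: rename the numeric `id` column to a string `_id`."""
    fields = {key: value for key, value in row.items() if key != "id"}
    return {"_id": str(row["id"]), **fields}


documents = [row_to_document(row) for row in rows]

# Print the first document; it has the same fields as the JSON on the slide.
print(json.dumps(documents[0], indent=2))
```

Each row becomes one self-contained document, so no fixed table schema is needed up front.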
38. @MargrietGr
OpenStreetMap Data
▪ IBM Cloudant: use from anywhere!
▪ Daily updates: VM, daily cron, Python script
▪ Always up to date!
▪ Currently 12,467,460 POIs
40. @MargrietGr
Extract the POIs with osmosis
osmosis --read-pbf netherlands-latest.osm.pbf \
  --tf accept-nodes \
    aerialway=station \
    aeroway=aerodrome,helipad,heliport \
    amenity=* craft=* emergency=* \
    highway=bus_stop,rest_area,services \
    historic=* leisure=* office=* \
    public_transport=stop_position,stop_area \
    shop=* tourism=* \
  --tf reject-ways --tf reject-relations \
  --write-xml netherlands.nodes.osm
(easy to install with brew on Mac)
41. @MargrietGr
Some cleaning up with osmconvert
osmconvert netherlands.nodes.osm \
  --drop-ways --drop-author --drop-relations \
  --drop-versions > netherlands.poi.osm
Convert from osm to json format with ogr2ogr
ogr2ogr -f GeoJSON netherlands.poi.json \
  netherlands.poi.osm points
42. @MargrietGr
Upload to Cloudant with couchimport
export COUCH_URL="https://username:password@username.cloudant.com"
cat netherlands.poi.json | couchimport \
  --db poi-netherlands --type json --jsonpath "features.*"
https://github.com/glynnbird/couchimport
IBM Cloudant
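To make the `--jsonpath "features.*"` option more concrete, here is a small sketch of what it appears to select: each element of the `features` array in the GeoJSON file, so that every POI ends up as its own document. This is my reading of the option, not code from couchimport, and the two sample features (names and coordinates) are invented for illustration.

```python
import json

# A tiny GeoJSON FeatureCollection; the two POIs are made-up sample data.
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"name": "Example Cafe", "amenity": "cafe"},
     "geometry": {"type": "Point", "coordinates": [4.89, 52.37]}},
    {"type": "Feature",
     "properties": {"name": "Example Museum", "tourism": "museum"},
     "geometry": {"type": "Point", "coordinates": [4.88, 52.36]}}
  ]
}
"""


def iter_features(text):
    """Yield each entry of the top-level `features` array as one document."""
    collection = json.loads(text)
    for feature in collection.get("features", []):
        yield feature


poi_documents = list(iter_features(geojson_text))

for doc in poi_documents:
    print(doc["properties"]["name"], doc["geometry"]["coordinates"])
```

Importing per feature rather than the whole collection keeps each POI individually addressable and queryable in the database.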
51. @MargrietGr
posted:2016-08-01,2016-10-01 followers_count:3000 friends_count:3000
(weather OR sun OR sunny OR rain OR hail OR storm OR rainy OR drought OR flood OR hurricane OR tornado OR cold OR snow OR drizzle OR cloudy OR thunder OR lightning OR wind OR windy OR heatwave)
REST API docs:
https://new-console.ng.bluemix.net/docs/services/Twitter/twitter_rest_apis.html#rest_apis
Search for tweets
Select table (step 4)
Use an existing service
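The search query on this slide can be assembled programmatically. The sketch below only builds and URL-encodes the query string; it does not call the service. The function name `build_query` and the parameter name `q` are assumptions made for this example, so check the REST API docs linked above for the actual request format.

```python
from urllib.parse import urlencode

# The weather-related keywords listed on the slide.
WEATHER_TERMS = [
    "weather", "sun", "sunny", "rain", "hail", "storm", "rainy", "drought",
    "flood", "hurricane", "tornado", "cold", "snow", "drizzle", "cloudy",
    "thunder", "lightning", "wind", "windy", "heatwave",
]


def build_query(start, end, followers, friends, terms):
    """Hypothetical helper: combine the date range, the two count filters
    and the OR-ed keywords into one query string."""
    keywords = " OR ".join(terms)
    return (
        f"posted:{start},{end} "
        f"followers_count:{followers} friends_count:{friends} "
        f"({keywords})"
    )


query = build_query("2016-08-01", "2016-10-01", 3000, 3000, WEATHER_TERMS)

# URL-encode the query; the parameter name "q" is an assumption.
encoded = urlencode({"q": query})

print(query)
print(encoded)
```

Keeping the keyword list separate from the filters makes it easy to rerun the same search for a different period or set of terms.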
56. @MargrietGr
RDDs: Resilient Distributed Datasets
▪ Data does not have to fit on a single machine
▪ Data is separated into partitions
Creation of RDDs
▪ Load an external dataset
▪ Distribute a collection of objects
Transformations construct a new RDD from a previous one (lazy!)
Actions compute a result based on an RDD
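The ideas on this slide can be mimicked in a few lines of plain Python. The `ToyRDD` class below is a teaching toy written for this example, not the Spark API: it only imitates the concepts of partitions, lazy transformations and actions, with method names chosen to resemble the ones PySpark users will recognise. The sample tweets are invented.

```python
from itertools import chain


class ToyRDD:
    """Toy stand-in for an RDD (NOT Spark): partitioned data plus a list of
    pending transformations that only run when an action is called."""

    def __init__(self, partitions, steps=()):
        self.partitions = partitions  # list of lists, one list per partition
        self.steps = tuple(steps)     # recorded transformations, not yet run

    @classmethod
    def parallelize(cls, items, num_partitions=2):
        # Distribute a collection of objects round-robin over the partitions.
        items = list(items)
        return cls([items[i::num_partitions] for i in range(num_partitions)])

    def map(self, fn):
        # Transformation: returns a new ToyRDD, nothing is computed yet.
        return ToyRDD(self.partitions, self.steps + (("map", fn),))

    def filter(self, predicate):
        # Transformation: also lazy.
        return ToyRDD(self.partitions, self.steps + (("filter", predicate),))

    def _run(self, partition):
        # Apply the recorded steps to one partition, as a lazy iterator.
        data = iter(partition)
        for kind, fn in self.steps:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return data

    def collect(self):
        # Action: run every partition and gather the results.
        return list(chain.from_iterable(self._run(p) for p in self.partitions))

    def count(self):
        # Action: run every partition and count the surviving elements.
        return sum(1 for p in self.partitions for _ in self._run(p))


seen = []  # records every element the map function actually touches


def lower_and_log(text):
    seen.append(text)
    return text.lower()


tweets = ToyRDD.parallelize(
    ["Rain again", "Sunny day", "RAIN and wind", "Cold morning"],
    num_partitions=2,
)
rainy = tweets.map(lower_and_log).filter(lambda t: "rain" in t)

calls_before_action = len(seen)        # still zero: transformations are lazy
rainy_tweets = sorted(rainy.collect())  # action: the work happens here
rainy_count = rainy.count()             # another action, recomputed from scratch

print(calls_before_action, rainy_tweets, rainy_count)
```

In real Spark the partitions live on different machines and the recorded steps are also what lets a lost partition be recomputed, which is the "resilient" part of the name.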
62. Getting started
▪ Go to datascience.ibm.com and sign in with your Bluemix account if you have one; otherwise
sign up for one at the top right of the screen
63. Create a project
▪ Click the Create New Project link at the top of the screen
▪ Or go to My Projects in the menu on the left of the screen and click Create New Project
there
64. Create a project
▪ Name the Project
▪ Choose a Spark Service
▪ Choose an Object Storage
▪ Click Create
67. Add a notebook
▪ Click Add Notebooks
▪ Pick your favourite:
▪ Python 2
▪ Scala
▪ R
▪ Choose Spark 1.6 or 2.0
▪ Click Create Notebook
68. Let’s write some code
▪ Click the pen icon to start adding code (edit mode)
▪ When collaborating, only one person can edit; others can add comments to the notebook
in view mode