The aim of the project was to validate user-defined location data in a Twitter dataset of 10,000 tweets using MongoDB and the Google Maps Geocoding API.
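The validation workflow described above can be sketched in Python. This is a minimal illustration, not the project's actual code: the `twitter`/`profiles` database and collection names and the `location` field are assumptions; only the geocoding endpoint URL and the response shape follow the real Google Maps Geocoding API.

```python
# A minimal, hypothetical shape of a Google Maps Geocoding API response;
# successful lookups from the real API nest lat/lng the same way.
SAMPLE_RESPONSE = {
    "status": "OK",
    "results": [
        {"geometry": {"location": {"lat": 40.7128, "lng": -74.0060}}}
    ],
}

def extract_latlng(response):
    """Pull (lat, lng) out of a geocoding response, or None if no match."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    loc = response["results"][0]["geometry"]["location"]
    return (loc["lat"], loc["lng"])

def geocode_and_store(location_text, api_key, mongo_uri="mongodb://localhost:27017"):
    """Geocode a free-text user location and store the result in MongoDB.
    Third-party imports are local so the parsing logic above stays
    dependency-free."""
    import requests                  # third-party HTTP client
    from pymongo import MongoClient  # third-party MongoDB driver
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/geocode/json",
        params={"address": location_text, "key": api_key},
    ).json()
    latlng = extract_latlng(resp)
    if latlng is not None:
        client = MongoClient(mongo_uri)
        client["twitter"]["profiles"].update_one(  # db/collection names are assumptions
            {"location": location_text},
            {"$set": {"coordinates": {"lat": latlng[0], "lng": latlng[1]}}},
        )
    return latlng
```

Profiles whose free-text location cannot be geocoded (status `ZERO_RESULTS`) simply get no coordinates, which is one way to "validate" user-defined locations.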
A new approach for user identification in web usage mining preprocessing (IOSR Journals)
This document presents a new approach for user identification in web usage mining preprocessing. It proposes a three-phase method: 1) Select websites and access them from different locations to find the IP address, session usage time, and navigations. 2) Apply Java tools and methods to identify the IP address, session usage, and visited web links. 3) Combine the web link navigation, IP address, and session usage to efficiently investigate web user behavior. The key steps in preprocessing include data cleaning, IP address identification, session identification, data integration, transformation, reduction, and usage mining. The proposed approach aims to improve performance and data quality for identifying unique users and sessions.
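The session-identification step mentioned above is commonly implemented as a timeout heuristic: requests from the same IP belong to one session until the gap between them exceeds a cutoff. A sketch under assumed inputs (the log fields and the 30-minute cutoff are illustrative, not taken from the paper):

```python
from datetime import datetime, timedelta

# Hypothetical parsed log entries: (ip, timestamp, url). Real server logs
# would first be parsed from the access-log format during data cleaning.
LOG = [
    ("10.0.0.1", datetime(2016, 10, 1, 9, 0),   "/index"),
    ("10.0.0.1", datetime(2016, 10, 1, 9, 10),  "/about"),
    ("10.0.0.2", datetime(2016, 10, 1, 9, 12),  "/index"),
    ("10.0.0.1", datetime(2016, 10, 1, 10, 30), "/index"),  # > 30 min gap: new session
]

def sessionize(entries, timeout=timedelta(minutes=30)):
    """Group requests into sessions: same IP, and no gap longer than
    `timeout` between consecutive requests from that IP."""
    sessions = {}   # ip -> list of sessions, each a list of urls
    last_seen = {}  # ip -> timestamp of that ip's previous request
    for ip, ts, url in sorted(entries, key=lambda e: e[1]):
        if ip not in sessions or ts - last_seen[ip] > timeout:
            sessions.setdefault(ip, []).append([])  # open a new session
        sessions[ip][-1].append(url)
        last_seen[ip] = ts
    return sessions
```

The paper combines this with navigation paths to distinguish users sharing one IP; the sketch above covers only the IP-and-timeout part.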
This document outlines Project A which involves analyzing a Twitter dataset using tools on a virtual machine. Students are assigned to take a subset of Twitter user profile data, clean it, import it into a MongoDB database on their Jetstream VM, geolocate the user profiles, and visually display the results. The project aims to teach students how to set up a cloud VM, use MongoDB, and manipulate analysis tools. Students must complete the work individually and submit a report by October 31, 2016, discussing what they learned about setting up VMs, using MongoDB, and producing visualized results from complex data.
Tweets Classification using Naive Bayes and SVM (Trilok Sharma)
This document summarizes a project to automatically classify tweets into predefined Wikipedia categories. It discusses using three algorithms - Naive Bayes, SVM, and rule-based - to classify tweets into 11 categories like business, sports, politics etc. It explains the concepts used like removing outliers, stemming, spell checking. Accuracy results using 10-fold cross validation show SVM and rule-based achieving over 80% accuracy on most categories. The project analyzed real-time tweet data using an API and achieved high performance speeds for classification.
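A multinomial Naive Bayes classifier of the kind the project compares can be written in a few lines. This toy sketch uses made-up tweets and two of the eleven categories, not the project's data or its actual implementation:

```python
import math
from collections import Counter, defaultdict

# Toy training data; the real project labeled tweets with Wikipedia categories.
TRAIN = [
    ("stocks rally as markets close higher", "business"),
    ("quarterly earnings beat market forecast", "business"),
    ("team wins the championship final match", "sports"),
    ("striker scores twice in the final", "sports"),
]

def tokenize(text):
    return text.lower().split()

def train_nb(examples):
    """Fit per-class word counts and class priors."""
    word_counts = defaultdict(Counter)  # class -> Counter of words
    class_counts = Counter()
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for w in tokenize(text):
            word_counts[label][w] += 1
            vocab.add(w)
    return word_counts, class_counts, vocab

def classify(text, word_counts, class_counts, vocab):
    """Pick the class with the highest log-probability, add-one smoothed."""
    total = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for label in class_counts:
        lp = math.log(class_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for w in tokenize(text):
            lp += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best
```

The preprocessing steps the summary lists (stemming, spell checking, outlier removal) would all happen inside `tokenize` in a fuller pipeline.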
Presenting Data – An Alternative to the View Control (Teamstudio)
In this webinar, Paul Della-Nebbia, an IBM Champion, will show how to implement a different alternative for displaying information from Domino views. Paul will cover how to use the Dojo Data Grid (included with XPages) to display a data grid that provides unique features like infinite scrolling, click to sort column headers, adjustable column widths, filtering, and the ability to drag and drop column headers to reorder. As the user scrolls through, the view data is retrieved as needed which improves performance and usability.
Uma SunilKumar has 10 years of experience working as a Tech Lead at Accenture. They have extensive experience with technologies like ASP.NET, WCF, SQL Server, HTML5, jQuery, JSON, and Bootstrap. They have worked on projects across various domains including plantations, insurance, resource management, and more.
VRE Cancer Imaging BL RIC Workshop 22032011 (djmichael156)
The document discusses the Virtual Research Environment for Cancer Imaging (VRE-CI) project which aims to provide a framework for researchers and clinicians to share cancer imaging information, images, and algorithms. It describes using Business Connectivity Services and managed metadata to organize and search image metadata, and building a reusable SharePoint site definition to manage DICOM files and extract metadata for search. Key aspects covered include mapping folders, issues with document library names, including external code, and adapting the DICOM field model.
This document provides a synopsis of a six-week industrial training project called "Visualizer" that involved building a system to represent real-time data from IoT devices graphically on a website. The project involved transmitting sensor data wirelessly to a database server, processing the data, and simultaneously updating a real-time line graph. Key aspects included installing necessary software, dividing the large project into subtasks, creating a MySQL database, transmitting and acquiring the data, fetching values from the database to plot the dynamic graph, and implementing a Model-View-Controller structure for the front-end and back-end development. The project has various applications including medical breath analyzers and devices for agriculture.
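Fetching a rolling window of recent readings for the dynamic line graph might look like this in outline. This is a pure-Python stand-in for illustration; in the real project the values come from a MySQL query, not an in-memory list:

```python
def latest_window(readings, n=50):
    """Keep only the most recent n samples for the rolling line graph.
    `readings` is a list of (timestamp, value) pairs, oldest first."""
    return readings[-n:]

def downsample(readings, step=2):
    """Thin dense sensor data before plotting by keeping every step-th point."""
    return readings[::step]
```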
This document provides an overview of a group project to develop a home management system using a Raspberry Pi. It includes sections on the application architecture, technologies used, design patterns, implementation details, and testing procedures. The system allows users to control devices connected to a Raspberry Pi from a mobile app or website. It uses IBM BlueMix for cloud services, including hosting the database and facilitating communication between the Pi and apps. Connecting the Android app to the online database presented some challenges that were overcome using PHP files on BlueMix.
Building Your First App with MongoDB Stitch (MongoDB)
MongoDB Stitch is a platform that allows developers to easily access MongoDB databases and integrate with key services. It provides native SDKs, integrated rules and functions to build scalable backends. Requests made through Stitch are parsed, services are orchestrated, rules are applied, and results are returned to clients. Stitch handles authentication, authorization and access controls through user profiles and declarative rules. It is a unified solution for building complete applications that connect to MongoDB and external services securely.
Android MVVM architecture using Kotlin, Dagger2, LiveData, MediatorLiveData (Waheed Nazir)
Kotlin MVVM Architecture:
A sample app that displays a list of Google news items. The purpose of this project is to illustrate the use of the MVVM architecture design pattern, following the best practices of object-oriented design, with the following technology stack.
Architecture Design Pattern
MVVM
Dagger2 (Dependency Injection)
Live Data, MediatorLiveData
Room Database
Retrofit
Unit Testing (Espresso), Mockito (Coming soon)
Repository Pattern
AndroidX
Glide
NetworkBoundResource, NetworkAndDBBoundResource
Google News API
JetPack Libraries
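The NetworkBoundResource idea from the stack above (cache first, fetch when stale, database as the single source of truth) is language-agnostic. The project implements it in Kotlin with Room, Retrofit, and LiveData; this is only a compact Python sketch of the pattern with invented class names:

```python
class NewsRepository:
    """Repository mediating between a local cache and a remote API:
    serve cached data while fresh, refetch when stale, persist, then
    serve from the database (the single source of truth)."""

    def __init__(self, db, api, max_age=60):
        self.db, self.api, self.max_age = db, api, max_age

    def get_news(self, now):
        cached = self.db.get("news")            # a Room query in the real app
        if cached and now - cached["fetched_at"] <= self.max_age:
            return cached["items"]              # fresh enough: no network call
        items = self.api.fetch_news()           # a Retrofit call in the real app
        self.db.put("news", {"items": items, "fetched_at": now})
        return self.db.get("news")["items"]     # always read back from the DB

# Minimal in-memory stand-ins for the database and network layers.
class FakeDB(dict):
    def get(self, k): return dict.get(self, k)
    def put(self, k, v): self[k] = v

class FakeAPI:
    def __init__(self): self.calls = 0
    def fetch_news(self):
        self.calls += 1
        return ["headline-1", "headline-2"]
```

Because the repository hides where data comes from, the ViewModel layer above it never knows whether a result was cached or fetched, which is what makes the pattern testable with fakes like these.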
This dashboard analyzes trending videos on YouTube in India to understand factors that influence videos to appear in the trending section. The dashboard collects data from the YouTube API, cleans it, and stores it in a database. Visualizations including scatter plots, line plots, pie charts, bar charts and histograms are generated from the data to show trends like popular publishing hours, videos by day of the week, title formatting, top channels, and title lengths. The dashboard is deployed on Heroku so it is publicly available for creators to analyze trends and optimize their content.
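The "popular publishing hours" chart rests on a simple aggregate over the collected video records. A sketch with hypothetical records, shaped loosely like the YouTube Data API's `snippet.publishedAt` field (ISO 8601 timestamps):

```python
from collections import Counter
from datetime import datetime

# Illustrative records; field names mirror the YouTube API's publishedAt.
VIDEOS = [
    {"title": "A", "publishedAt": "2020-01-06T17:30:00Z"},
    {"title": "B", "publishedAt": "2020-01-06T17:05:00Z"},
    {"title": "C", "publishedAt": "2020-01-07T09:45:00Z"},
]

def publish_hour_counts(videos):
    """Count trending videos per hour of day, the kind of aggregate
    behind a publishing-hours bar chart."""
    hours = Counter()
    for v in videos:
        ts = datetime.fromisoformat(v["publishedAt"].replace("Z", "+00:00"))
        hours[ts.hour] += 1
    return hours
```

The day-of-week and title-length charts the summary mentions are the same shape of computation with a different key.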
As part of the final BETTER Hackathon, project partners prepared four hackathon exercises. Fraunhofer IAIS organised this exercise in conjunction with external partner MKLab ITI-CERTH (EOPEN project). This step-by-step exercise covered the setup of local Docker images on Linux using Docker Compose, with pre-installed Python, SANSA, Hadoop, Apache Spark, and Apache Zeppelin. It featured semantic transformation and the use of the SANSA (Scalable Semantic Analytics Stack - http://sansa-stack.net/) libraries on a sample of tweets ahead of geo-clustering.
Project website (Hackathon information): https://www.ec-better.eu/pages/2nd-hackathon
Github repository: https://github.com/ec-better/hackathon-2020-semanticgeoclustering
Tutorial: Building Your First App with MongoDB Stitch (MongoDB)
MongoDB Stitch allows developers to easily access and integrate MongoDB databases with key services. It provides integrated rules, functions and SDKs to handle complex connection logic and orchestrate databases and third party services. Requests made through Stitch applications are parsed, services are orchestrated, rules are applied, and results are returned to clients. Stitch offers scalable hosted JavaScript functions and declarative access controls to securely manage data and service access.
Building 12-factor Cloud Native Microservices (Jakarta_EE)
The document discusses the twelve-factor app methodology for building cloud-native microservices. It describes the twelve factors including codebase, dependencies, configuration, backing services, build/release/run stages, processes, port binding, concurrency, disposability, development/production parity, logs, and admin processes. It then demonstrates how to build a twelve-factor app using MicroProfile specifications and Kubernetes, with a live coding example of two microservices. References are provided for further reading.
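Of the twelve factors, configuration (factor III) is the easiest to show in code: deploy-specific settings live in the environment, never in the codebase. The talk does this in Java with MicroProfile Config; here is a minimal Python sketch of the same principle, with invented variable names:

```python
import os

def load_config(env=os.environ):
    """Factor III: read deploy-specific settings from the environment
    rather than from code or checked-in files. The variable names and
    defaults here are illustrative, not from the talk."""
    return {
        "db_url": env.get("DATABASE_URL", "jdbc:postgresql://localhost/dev"),
        "port": int(env.get("PORT", "8080")),
        "log_level": env.get("LOG_LEVEL", "INFO"),
    }
```

Passing `env` as a parameter keeps the function testable without touching the real process environment, which also serves factor X (dev/prod parity).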
Master a Cloud Native Standard - MicroProfile.pptx (EmilyJiang23)
Emily Jiang gave a presentation on MicroProfile, a set of lightweight, open source APIs for Java microservices. She began with an overview of MicroProfile's history and community-driven development process. She then provided a deep dive on various MicroProfile specifications, including Config, REST Client, OpenAPI, JWT Auth, Fault Tolerance, Health, Metrics, Telemetry, and more. Finally, she discussed the future of MicroProfile, including upcoming versions that will adopt OpenTelemetry Metrics and make other updates.
The document proposes developing a web-based ROS industrial pendant using existing ROS libraries to access topics and visualization from a web browser. The proposal outlines developing additional features including a complete integrated development environment (IDE) for ROS that is web-based. The IDE would allow users unfamiliar with ROS commands to easily create workspaces, packages, nodes, and connect to ROS nodes running on a server. This would make ROS platform independent and suitable for industrial applications. Key deliverables include auto-generating Python code skeletons, integrating node.js to access external hardware, providing robot modules for visualization, and designing a user interface to dynamically assign topics and data types.
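The proposal's auto-generated Python code skeletons might come from a simple template expansion. This is a guess at what such a generator could emit, not the proposal's design: the `rospy` calls in the template are real ROS 1 API names, but the template itself and the function names are hypothetical.

```python
NODE_TEMPLATE = '''\
#!/usr/bin/env python
import rospy
from std_msgs.msg import String

def main():
    rospy.init_node("{node_name}")
    pub = rospy.Publisher("{topic}", String, queue_size=10)
    rate = rospy.Rate(10)
    while not rospy.is_shutdown():
        pub.publish(String(data="hello from {node_name}"))
        rate.sleep()

if __name__ == "__main__":
    main()
'''

def generate_node(node_name, topic):
    """Render a minimal publisher-node skeleton for the web IDE to offer."""
    return NODE_TEMPLATE.format(node_name=node_name, topic=topic)
```

A web IDE could serve this string as a starter file, letting a user unfamiliar with ROS commands begin with a node that already runs.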
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
The document proposes a vision-based approach called ViDE (Vision-based Data Extractor) to extract structured data from deep web pages. ViDE explores the visual regularity of data records and items on web pages to identify and understand the visual structure without relying on the underlying programming language or HTML. It employs steps like identifying the visual structure, extracting data records, and partitioning records into data items. The approach is implemented in a tool to help researchers find documents related to authors in their research area by developing modules like a crawler, parser, clusterer and page manager.
Martin Koons is a senior .NET developer and software architect with over 20 years of experience developing applications using technologies like C#, ASP.NET, SQL Server, and Entity Framework. He has extensive experience designing and developing n-tier architectures, distributed systems, and mobile applications. Currently, he works as a senior .NET developer at UPS where he maintains package delivery systems and develops utility programs using C# and databases.
This document provides an overview of using Google App Engine to develop a file repository application. It first discusses cloud computing and Google App Engine, including its architecture, key concepts like Bigtable distributed storage and the datastore. It then describes building a file repository app with functions like upload, download and file listing. The app is implemented using Java servlets, JSP, Apache Commons FileUpload and Google APIs.
Speaker: Drew DiPalma
Come learn more about MongoDB Stitch – Our new Backend as a Service (BaaS) that makes it easy for developers to create and launch applications across mobile and web platforms. Stitch provides a REST API on top of MongoDB with read, write, and validation rules built-in and full integration with the services you love. This talk will cover the what, why, and how of MongoDB Stitch. We’ll discuss everything from features to the architecture. You’ll walk away knowing how Stitch can kickstart your new project or take your existing application to the next level.
What You Will Learn:
The basics of MongoDB Stitch, its architecture, and features
How to use Stitch to kickstart new projects or build on top of existing projects.
How to integrate your favourite services with your MongoDB application.
The document discusses using Google App Engine and Google Web Toolkit to develop a simple stock market analysis program. It provides an overview of cloud computing and the key aspects of Google App Engine, including its architecture, data storage via Bigtable, and development process. It then describes how the stock analysis program was built with GWT to allow users to search for stock quotes, view their portfolio, and remove stocks, all while taking advantage of Google's cloud infrastructure. Code snippets demonstrate integrating GWT with App Engine for user login/logout and accessing data from the cloud.
- Vinay Mittal is an IT professional with over 10 years of experience in C++ development. He currently works as a Computer Scientist at Adobe India.
- His skills include C/C++, Perl, Unix shell scripting, Javascript, AWS services, SQL databases, version control systems, and UNIX/Linux systems.
- Previous experience includes developing multi-threaded C++ applications at RBS and security applications at CA. At Amazon he worked on product ads and billing systems.
- Education includes a Masters in Computer Science from IIT Roorkee with honors.
MoSKito as a powerful open-source alternative to application management systems such as NewRelic or AppDynamic - Slides - http://www.solutionscamp.de/session-detail/?3-moskito-als-leistungsfaehige-open-source-alternative-zu-applikation-management-systemen-wie-newrelic-oder-appdynamic
IU Data Visualization Class Final Project: Visualizing Missing Species Intera... (James Nelson)
The aim of our project is to also utilize the GloBI APIs to visualize understudied organisms and locations with minimal interaction data within the GloBI data repository.
IU Applied Machine Learning Class Final Project: ML Methods for Predicting Wi... (James Nelson)
The document describes a machine learning project that compares the performance of R packages for logistic regression and random forest algorithms on wine quality datasets. It loads and prepares the datasets, then explores the data through descriptive statistics. Logistic regression and random forest models are applied to the training data and evaluated on test data.
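The project's train-then-evaluate workflow is straightforward to sketch. The original used R packages; below is a dependency-free Python analogue of the logistic-regression half on toy two-feature data (the data, learning rate, and epoch count are all illustrative):

```python
import math

# Toy stand-in for the wine-quality data: two features per sample,
# label 1 = "good". The real project used R's logistic regression and
# random forest implementations on the wine datasets.
X = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [-1.0, -1.0], [-2.0, -1.0], [-1.0, -2.0]]
y = [1, 1, 1, 0, 0, 0]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=200):
    """Plain stochastic-gradient-descent logistic regression."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def accuracy(w, b, X, y):
    preds = [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
             for xi in X]
    return sum(p == yi for p, yi in zip(preds, y)) / len(y)
```

In the actual project the accuracy would of course be measured on a held-out test split, not the training data as in this toy check.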
Similar to Twitter Dataset Analysis and Geocoding
James Nelson has over 15 years of experience designing and leading laboratory, translational, and clinical research studies. He has a PhD in Molecular Biology and Genetics from Wayne State University and is currently enrolled in Indiana University's Data Science Master's Program. Nelson has extensive experience in areas such as statistical analysis, machine learning, big data technologies, bioinformatics, and data visualization. He has authored over 70 peer-reviewed publications and has been the recipient of over $7 million in NIH research grants.
This proposal outlines the commercialization pathway for an investigational in vitro diagnostic (IVD) device for nonalcoholic fatty liver disease (NAFLD). The applicants were unable to identify a substantially equivalent predicate device, so they plan to submit a formal pre-submission to the FDA to obtain guidance on the appropriate regulatory pathway. The studies funded by this proposal would supply information needed for the pre-submission, including analytical validation and performance characteristics of the test. Depending on FDA feedback, the pathway may involve de novo classification, reclassification, or premarket approval.
1) A study examined the effects of a high-fat diet and parenteral iron administration on non-alcoholic fatty liver disease (NAFLD) in an obese, diabetic mouse model. 2) Mice fed a high-fat diet and administered parenteral iron showed increased liver inflammation, oxidative stress, and collagen production compared to mice on only a high-fat diet or normal diet. 3) However, mice given both a high-fat diet and parenteral iron showed less fat accumulation in the liver (steatosis) than mice on only a high-fat diet.
This document summarizes a study that will compare the effects of omega-3 polyunsaturated fatty acid supplementation to monounsaturated fatty acid supplementation for 8 weeks on nonalcoholic fatty liver disease (NAFLD). It will randomize 30 patients with NAFLD and at least 20% steatosis into the two treatment groups. The primary outcome is reduction of intrahepatic fat content as measured by magnetic resonance spectroscopy. Secondary outcomes include changes in liver enzymes, lipid profile, inflammation markers, and insulin resistance. The study personnel, design, population, visit schedule, and treatment protocols are outlined.
A Randomized, Masked, Controlled Study of Omega-3 Polyunsaturated Fatty Acid ...James Nelson
The aim of this study is to investigate the effects of an 8-week dietary supplementation with omega-3 polyunsaturated fatty acids (PUFA; i.e., fish oil) compared to monounsaturated fatty acids (MUFA; i.e., safflower oil) on intrahepatic fat content measured by magnetic resonance spectroscopy, serum aminotransferases, fasting lipids, insulin resistance, resting metabolic rate and proinflammatory cytokines in patients with non-alcoholic fatty liver disease.
Variants In The Il6 And Il1β Genes Either Alone Or In Combination With C282Y ...James Nelson
The goal of this study was to investigate if IL6 and IL1β cytokine SNPs, alone or in combination with HFE gene mutations, can affect the grade and pattern of hepatic iron deposition and serum iron markers in the well characterized NASH CRN cohort.
Serum Vitamin D Deficiency is Associated with NASH in AdultsJames Nelson
The aim of this study was to determine the relationship of serum vitamin D levels to histologic features of NAFLD, and associated demographic, clinical, and laboratory data in the well characterized NASH CRN cohort.
Deep Sequencing Identifies Novel Circulating and Hepatic ncRNA Profiles in NA...James Nelson
Next-generation RNA sequencing has expedited the identification of new non-coding RNA species (ncRNAs), thus ushering in the emerging field of ncRNA biology. The goals of this study were to catalogue the spectrum of different ncRNAs in serum and liver of patients with NAFLD and to compare expression of serum exRNAs between NAFLD patients and healthy control subjects.
Serum microRNA biomarkers for prognosis of nonalcoholic fatty liver diseaseJames Nelson
Next- generation sequencing (NGS) was performed on 45 serum RNA samples using the Illumina HiScanSQ platform. The goal of this study was to determine serum miRNA profiles for use as novel diagnostic and prognostic biomarkers for the presence of NAFLD, NASH and advanced fibrosis.
This curriculum vitae summarizes the education and experience of James E. Nelson. He received a PhD in Molecular Biology and Genetics from Wayne State University in 1994. Since then, he has held several research and staff positions, primarily focused on nonalcoholic steatohepatitis (NASH). He has received over $5 million in grant funding and authored over 50 publications. He has also designed and conducted numerous clinical studies on NASH through the NASH Clinical Research Network.
The Ipsos - AI - Monitor 2024 Report.pdfSocial Samosa
According to Ipsos AI Monitor's 2024 report, 65% Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataKiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"sameer shah
Embark on a captivating financial journey with 'Financial Odyssey,' our hackathon project. Delve deep into the past performance of two companies as we employ an array of financial statement analysis techniques. From ratio analysis to trend analysis, uncover insights crucial for informed decision-making in the dynamic world of finance."
Open Source Contributions to Postgres: The Basics POSETTE 2024ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
Build applications with generative AI on Google CloudMárton Kodok
We will explore Vertex AI - Model Garden powered experiences, we are going to learn more about the integration of these generative AI APIs. We are going to see in action what the Gemini family of generative models are for developers to build and deploy AI-driven applications. Vertex AI includes a suite of foundation models, these are referred to as the PaLM and Gemini family of generative ai models, and they come in different versions. We are going to cover how to use via API to: - execute prompts in text and chat - cover multimodal use cases with image prompts. - finetune and distill to improve knowledge domains - run function calls with foundation models to optimize them for specific tasks. At the end of the session, developers will understand how to innovate with generative AI and develop apps using the generative ai industry trends.
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Kaxil Naik
Navigating today's data landscape isn't just about managing workflows; it's about strategically propelling your business forward. Apache Airflow has stood out as the benchmark in this arena, driving data orchestration forward since its early days. As we dive into the complexities of our current data-rich environment, where the sheer volume of information and its timely, accurate processing are crucial for AI and ML applications, the role of Airflow has never been more critical.
In my journey as the Senior Engineering Director and a pivotal member of Apache Airflow's Project Management Committee (PMC), I've witnessed Airflow transform data handling, making agility and insight the norm in an ever-evolving digital space. At Astronomer, our collaboration with leading AI & ML teams worldwide has not only tested but also proven Airflow's mettle in delivering data reliably and efficiently—data that now powers not just insights but core business functions.
This session is a deep dive into the essence of Airflow's success. We'll trace its evolution from a budding project to the backbone of data orchestration it is today, constantly adapting to meet the next wave of data challenges, including those brought on by Generative AI. It's this forward-thinking adaptability that keeps Airflow at the forefront of innovation, ready for whatever comes next.
The ever-growing demands of AI and ML applications have ushered in an era where sophisticated data management isn't a luxury—it's a necessity. Airflow's innate flexibility and scalability are what makes it indispensable in managing the intricate workflows of today, especially those involving Large Language Models (LLMs).
This talk isn't just a rundown of Airflow's features; it's about harnessing these capabilities to turn your data workflows into a strategic asset. Together, we'll explore how Airflow remains at the cutting edge of data orchestration, ensuring your organization is not just keeping pace but setting the pace in a data-driven future.
Session in https://budapestdata.hu/2024/04/kaxil-naik-astronomer-io/ | https://dataml24.sessionize.com/session/667627
I590 Management, Access, and Use of Big and Complex Data Jim Nelson
Lesson 9: Twitter Dataset Analysis and Modeling Project October 20, 2015
1 Introduction
1.1 Project overview
This project was performed as part of the required curriculum for the class Management,
Access, and Use of Big and Complex Data (FA15-BL-INFO-I590-34717), Data Science Online
Master’s Program, Indiana University School of Informatics and Computing (IU-SoIC) (1). The
aim of the project was to validate user-defined location data in a Twitter dataset of 10,000
tweets (2) using MongoDB and the Google Maps Geocoding API (3). Subsequently, a subset of
the validated data was visualized by plotting the validated tweet origination locations using
Google Maps and associated Google Maps APIs (4). The dataset was a subset of a public
Twitter dataset of 3 million user profiles collected during May 2011, created by Li et al. from the
University of Illinois (2). To complete the project, access to MongoDB on the IU-SoIC server
was provided along with a software bundle containing scripts and code in order to reformat,
import, query, update and visualize the dataset.
1.2 Learning objectives
A. Learn through hands-on experience how to handle data and take it through typical big data
processing steps: data storage, cleansing, querying and visualization using a Linux command
line interface.
B. Set up VirtualBox and download a prepared virtual machine onto a local computer.
C. Build a software bundle that has a set of tools, in the form of scripts, on the virtual machine.
D. Import, query and modify Twitter data in the NoSQL MongoDB database environment.
E. Validate, geocode and visualize tweet origination data using Google Maps APIs.
2 Methods
2.1 System overview
This project utilized a virtual machine (VM) environment from the IU-SoIC server containing the
Ubuntu operating system (Linux), MongoDB and all necessary software. The latest version of
the VirtualBox platform was installed locally to host the VM image (5). To initialize the VM image
from the IU-SoIC server, the file I590FALL2015.ova was downloaded from the IU Box
account and imported into the local VirtualBox console.
2.2 Software tools
To build the project Java code package, a tarball (I590-TwitterDataSet.tar.gz) was
downloaded from the class website and extracted within the project base directory
./I590-TwitterDataSet. Shown below is the directory tree structure for the project.

I590-TwitterDataSet
├── bin (contains scripts (executables); generated after code deployment)
├── build (build directory, generated at code compile time)
│   ├── classes (.class files generated by the Java compiler)
│   │   ├── google
│   │   ├── mongodb
│   │   └── util
│   └── lib (contains the core jar file for the scripts in bin)
├── config (contains a configuration file: config.properties)
├── data (empty directory; put your data here)
├── input (contains a query criteria file, query.json, needed for finding
│        and updating the documents in MongoDB)
├── lib (third-party dependency library jars)
├── log (empty directory; put your log files here)
├── src (source code)
│   ├── google
│   ├── mongodb
│   └── util
└── templates (template files and the deploy script; the deploy script generates
         platform-dependent scripts and outputs them to bin during code deployment)

Before building and deploying the code, the project base and executable file directories were
manually set in the configuration file build.properties (shown below).

# $Id: build.properties
# @author: Yuan Luo
# Configuration properties for building I590-TwitterProjectCode
project.base.dir=/home/mongodb/Projects/I590-TwitterProjectCode
java.home=/usr/bin

The project code was then compiled and deployed using the command ant. The tree
structure and description of the source code is given below.

src
├── google
│   └── GeoCodingClient.java (returns geocoding results from Google)
├── mongodb
│   ├── Config.java (extracts parameters from the configuration file)
│   └── MongoDBOperations.java (selects documents that satisfy a given query criteria
│        and updates them by adding a geocode)
└── util (utility classes)
    ├── Base64.java (encoder/decoder)
    ├── PropertyReader.java (helper class for reading .properties files)
    └── UrlSigner.java (OAuth 2.0 helper class)

2.3 Data reformatting and importation into MongoDB
The Twitter dataset file users_10000.txt needed to be reformatted from ISO-8859-1 to the
UTF-8 format that MongoDB accepts. This was accomplished by running the following script to
create the reformatted dataset revised_users.txt as follows:
$ ./bin/reformat.sh users_10000.txt revised_users.txt
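The internals of reformat.sh are not documented in the bundle; assuming it performs a plain character-set re-encoding, the conversion can be sketched in Python:

```python
def to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 (Latin-1) bytes as UTF-8, the format MongoDB
    accepts; a sketch of what reformat.sh likely does internally."""
    return data.decode("iso-8859-1").encode("utf-8")

# The Latin-1 byte 0xE3 ("a" with tilde) becomes the UTF-8 pair 0xC3 0xA3.
converted = to_utf8(b"S\xe3o Paulo, Brasil")
```

ASCII bytes pass through unchanged, so rows that were already valid UTF-8-compatible text are unaffected by the conversion.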
Next, the headerline below was added to the revised_users.txt file, with the header fields
separated by tabs to create a tab-separated ("tsv") file:

user_id	user_name	friend_count	follower_count	status_count	favorite_count	account_age	user_location
The reformatted tab-separated dataset file revised_users.txt was then imported into
MongoDB using the script import_mangodb.sh, where the <db name> is twitterdb, the
<collection name> is users, and the <import file type> is tsv. The command is:
$ ./bin/import_mangodb.sh twitterdb users tsv revised_users.txt
2.4 Data validation and geocoding
To perform the geocoding of the Twitter dataset (2) using the Google Geocoding API (3), the
user-defined string in the “user_location” dataset field needed to be verified. This process
was performed using the QueryAndUpdate.sh script tool, which invokes the Java code in
the file GeoCodingClient.java. Briefly, this code queries and updates each document
that has a valid "user_location" recognized by the Google Geocoding API by performing the
following functions:
1) reformats the location string by removing whitespace;
2) inserts the Geocoding URL: https://maps.googleapis.com/maps/api/geocode/json
3) inserts the extracted geocode, containing the new fields "geocode",
"formatted_address" and "location" (containing the latitude and longitude); and
4) reports a geocoding status of "OK".
Documents without a valid "user_location" receive only "geocode": null, without any
reformatting or geocoding. The following Linux shell command also requires a
<configuration file> and a JSON <query criteria file>, as well as the
database and collection to be used:
$ ./bin/QueryAndUpdate.sh ./config/config.properties twitterdb users
./input/query.json ./log/query.log
For simplicity, Google API access was obtained as an “Anonymous user” circumventing other
detailed authentication options (6). However, since geocoding queries are limited to 2,500 per
day for Anonymous users, four days were needed to perform geocoding of all 10,000 user
profiles in the Twitter dataset.
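For illustration, the per-document work done by GeoCodingClient.java can be sketched in Python; the helper names below are hypothetical and the actual Java implementation differs:

```python
import json
from urllib.parse import urlencode

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

def build_request_url(user_location: str) -> str:
    """Collapse stray whitespace in the user-defined location and build
    the Geocoding API request URL (step 1 and 2 above)."""
    address = " ".join(user_location.split())
    return GEOCODE_URL + "?" + urlencode({"address": address})

def extract_geocode(response_text: str):
    """Return the fields stored in the document (step 3), or None when
    the location was not recognized, i.e. status is not "OK"."""
    body = json.loads(response_text)
    if body.get("status") != "OK" or not body.get("results"):
        return None
    top = body["results"][0]
    return {
        "formatted_address": top["formatted_address"],
        "location": top["geometry"]["location"],  # {"lat": ..., "lng": ...}
    }
```

A document whose response yields None keeps only "geocode": null, mirroring the behavior described above.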
2.5 Manual updating
Initially, the first geocoding run returned a "500 error" message. The solution to this error
was found on the class discussion board (7; Zong Peng 10/3/15): the word "break" was
replaced with "continue" on line 133 of the Java source file MongoDBOperations.java, and
the project code was recompiled and deployed using the command ant.
To track the progress of the geocoding, the following command was used in a new Linux
shell (7; Micheal Haley 10/3/15):
$ tail -f ./Projects/I590-TwitterProjectCode/log/twitterdblog.txt
Following the final run on the 4th day, the following message was returned:
In this run:
Total:1268 record(s) found.
Total:0 record(s) processed.
Total:0 record(s) updated.
To process the remaining records, "break" was restored on line 133 of the Java source file
MongoDBOperations.java. Subsequently, the following two documents were manually
updated to remove the "user_location" string as shown (8, 9):
{"user_id" : 117246212, …, "user_location" : "The DMV (and no, not that DMV)"}
{"user_id" : 122836991, …, "user_location" : "Inyomailboxbiatch...huh
3 Results and Discussion
3.1 Querying MongoDB
To determine the success of the geocoding, the following queries were used within the mongo
shell (7; Lawson/Eicher 10/2/2015, 10-12):
(Queries A through D appeared as screenshots in the original report and are not reproduced
here.)
Thus, all of the 10,000 Twitter user profiles in the dataset were processed in our approach (A).
A total of 6,346 profiles were successfully geocoded (B), while 3,654 did not contain a valid
location recognized by the Google Geocoding API (C). Of these 3,654, a total of 1,392 tweets
did not have any user-defined value in the "user_location" field. Presumably the remaining
2,262 tweets contained nonsensical values in the "user_location" field.
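The original queries were shown only as screenshots, but the counting logic they performed likely resembled the mongo-shell commands in the comments below; here the same logic is applied to in-memory sample documents as a best-guess reconstruction:

```python
# Likely mongo-shell equivalents (a reconstruction, not the originals):
#   db.users.count()                           (A) all profiles
#   db.users.count({geocode: {$ne: null}})     (B) successfully geocoded
#   db.users.count({geocode: null})            (C) no valid location
#   db.users.count({user_location: ""})        (D) empty user_location
docs = [
    {"user_location": "Indianapolis", "geocode": {"formatted_address": "Indianapolis, IN, USA"}},
    {"user_location": "somewhere over the rainbow", "geocode": None},
    {"user_location": "", "geocode": None},
]
total = len(docs)                                             # (A)
geocoded = sum(1 for d in docs if d["geocode"] is not None)   # (B)
not_geocoded = sum(1 for d in docs if d["geocode"] is None)   # (C)
no_location = sum(1 for d in docs if not d["user_location"])  # (D)
```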
3.2 Strategies to improve geocoding and query performance
A portion of the remaining 2,262 tweets contained GPS coordinates from either an iPhone (33
tweets) or the ÜberSocial Twitter client (505 tweets) in the "user_location" field (12, 13).
The exact number of these tweets was determined with a regex query, shown as a screenshot in
the original report (11, 14).
It would be possible to manually curate these profiles so that they would be recognized by the
Google Geocoding API; alternatively, an algorithm could be written to perform this function
(15).
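Such an algorithm could, for instance, extract the raw coordinates directly from the "ÜT:"-style location strings and skip geocoding for those profiles entirely. An illustrative sketch (not part of the project code, and assuming the "lat,lon" form these clients are reported to write):

```python
import re

# ÜberSocial writes locations like "ÜT: 19.137603,-72.813111";
# a signed decimal latitude/longitude pair is the common pattern.
COORD_RE = re.compile(r"(-?\d+\.\d+)\s*,\s*(-?\d+\.\d+)")

def parse_client_coords(user_location: str):
    """Return (lat, lon) if the location string embeds raw GPS
    coordinates, otherwise None."""
    m = COORD_RE.search(user_location)
    return (float(m.group(1)), float(m.group(2))) if m else None
```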
There are several ways to improve the performance of the query. The most obvious is to
complete the Google authentication process to increase the number of queries allowed per day,
reducing the overall run time (6). Another way to increase query efficiency is to create an
index. A single-field text index on "user_location" allows queries to skip user profiles with
missing values in this field, avoiding a full collection scan (16-18). Documents with missing
values in "user_location" represent nearly 14% (1,392/10,000) of the users collection, as
discussed above.
The command to create this index is:
> db.users.createIndex({"user_location":"text"})
Here the index type text indicates a text index over the string values of the field (16). To
search the text index of the users collection, the $text and $search operators are used as
follows (19). This command, shown as a screenshot in the original report, confirmed that no
profiles lacking a text string value in "user_location" (denoted by "") appear in the index
(16-18).
3.3 Visualization
To visualize a subset of the reformatted geocoded profiles that reportedly originated in Indiana,
the following command was used (20-22):

$ mongoexport -d twitterdb -c users \
  -q '{"geocode": {"$exists": true, "$ne": null}, "$and": [
      {"geocode.formatted_address": {"$regex": "USA"}},
      {"geocode.formatted_address": {"$regex": "IN"}}]}' \
  --csv --fields geocode.formatted_address,user_name -o twitterout
The next step is to reformat the output file into the visualization format using the command (20):

$ awk '{ printf("[ %s ],\n", $1); }' twitterout
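The same reformatting can be sketched in Python. This variant keeps the whole exported line rather than awk's first whitespace-delimited field, since the formatted addresses themselves contain spaces (an illustrative equivalent, not part of the project bundle):

```python
def to_chart_rows(lines):
    """Wrap each exported CSV line in the bracketed row format used by
    the Google Charts map example page (mirrors the awk step)."""
    return ["[ %s ]," % line.strip() for line in lines if line.strip()]

rows = to_chart_rows(['"Indianapolis, IN, USA","jdoe"', ""])
```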
To create a visualization HTML file of the Indiana tweets (68 total), the screen output list was
inserted into the sample HTML code on the webpage shown in reference (23). The Google Maps
API was used to visualize the data (3); the resulting map appeared as a screenshot in the
original report.
A copy of the validated data was dumped from the database for submission using the
mongodump tool (24):
$ mongodump -d twitterdb -c users
4 References
1) http://datamanagementcourse.soic.indiana.edu/
2) Rui Li, Shengjie Wang, Hongbo Deng, Rui Wang, Kevin Chen-Chuan Chang: Towards
social user profiling: unified and discriminative influence model for inferring home
locations. KDD 2012:1023-1031
3) https://developers.google.com/maps/documentation/geocoding/intro
4) https://developers.google.com/maps/
5) https://www.virtualbox.org/wiki/Downloads
6) https://developers.google.com/api-client-library/javascript/features/authentication
7) https://iu.instructure.com/courses/1491590/discussion_topics/6311828
8) http://docs.mongodb.org/manual/tutorial/modify-documents/
9) http://docs.mongodb.org/manual/faq/mongo/
10) http://jacobnibu.info/articles/Modeling%20Twitter%20Dataset.pdf
11) http://docs.mongodb.org/manual/tutorial/query-documents/
12) https://www.quora.com/What-is-%C3%9CT-19-137603-72-813111-in-Twitter
13) http://ubersocial.com/
14) http://docs.mongodb.org/manual/reference/operator/query/regex/
15) http://journals.uic.edu/ojs/index.php/fm/article/view/4366/3654
16) http://docs.mongodb.org/manual/core/index-text/
17) https://docs.mongodb.org/manual/core/crud-introduction/
18) http://docs.mongodb.org/manual/reference/operator/query/text/#op._S_text
19) http://docs.mongodb.org/manual/reference/operator/query/text/
20) README.txt file
21) https://docs.mongodb.org/manual/reference/program/mongoimport/
22) http://stackoverflow.com/questions/31514688/how-to-use-mongoimport-for-specific-fileds-from-tsv-file/31528255#31528255
23) https://developers.google.com/chart/interactive/docs/gallery/map#fullhtml
24) http://docs.mongodb.org/manual/reference/program/mongodump/