The document describes an analysis of Citi Bike trip data from New York City using Spark and Spark SQL. It loads the data from CSV files as a DataFrame and answers business questions about popular routes, longest trips, peak ride times, farthest distances, most popular stations, and busiest days. It also builds a logistic regression model to predict riders' gender based on trip characteristics, achieving an accuracy of [accuracy percentage].
Like the sale of goods, sometimes the share of ownership in a company must be discounted due to difficulty in finding a buyer. Liquidation costs of equity in private businesses may be substantial, and the equity’s value is discounted for that potential illiquidity. Likewise, partial ownership of a private firm may be worth less than proportional share of the total business. This webinar delves into these types of discounts and how they may impact the valuation of your asset.
Part of the webinar series: Valuation 2021
Our integration expert, Siva Subrahmanyam Chavali, explains why a majority of IoT projects are biting the dust and how you can enable a foolproof IoT ecosystem for your business.
For businesses that want to stay relevant in a Digitized Market, it's imperative to consider Digital Transformation. At Cygnet we deliver 100% Agile solutions in line with your business goals
Extracting value from Big Data is not easy. The field of technologies and vendors is fragmented and rapidly evolving. End-to-end, general purpose solutions that work out of the box don’t exist yet, and Hadoop is no exception. And most companies lack Big Data specialists. The key to unlocking real value lies with thinking smart and hard about the business requirements for a Big Data solution. There is a long list of crucial questions to think about. Is Hadoop really the best solution for all Big Data needs? Should companies run a Hadoop cluster on expensive enterprise-grade storage, or use cheap commodity servers? Should the chosen infrastructure be bare metal or virtualized? The picture becomes even more confusing at the analysis and visualization layer. The answer to Big Data ROI lies somewhere between the herd and nerd mentality. Thinking hard and being smart about each use case as early as possible avoids costly mistakes in choosing hardware and software. This talk will illustrate how Deutsche Telekom follows this segmentation approach to make sure every individual use case drives architecture design and the selection of technologies and vendors.
Like the sale of goods, sometimes the share of ownership in a company must be discounted due to difficulty in finding a buyer. Liquidation costs of equity in private businesses may be substantial, and the equity’s value is discounted for that potential illiquidity. Likewise, partial ownership of a private firm may be worth less than proportional share of the total business. This webinar delves into these types of discounts and how they may impact the valuation of your asset.
Part of the webinar series: Valuation 2021
Our integration expert, Siva Subrahmanyam Chavali, explains why a majority of IoT projects are biting the dust and how you can enable a foolproof IoT ecosystem for your business.
For businesses that want to stay relevant in a Digitized Market, it's imperative to consider Digital Transformation. At Cygnet we deliver 100% Agile solutions in line with your business goals
Extracting value from Big Data is not easy. The field of technologies and vendors is fragmented and rapidly evolving. End-to-end, general purpose solutions that work out of the box don’t exist yet, and Hadoop is no exception. And most companies lack Big Data specialists. The key to unlocking real value lies with thinking smart and hard about the business requirements for a Big Data solution. There is a long list of crucial questions to think about. Is Hadoop really the best solution for all Big Data needs? Should companies run a Hadoop cluster on expensive enterprise-grade storage, or use cheap commodity servers? Should the chosen infrastructure be bare metal or virtualized? The picture becomes even more confusing at the analysis and visualization layer. The answer to Big Data ROI lies somewhere between the herd and nerd mentality. Thinking hard and being smart about each use case as early as possible avoids costly mistakes in choosing hardware and software. This talk will illustrate how Deutsche Telekom follows this segmentation approach to make sure every individual use case drives architecture design and the selection of technologies and vendors.
A presentation explaining the concepts of public key infrastructure. It covers topics like Public Key Infrastructure (PKI) introduction, Digital Certificate, Trust Services, Digital Signature Certificate, TLS Certificate, Code Signing Certificate, Time Stamping, Email Encryption Certificate
Blockchain is distributed ledger that record the transactions between two parties known as blocks and are connected to each other using cryptography. Once recorded on blockchain, these transactions can't be altered or tampered with.
Startup Stage - Logistics - Presentation by David Nothacker, Co-Founder of Sennder at the NOAH Conference Berlin 2017, Tempodrom on the 23rd of June 2017.
Data as a Service (DaaS): The What, Why, How, Who, and WhenRocketSource
Data as a Service (DaaS) is one of the most ambiguous offerings in the "as a service" family. Yet, in today's world, data and analytics are key to building a competitive advantage. We're clearing up the confusion around DaaS and helping your company understand when and how to tap into this service.
Credit card plays a very vital role in todays economy and the usage of credit cards has dramatically increased. Credit card has become one of the most common method of payment for both online and offline as well as for regular purchases of a common man. It is very necessary to distinguish fraudulent credit card transactions by the credit card organizations so their clients are not charged for the purchases that they didn’t make. Despite the fact that using credit card gives huge benefits when used responsibly carefully and however significant credit and financial damages could be caused by fraudulent activities as well. Numerous methods have been proposed to stop these fraudulent activities. The project illustrates the model of a dataset to predict fraud transactions using machine learning. The model then detects if it is a fraudulent or a genuine transaction. The model also analyses and pre processes the dataset along with deployment of multiple anomaly detection using algorithms such as Local forest outlier and Isolation forest. Nikitha Pradeep | Dr. A Rengarajan "Credit Card Fraud Detection" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-4 , June 2021, URL: https://www.ijtsrd.compapers/ijtsrd41289.pdf Paper URL: https://www.ijtsrd.comcomputer-science/data-processing/41289/credit-card-fraud-detection/nikitha-pradeep
Real time gross settlement systems (RTGS) are specialist funds transfer systems where transfer of money or securities takes place from one bank to another on a "real time" and on "gross" basis. It can be defined as the continuous (real-time) settlement of funds transfers individually on an order by order basis (without netting).
Cloud Computing In Banking And Finance IndustryTyrone Systems
Cloud computing allows organizations to get up and running on an outsourced IT infrastructure without the time or cost investment. It also allows financial firms to start modernizing their technology with minimal investments.
SITEC eCommerce Class 4
Title: Online Payment
Presenter: Chan Kok Long, Executive Director (iPay88)
Date: 11 August 2015
Venue: The Canvas @ Damansara Perdana
Peter Afanasiev - Architecture of online PaymentsCiklum Ukraine
Payment Service Providers
Merchant Payment Systems
General architecture of a Payment System
Know-hows:
Payment queues with MSSQL Broker
Adapter Polymorphism
Tracing in Service Oriented World
Dynamic configuration editor with ASP.Net MVC
Presenting this set of slides with name - Enterprise Blockchain Powerpoint Presentation Slides. Keep your audience glued to their seats with professionally designed PPT slides. This deck comprises of total of eighteen slides. It has PPT templates with creative visuals and well researched content. Not just this, our PowerPoint professionals have crafted this deck with appropriate diagrams, layouts, icons, graphs, charts and more. This content ready presentation deck is fully editable. Just click the DOWNLOAD button below. Change the colour, text and font size. You can also modify the content as per your need. Get access to this well crafted complete deck presentation and leave your audience stunned.
A hardware and software platform, which turns a smartphone into a powerful payment, loyalty and identification tool:
- All-in-one,
- Simple authentication & authorization,
- P2P transfers,
- Pay by QR code,
- Pay by NFC,
- Pay by cards linked to an account,
- Mobile acquiring,
- Invoices,
- Loans,
- E-policies,
- Consolidation of loyalty programs,
- Discounts and promotions,
- Ticketing.
White Label - under Your Brand in 2-3 months!
The digital transformation of CPG and manufacturingCloudera, Inc.
Both CPG (Consumer Packaged Goods) and manufacturing industries face similar challenges: in order to differentiate and innovate, they need to draw insight from the one thing they both have in abundance - data. The source of the data may be different; the opportunity is always innovation and differentiation. For CPG, forever changing buyer expectations must flow into product development and sales. For both CPG and manufacturing, Industry 4.0 promises improved efficiency, lower costs, and higher revenues. Becoming data-driven is the key for both industries and requires clever combination of machine learning, analytics and cloud for success.
In this webinar, business strategist Frank Vullers will discuss how Cloudera's platform is central to this century’s industrial revolution: the digital transformation of the CPG and manufacturing industries.
Mobility insights at Swisscom - Understanding collective mobility in SwitzerlandFrançois Garillot
Swisscom is the leading mobile-service provider in Switzerland, with a market share high enough to enable us to model and understand the collective mobility in every area of the country. To accomplish that, we built an urban planning tool that helps cities better manage their infrastructure based on data-based insights, produced with Apache Spark, YARN, Kafka and a good dose of machine learning. In this talk, we will explain how building such a tool involves mining a massive amount of raw data (1.5E9 records/day) to extract fine-grained mobility features from raw network traces. These features are obtained using different machine learning algorithms. For example, we built an algorithm that segments a trajectory into mobile and static periods and trained classifiers that enable us to distinguish between different means of transport. As we sketch the different algorithmic components, we will present our approach to continuously run and test them, which involves complex pipelines managed with Oozie and fuelled with ground truth data. Finally, we will delve into the streaming part of our analytics and see how network events allow Swisscom to understand the characteristics of the flow of people on roads and paths of interest. This requires making a link between network coverage information and geographical positioning in the space of milliseconds and using Spark streaming with libraries that were originally designed for batch processing. We will conclude on the advantages and pitfalls of Spark involved in running this kind of pipeline on a multi-tenant cluster. Audiences should come back from this talk with an overall picture of the use of Apache Spark and related components of its ecosystem in the field of trajectory mining.
A presentation explaining the concepts of public key infrastructure. It covers topics like Public Key Infrastructure (PKI) introduction, Digital Certificate, Trust Services, Digital Signature Certificate, TLS Certificate, Code Signing Certificate, Time Stamping, Email Encryption Certificate
Blockchain is distributed ledger that record the transactions between two parties known as blocks and are connected to each other using cryptography. Once recorded on blockchain, these transactions can't be altered or tampered with.
Startup Stage - Logistics - Presentation by David Nothacker, Co-Founder of Sennder at the NOAH Conference Berlin 2017, Tempodrom on the 23rd of June 2017.
Data as a Service (DaaS): The What, Why, How, Who, and WhenRocketSource
Data as a Service (DaaS) is one of the most ambiguous offerings in the "as a service" family. Yet, in today's world, data and analytics are key to building a competitive advantage. We're clearing up the confusion around DaaS and helping your company understand when and how to tap into this service.
Credit card plays a very vital role in todays economy and the usage of credit cards has dramatically increased. Credit card has become one of the most common method of payment for both online and offline as well as for regular purchases of a common man. It is very necessary to distinguish fraudulent credit card transactions by the credit card organizations so their clients are not charged for the purchases that they didn’t make. Despite the fact that using credit card gives huge benefits when used responsibly carefully and however significant credit and financial damages could be caused by fraudulent activities as well. Numerous methods have been proposed to stop these fraudulent activities. The project illustrates the model of a dataset to predict fraud transactions using machine learning. The model then detects if it is a fraudulent or a genuine transaction. The model also analyses and pre processes the dataset along with deployment of multiple anomaly detection using algorithms such as Local forest outlier and Isolation forest. Nikitha Pradeep | Dr. A Rengarajan "Credit Card Fraud Detection" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-4 , June 2021, URL: https://www.ijtsrd.compapers/ijtsrd41289.pdf Paper URL: https://www.ijtsrd.comcomputer-science/data-processing/41289/credit-card-fraud-detection/nikitha-pradeep
Real time gross settlement systems (RTGS) are specialist funds transfer systems where transfer of money or securities takes place from one bank to another on a "real time" and on "gross" basis. It can be defined as the continuous (real-time) settlement of funds transfers individually on an order by order basis (without netting).
Cloud Computing In Banking And Finance IndustryTyrone Systems
Cloud computing allows organizations to get up and running on an outsourced IT infrastructure without the time or cost investment. It also allows financial firms to start modernizing their technology with minimal investments.
SITEC eCommerce Class 4
Title: Online Payment
Presenter: Chan Kok Long, Executive Director (iPay88)
Date: 11 August 2015
Venue: The Canvas @ Damansara Perdana
Peter Afanasiev - Architecture of online PaymentsCiklum Ukraine
Payment Service Providers
Merchant Payment Systems
General architecture of a Payment System
Know-hows:
Payment queues with MSSQL Broker
Adapter Polymorphism
Tracing in Service Oriented World
Dynamic configuration editor with ASP.Net MVC
Presenting this set of slides with name - Enterprise Blockchain Powerpoint Presentation Slides. Keep your audience glued to their seats with professionally designed PPT slides. This deck comprises of total of eighteen slides. It has PPT templates with creative visuals and well researched content. Not just this, our PowerPoint professionals have crafted this deck with appropriate diagrams, layouts, icons, graphs, charts and more. This content ready presentation deck is fully editable. Just click the DOWNLOAD button below. Change the colour, text and font size. You can also modify the content as per your need. Get access to this well crafted complete deck presentation and leave your audience stunned.
A hardware and software platform, which turns a smartphone into a powerful payment, loyalty and identification tool:
- All-in-one,
- Simple authentication & authorization,
- P2P transfers,
- Pay by QR code,
- Pay by NFC,
- Pay by cards linked to an account,
- Mobile acquiring,
- Invoices,
- Loans,
- E-policies,
- Consolidation of loyalty programs,
- Discounts and promotions,
- Ticketing.
White Label - under Your Brand in 2-3 months!
The digital transformation of CPG and manufacturingCloudera, Inc.
Both CPG (Consumer Packaged Goods) and manufacturing industries face similar challenges: in order to differentiate and innovate, they need to draw insight from the one thing they both have in abundance - data. The source of the data may be different; the opportunity is always innovation and differentiation. For CPG, forever changing buyer expectations must flow into product development and sales. For both CPG and manufacturing, Industry 4.0 promises improved efficiency, lower costs, and higher revenues. Becoming data-driven is the key for both industries and requires clever combination of machine learning, analytics and cloud for success.
In this webinar, business strategist Frank Vullers will discuss how Cloudera's platform is central to this century’s industrial revolution: the digital transformation of the CPG and manufacturing industries.
Mobility insights at Swisscom - Understanding collective mobility in SwitzerlandFrançois Garillot
Swisscom is the leading mobile-service provider in Switzerland, with a market share high enough to enable us to model and understand the collective mobility in every area of the country. To accomplish that, we built an urban planning tool that helps cities better manage their infrastructure based on data-based insights, produced with Apache Spark, YARN, Kafka and a good dose of machine learning. In this talk, we will explain how building such a tool involves mining a massive amount of raw data (1.5E9 records/day) to extract fine-grained mobility features from raw network traces. These features are obtained using different machine learning algorithms. For example, we built an algorithm that segments a trajectory into mobile and static periods and trained classifiers that enable us to distinguish between different means of transport. As we sketch the different algorithmic components, we will present our approach to continuously run and test them, which involves complex pipelines managed with Oozie and fuelled with ground truth data. Finally, we will delve into the streaming part of our analytics and see how network events allow Swisscom to understand the characteristics of the flow of people on roads and paths of interest. This requires making a link between network coverage information and geographical positioning in the space of milliseconds and using Spark streaming with libraries that were originally designed for batch processing. We will conclude on the advantages and pitfalls of Spark involved in running this kind of pipeline on a multi-tenant cluster. Audiences should come back from this talk with an overall picture of the use of Apache Spark and related components of its ecosystem in the field of trajectory mining.
Deep Dive Into Swift - Presented at Coffee@DBG
In the previous month, Coffe@DBG Covered the basics of Swift, Apple's new programming language. This session introduces some of handpicked interesting features of Swift.
Vehicle routing and scheduling Models:
Travelling salesman problem
vehicle routing problem with time window
Pick up and delivery problem with time window
Lunchlezing landelijke keuzemodellen voor OctaviusLuuk Brederode
lunch lecture at Goudappel company on results of the estimation of (dutch) nation wide discrete choice models on OViN data on mode and destination choices for the demand model Octavius
James Burkhart explains how Uber supports millions of analytical queries daily across real-time data with Apollo. James covers the architectural decisions and lessons learned building an exactly-once ingest pipeline storing raw events across in-memory row storage and on-disk columnar storage and a custom metalanguage and query layer leveraging partial OLAP result set caching and query canonicalization. Putting all the pieces together provides thousands of Uber employees with subsecond p95 latency analytical queries spanning hundreds of millions of recent events.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Welocme to ViralQR, your best QR code generator.ViralQR
Welcome to ViralQR, your best QR code generator available on the market!
At ViralQR, we design static and dynamic QR codes. Our mission is to make business operations easier and customer engagement more powerful through the use of QR technology. Be it a small-scale business or a huge enterprise, our easy-to-use platform provides multiple choices that can be tailored according to your company's branding and marketing strategies.
Our Vision
We are here to make the process of creating QR codes easy and smooth, thus enhancing customer interaction and making business more fluid. We very strongly believe in the ability of QR codes to change the world for businesses in their interaction with customers and are set on making that technology accessible and usable far and wide.
Our Achievements
Ever since its inception, we have successfully served many clients by offering QR codes in their marketing, service delivery, and collection of feedback across various industries. Our platform has been recognized for its ease of use and amazing features, which helped a business to make QR codes.
Our Services
At ViralQR, here is a comprehensive suite of services that caters to your very needs:
Static QR Codes: Create free static QR codes. These QR codes are able to store significant information such as URLs, vCards, plain text, emails and SMS, Wi-Fi credentials, and Bitcoin addresses.
Dynamic QR codes: These also have all the advanced features but are subscription-based. They can directly link to PDF files, images, micro-landing pages, social accounts, review forms, business pages, and applications. In addition, they can be branded with CTAs, frames, patterns, colors, and logos to enhance your branding.
Pricing and Packages
Additionally, there is a 14-day free offer to ViralQR, which is an exceptional opportunity for new users to take a feel of this platform. One can easily subscribe from there and experience the full dynamic of using QR codes. The subscription plans are not only meant for business; they are priced very flexibly so that literally every business could afford to benefit from our service.
Why choose us?
ViralQR will provide services for marketing, advertising, catering, retail, and the like. The QR codes can be posted on fliers, packaging, merchandise, and banners, as well as to substitute for cash and cards in a restaurant or coffee shop. With QR codes integrated into your business, improve customer engagement and streamline operations.
Comprehensive Analytics
Subscribers of ViralQR receive detailed analytics and tracking tools in light of having a view of the core values of QR code performance. Our analytics dashboard shows aggregate views and unique views, as well as detailed information about each impression, including time, device, browser, and estimated location by city and country.
So, thank you for choosing ViralQR; we have an offer of nothing but the best in terms of QR code services to meet business diversity!
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
2. Project Information
Domain: Travel
Technology use: Spark, Spark SQL, Spark MLlib
Dataset: https://s3.amazonaws.com/tripdata/index.html
Citi Bike is the nation's largest bike share program, with 10,000 bikes and 600
stations across Manhattan, Brooklyn, Queens and Jersey City. Analyze this data to
find the interesting patterns in the traffic.
3. Data Description
Trip Duration (seconds)
Start Time and Date
Stop Time and Date
Start Station Name
End Station Name
Station ID
Station Lat/Long
Bike ID
User Type (Customer = 24-hour pass or 3-day pass user; Subscriber = Annual
Member)
Gender (Zero=unknown; 1=male; 2=female)
Year of Birth
4. Business Questions
Spark Core / Spark SQL
Which route Citi Bikers ride the most?
Find the biggest trip and its duration?
When do they ride?
How far do they go?
Which stations are most popular?
Which days of the week are most rides taken on?
Spark MLLIB
Predicting the gender based on patterns in the trip
5. Data Loading
Used com.databricks.spark library to load CSV data.
val bike = sqlContext.read.format("com.databricks.spark.csv").option("header",
"true").option("inferSchema", "true").option("delimiter",
",").load("/user/cloudera/Project/CitiBikeNYC/* - Citi Bike trip data.csv")
Parsed to rename the columns and register the Dataframe as a Temporary Table.
val bikeDF = bike.toDF("trip_duration", "start_time", "stop_time", "start_station_id",
"start_station_name", "start_station_latitude", "start_station_longitude",
"end_station_id", "end_station_name", "end_station_latitude",
"end_station_longitude", "bike_id", "user_type", "birth_year", "gender")
bikeDF.registerTempTable("bike")
6. Business Questions Answered
Q1. Which route Citi Bikers ride the most?
To determine the route Citi Bikers ride the most.
Grouped the Start and End station combination to find the aggregate count and fetching
the route with highest rides taken on.
sqlContext.sql("SELECT start_station_id, start_station_name, end_station_id,
end_station_name, COUNT(1) AS cnt FROM bike GROUP BY start_station_id,
start_station_name, end_station_id, end_station_name ORDER BY cnt DESC LIMIT 1")
Q2. Find the biggest trip and its duration?
To Determine the biggest trip, analyzed trip duration and distance of each trip.
Distance of each trip calculated based on distance between station location (Latitude and
Longitude).
Defined an UDF function calcDist to calculate the distance between locations.
7. Business Questions Answered
def calcDist(lat1: Double, lon1: Double,lat2: Double, lon2: Double): Int = {
val AVERAGE_RADIUS_OF_EARTH_KM = 6371
val latDistance = Math.toRadians(lat1 - lat2)
val lngDistance = Math.toRadians(lon1 - lon2)
val sinLat = Math.sin(latDistance / 2)
val sinLng = Math.sin(lngDistance / 2)
val a = sinLat * sinLat + (Math.cos(Math.toRadians(lat1)) *
Math.cos(Math.toRadians(lat2)) * sinLng * sinLng)
val c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a))
(AVERAGE_RADIUS_OF_EARTH_KM * c).toInt }
sqlContext.udf.register("calcDist", calcDist _)
Ordered by the trip duration and distance to find the biggest trip.
sqlContext.sql("SELECT start_station_id, start_station_name, end_station_id,
end_station_name, trip_duration, calcDist(start_station_latitude, start_station_longitude,
end_station_latitude, end_station_longitude) AS dist FROM bike ORDER BY trip_duration
DESC, dist DESC LIMIT 1")
8. Business Questions Answered
Q3. When do they ride?
To determine the top 10 most preferred time for rides.
Grouped the Start time based on Hours and Minutes to aggregate the number of rides
taken at respective time and fetching the top 10 most preferred Start time by riders.
sqlContext.sql("SELECT date_format(start_time,'H:m'), COUNT(1) as cnt FROM bike
GROUP BY date_format(start_time,'H:m') ORDER BY cnt DESC LIMIT 10")
Q4. How far do they go?
To determine the farthest ride ever taken.
Calculated the distance between Start and End stations based on the UDF function
calcDist created and fetched the ride with longest distance.
sqlContext.sql("SELECT start_station_id, start_station_name, end_station_id,
end_station_name, trip_duration, calcDist(start_station_latitude, start_station_longitude,
end_station_latitude, end_station_longitude) AS dist FROM bike ORDER BY dist DESC
LIMIT 1")
9. Business Questions Answered
Q5. Which stations are most popular?
To determine the top 10 most popular stations among the riders.
To get the count aggregations of each station, used union to get the overall station list
irrespective of Start or End station and fetch the top 10 stations.
var popularStationsDF = sqlContext.sql("SELECT start_station_id as station_id,
start_station_name as station_name, COUNT(1) as cnt FROM bike GROUP BY start_station_id,
start_station_name UNION ALL SELECT end_station_id as station_id, end_station_name as
station_name, COUNT(1) as cnt FROM bike GROUP BY end_station_id, end_station_name")
popularStationsDF.registerTempTable("popularStations")
sqlContext.sql("SELECT station_id as station_id,station_name, SUM(cnt) as total FROM
popularStations GROUP BY station_id,station_name ORDER BY total DESC LIMIT 10")
Q6. Which days of the week are most rides taken on?
To determine the day of the week when most of the rides are taken.
Grouping each day of week to aggregate the total rides and fetch the day with highest rides
taken on.
sqlContext.sql("SELECT date_format(start_time,'E') AS Day, COUNT(1) AS cnt FROM bike
GROUP BY date_format(start_time,'E') ORDER BY cnt DESC LIMIT 1")
11. Q7. Predicting the gender based on
patterns in the trip
Building a model using Logistic Regression to predict the gender based on patterns in the trip.
Converting the bikeDF Dataframe to RDD so as to apply the data to model.
val bikeRDD = bikeDF.rdd
To define the Label and Features, extracted the features based on ride patterns.
val features = bikeRDD.map {
bike => val gender = if (bike.getInt(14) == 1) 0.0 else if (bike.getInt(14) == 2) 1.0 else 2.0
val trip_duration = bike.getInt(0).toDouble
val start_time = bike.getTimestamp(1).getTime.toDouble
val stop_time = bike.getTimestamp(2).getTime.toDouble
val start_station_id = bike.getInt(3).toDouble
val start_station_latitude = bike.getDouble(5)
val start_station_longitude = bike.getDouble(6)
val end_station_id = bike.getInt(7).toDouble
val end_station_latitude = bike.getDouble(9)
val end_station_longitude = bike.getDouble(10)
val user_type = if (bike.getString(12) == "Customer") 1.0 else 2.0
Array(gender, trip_duration, start_time, stop_time, start_station_id, start_station_latitude, start_station_longitude,
end_station_id, end_station_latitude, end_station_longitude, user_type) }
12. Q7. Predicting the gender based on
patterns in the trip
Gender is used as label and other relevant columns are used as features to form
the Labeled Vector.
val labeled = features.map { x => LabeledPoint(x(0), Vectors.dense(x(1), x(2), x(3), x(4),
x(5), x(6), x(7), x(8), x(9), x(10)))}
Splitting the data for Tests and Training respectively.
val training = labeled.filter(_.label != 2).randomSplit(Array(0.60, 0.40))(1) val test
= labeled.filter(_.label != 2).randomSplit(Array(0.40, 0.60))(1)
Running the training algorithm to build the predicting model. Cached training set
is passed to the training algorithm.
Number of classes is set to 2.
val model = new LogisticRegressionWithLBFGS().setNumClasses(2).run(training)
13. Q7. Predicting the gender based on
patterns in the trip
Computing the predictions on the Testing data.
val predictionAndLabels = test.map {
case LabeledPoint(label, features) =>
val prediction = model.predict(features)
(prediction, label) }
To determine the accuracy of the built model.
Fetching the count of Wrong predictions found based on label and prediction
values.
With the help of total test count and wrong counts accuracy is determined.
val wrong = predictionAndLabels.filter {
case (label, prediction) => label != prediction }
val wrong_count = wrong.count
val accuracy = 1 - (wrong_count.toDouble / test_count)
15. Conclusion
Citi Bike NYC data loaded and analysed successfully.
CSV data loaded as Dataframe to answer the business questions using Spark SQL.
Built an Mllib Logistic Regression model to predict the gender of each rider based
on patterns in the trip.
Code: https://github.com/ssushmanth/CitiBikeNYC-spark