Report

Teaser, summary, work performed and final results

Periodic Reporting for period 1 - e-DNA BotStop (e-DNA BotStop)

Summary

In the last decade there has been an explosion in Online Travel Agents (OTAs) worldwide. OTAs undertake the mammoth task of undercutting the flight prices of major airlines through the use of bots. (An internet bot, also known as a web robot, WWW robot or simply bot, is a software application that runs automated tasks, or scripts, over the Internet.) Bots are used to scrape airline websites for valuable data to benchmark aggregate flight costs, which drives down prices for the consumer.

Whilst beneficial to consumers, scraping harms travel companies because:
• Bots load a website's server hardware and cause website traffic to run slower, in some cases causing server downtime and Distributed Denial of Service (DDoS) conditions.
• Scraping causes long-term Search Engine Optimization (SEO) damage and distorts analytical marketing metrics.
• Scraping diverts customers to purchase products via third-party resellers, limiting opportunities for up-selling and cross-selling.

This problem is tackled by anti-scrape approaches. However, current anti-scrape/booking bot solutions are only capable of distinguishing between human traffic and bot traffic through supervised algorithms that do not work to the degree of efficacy required.

Our overall objective is to commercialise BotStop, an algorithmic approach to identifying bots and scrapers and to policing malicious application traffic.

eDNA will provide a solution which reintroduces transparency into the process of purchasing flights and will streamline the customer website experience to make it more stress-free. The solution will allow airlines to fill more flights. This will allow airlines to meet commercial goals without the need to schedule extra flights, thus reducing carbon emissions.

Work performed

Through the activities undertaken in this period, we have developed an in-depth understanding of the criteria that carriers are likely to demand.

For traditional/legacy carriers, bot attacks are less frequent than on low-cost carrier sites because inventory is available via the world's Global Distribution Systems (GDS). The role of eDNA BotStop in the traditional/legacy carrier scenario would be to use its inherent tracking features to trace bookings back to the inventory source, revealing relationships and tie-ups along the way. There is currently no automated solution in this area, which is a gap that eDNA can fill.

Low-cost carriers sell primarily to the public via their websites, with limited participation in GDS. Here, eDNA BotStop and scrape mitigation tools are badly needed to prevent increased IT costs and loss of control of the product as middlemen mark up fares, bag and seat prices. The combination of standard eDNA BotStop and scrape mitigation with primary tracing to source inventory means this segment would be the most receptive to eDNA solutions.

Dialogue with numerous airlines has taken place to understand their infrastructure and to provide a plug-and-play solution that can be implemented with ease, without the need for specialist development. eDNA has engaged numerous airlines and technical leads as part of this research to develop an architecture that is versatile and can accommodate numerous technology stacks. Through this study we recognised that opportunities exist with both traditional/legacy carriers and low-cost carriers (LCCs). However, the biggest opportunity is with low-cost carriers such as Ryanair, EasyJet and Jet2, because of the increased IT costs and loss of control of the product they face as middlemen mark up fares, bag and seat prices. Our solution will give this sector the biggest wins.

In planning the further development programme for BotStop, we gained a clear understanding of the work required to bring BotStop to TRL 9. We have built a work plan that is manageable, affordable and realistic to achieve.

Two patents have been identified; both were filed in the US and neither has yet been granted. These patents use very different approaches and do not cause us problems with freedom to operate. We are not infringing third-party IP with our innovation.

eDNA will therefore proceed with its plans to reach the market as soon as possible, in order to capitalise on the timing of key market drivers, establish BotStop as the technology of choice for airlines, and bring our business plan to fruition.

Final results

The current crop of anti-scrape/booking bot solutions is only capable of distinguishing human traffic from bot traffic through supervised algorithms that do not work to the degree of efficacy required.

eDNA uses a Hidden Markov Model represented as a Bayesian network to discover detectable patterns based on query intervals. The algorithm in place (see fig. 3) works as follows: if multiple request patterns are detected, access under this model is treated as batch arrivals. Both ‘Qt’ and ‘qt’ are calculated from historic information flows, where ‘qt’ is the number of requests within a given time unit (30 seconds) and ‘Qt’ provides the foundation of the intervals. Because query intervals are likely to differ between humans and bots, patterns will emerge, and bursts are reflected in ‘qt’ and ‘Qt’. With many applications being horizontal in nature, bot queries will obey the same guidelines as human behaviour.
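
As an illustration of the quantities involved, the following minimal Python sketch counts requests per 30-second window (one reading of ‘qt’) and computes the gaps between consecutive requests. It is not eDNA's implementation: the function names are hypothetical, the report does not define ‘Qt’ precisely, and the Hidden Markov Model itself is not shown. Only the 30-second time unit is taken from the text.

```python
from collections import Counter
from typing import Dict, List

WINDOW_SECONDS = 30  # time unit stated in the report


def requests_per_window(timestamps: List[float],
                        window: int = WINDOW_SECONDS) -> Dict[int, int]:
    """Count requests in each fixed window (hypothetical helper).

    The window index is floor(t / window), so index 0 covers [0, 30),
    index 1 covers [30, 60), and so on.
    """
    return dict(Counter(int(t // window) for t in timestamps))


def query_intervals(timestamps: List[float]) -> List[float]:
    """Gaps in seconds between consecutive requests, in time order."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Example: three requests in quick succession, then two isolated ones.
sample = [0.0, 1.0, 2.0, 31.0, 65.0]
counts = requests_per_window(sample)
gaps = query_intervals(sample)
```

A burst shows up as a high count in one window together with very short gaps, which is the kind of signal a sequence model could be trained on.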

Using the eDNA API front-end and back-end server connections, a real-time request for each user’s credibility is made. Each request is given a risk score based on the machine learning techniques described above. The score is returned within 700 ms. If the score returned is 1, the fingerprint is blacklisted, with 0% false positives. Where a score between 0 and 1 is returned, the entity making the request can be challenged, blocked or fed bad data. Where a score of 0 is returned, the visitor is human and poses no threat, so they can access websites, apps and APIs without being challenged; the same applies to visitors covered by whitelist rules.
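
The score handling described above can be sketched as a small decision function. This is a hedged illustration only: the `Action` enum, the `decide` function and its parameters are hypothetical names and not part of the eDNA API. It assumes scores lie in the range 0 to 1, and that scores strictly between 0 and 1 form the intermediate case whose response the operator chooses.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    CHALLENGE = "challenge"
    BLOCK = "block"
    FEED_BAD_DATA = "feed_bad_data"
    BLACKLIST = "blacklist"


# Responses the report lists for scores that are neither 0 nor 1.
INTERMEDIATE_ACTIONS = {Action.CHALLENGE, Action.BLOCK, Action.FEED_BAD_DATA}


def decide(score: float,
           whitelisted: bool = False,
           intermediate: Action = Action.CHALLENGE) -> Action:
    """Map a risk score to a response (hypothetical policy sketch)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score is assumed to lie between 0 and 1")
    if intermediate not in INTERMEDIATE_ACTIONS:
        raise ValueError("intermediate must be challenge, block or feed_bad_data")
    if whitelisted or score == 0.0:
        return Action.ALLOW       # human or whitelisted: no challenge
    if score == 1.0:
        return Action.BLACKLIST   # fingerprint is blacklisted
    return intermediate           # operator-selected response
```

For example, `decide(0.4, intermediate=Action.FEED_BAD_DATA)` returns `Action.FEED_BAD_DATA`, while a whitelisted visitor is allowed through regardless of score.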

Website & more info

More info: https://www.e-dna.co/airlines/.