I plan to write a series on Web Application Hacking, starting from scratch and covering most topics. I have skipped the recon section, so this assumes that you have already selected a website to target.
Let's get started with Mapping the Application.
You have to know more about the application you are targeting. The mantra for finding bugs is Enumeration…. Enumeration…. Enumeration…. The more you know about how an application works, the easier things get.
Manually go through the application to gain a basic understanding of what it is built for. For example, take Google Drive: after browsing through it, you will know that it is an application to store data.
You also have two important files to look at.
robots.txt is a file that contains paths that a site creator doesn't want robots / crawlers like Google to index in a search. It may list admin pages, or even secret directories known only to the organization.
sitemap.xml may or may not contain important information, but it often proves useful. It is what the name says: a file containing the different URLs the site has.
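As a quick illustration, here is a minimal Python sketch of pulling the disallowed paths out of a robots.txt file. The file contents below are made up; in practice you would fetch https://target/robots.txt first.

```python
# A minimal sketch: extract Disallow entries from a robots.txt file.
# The robots.txt contents below are made up for illustration.
robots_txt = """
User-agent: *
Disallow: /admin/
Disallow: /internal-reports/
Allow: /public/
"""

def disallowed_paths(robots: str) -> list[str]:
    """Return every path listed in a Disallow directive."""
    paths = []
    for line in robots.splitlines():
        line = line.strip()
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:  # an empty Disallow means "allow everything"
                paths.append(path)
    return paths

print(disallowed_paths(robots_txt))  # these are interesting pages to visit manually
```

Each of those paths is worth a manual visit, since the creator explicitly didn't want them indexed.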
Now we move on to automation: use tools to find more, as we cannot manually browse through the whole application.
Spidering means visiting each and every link that an HTML file contains. It comes in two types: active and passive.
Active spidering visits each link, submits random data where required, and adds the discovered URLs to our sitemap.
Passive spidering doesn't visit links or submit data; it just parses the HTML file and adds the URLs it finds to our sitemap.
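Passive spidering is easy to sketch yourself: parse HTML you already downloaded and collect every link, without requesting anything new. A minimal Python version using only the standard library (the sample page below is made up):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href/src attributes from a page without visiting them."""
    def __init__(self):
        super().__init__()
        self.urls = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.add(value)

# Example page (made up) -- in practice this is a response you already captured.
html = '<a href="/login">Login</a> <img src="/static/logo.png"> <a href="/user/profile">Me</a>'
collector = LinkCollector()
collector.feed(html)
sitemap = sorted(collector.urls)
print(sitemap)  # URLs added to our sitemap without a single extra request
```

Real tools also resolve relative URLs and pull links out of JavaScript, but the core idea is exactly this.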
Spidering can be done using a variety of tools like Burp Suite, OWASP ZAP, WebScarab etc. Most of these tools have a default function to automatically do passive spidering.
This is the spidering window in Burp; you can get to it by right-clicking the target ⇒ Scan ⇒ Open Scan Launcher. Here you can configure your spider and start it by clicking OK.
Note that spidering has some advantages and disadvantages. It may automate most of your work, and it is very good at exploring REST-style URLs. But it is bad when the same page displays different content depending on the params, and it may click on a Logout link, terminating the session.
User Directed Spidering
This is a highly useful technique to manage the disadvantages of the Automated Spider.
You have to turn off Intercept in your proxy and browse through every part of the application that you can see. That means clicking on every link, filling in every form, completing all stages of any multi-stage process, etc.
Then look at your sitemap: if there are any links you haven't visited, visit them manually.
“Yes, this may be a pain, but it will really pay off.”
Now, with this much information about the application, you can expand it further by doing active spidering on each link in the sitemap. If no additional content is found, congrats, you did a great job.
Now we move on to the carelessness of developers, or of those who deploy the application on the server.
They may have left important files like logs or database files in the web directory, or there may be pages that are present but unlinked from the main website.
There are some approaches to discover this hidden content.
First, make some requests with valid and invalid content to learn how the application handles a 404. Some applications may return a 200 status code and display a custom 404 message.
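One way to handle such "soft 404s" is to first request an obviously random, nonexistent path, record what comes back, and then treat any later response that looks nearly identical as "not found". A sketch of that comparison logic in Python (the response bodies here are hypothetical):

```python
import difflib

def looks_like_not_found(baseline_body: str, body: str, threshold: float = 0.95) -> bool:
    """True if a response body is nearly identical to the known 404 baseline."""
    ratio = difflib.SequenceMatcher(None, baseline_body, body).ratio()
    return ratio >= threshold

# Hypothetical responses: the baseline comes from requesting a random,
# definitely-nonexistent path like /zzz-does-not-exist-zzz.
baseline = "<html><body>Oops! We could not find that page.</body></html>"
candidate_a = "<html><body>Oops! We could not find that page.</body></html>"
candidate_b = "<html><body>Admin console - please log in</body></html>"

print(looks_like_not_found(baseline, candidate_a))  # True  -> soft 404, ignore
print(looks_like_not_found(baseline, candidate_b))  # False -> real content, investigate
```

Most brute-forcing tools expose the same idea as options to filter responses by size, word count, or a string match.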
Now use tools like wfuzz, DirBuster, dirb, gobuster, ffuf, etc. to make a large number of requests to identify various directories and subdirectories.
Try a variety of wordlists; you may look at SecLists, and you are most welcome to use lists from other sources.
Try varying extensions too, like .log, .db, .md, .bak, etc., which may reveal other files present on the webserver.
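This is essentially what those tools do under the hood: combine a wordlist with a set of extensions into candidate URLs, then request each one. A minimal sketch (the wordlist and target below are made up; in practice you would use something like SecLists and actually fire the requests):

```python
from itertools import product

wordlist = ["admin", "backup", "config", "logs"]   # stand-in for a real list like SecLists
extensions = ["", ".log", ".db", ".bak"]           # "" means a bare directory/file name
base = "https://target.example"                    # hypothetical target

# Every word x extension combination becomes a candidate URL to request.
candidates = [f"{base}/{word}{ext}" for word, ext in product(wordlist, extensions)]
print(len(candidates))   # 16 candidates from 4 words x 4 extensions
print(candidates[:3])
```

Note how quickly the numbers multiply: a 100,000-word list with 5 extensions is half a million requests, which is why rate limits and scope rules matter.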
Now that you have performed the initial brute force, it will have identified some amount of information. You are left with your knowledge now: make guesses. For example, if there are pages called /user/CreateFile and /user/ViewFile, there may be a /user/EditFile or /user/DeleteFile. Look for this type of information, and make as many guesses as you can based on the naming scheme used by the application.
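This guessing can be semi-automated: take the verbs you have already seen and swap in related ones that fit the same naming scheme. A tiny sketch using the /user/CreateFile example from above (the verb list is just an illustrative guess):

```python
# Generate guesses from an observed naming scheme.
observed = ["/user/CreateFile", "/user/ViewFile"]
verbs = ["Create", "View", "Edit", "Delete", "Copy", "Rename"]

guesses = set()
for path in observed:
    prefix, name = path.rsplit("/", 1)          # "/user", "CreateFile"
    for known in ("Create", "View"):
        if name.startswith(known):
            noun = name[len(known):]            # "File"
            for verb in verbs:
                guesses.add(f"{prefix}/{verb}{noun}")

new_guesses = sorted(guesses - set(observed))   # only the paths we haven't seen yet
print(new_guesses)
```

Feed the output back into your brute-forcing tool as a custom wordlist.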
Public Information (OSINT)
The application may not contain certain content now, but may have had it in the past.
To find it, we can make use of powerful search engines and the Wayback Machine. Use dorks to find data about the target.
Some sample dorks, like site:, inurl:, intitle: and filetype:, are highly useful. Make use of cache: too, as it returns the last cached copy of the page.
Use tools like ParamSpider to scour archive.org for interesting parameters.
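ParamSpider essentially queries the Wayback Machine's CDX API for every URL it has archived for a domain. A sketch of building such a query yourself (the domain is a placeholder, and actually fetching the URL is left out so this stays offline):

```python
from urllib.parse import urlencode

def cdx_query(domain: str) -> str:
    """Build a Wayback Machine CDX API URL listing archived URLs for a domain."""
    params = {
        "url": f"{domain}/*",    # everything under the domain
        "fl": "original",        # only return the original URL column
        "collapse": "urlkey",    # de-duplicate repeated captures
    }
    return "https://web.archive.org/cdx/search/cdx?" + urlencode(params)

query = cdx_query("target.example")   # hypothetical target domain
print(query)
# Fetching this URL returns one archived URL per line; grep the output
# for '?' to spot URLs that carry parameters.
```

Old URLs with parameters are doubly interesting: the endpoint may still exist even if nothing links to it anymore.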
If you are successful in finding old content, it may still be present in the live application. It may even have certain vulnerabilities that were the reason it was hidden / removed from the application.
During the Recon stage you would have identified most of the information about the servers.
The server software itself may contain bugs; for example, it may allow reading files on the server. Some servers may even ship with default content, like default credentials or default pages, which may allow us to infer more information about the application.
Automated scanners are highly helpful in these situations.
Use Nikto or Wikto; they have a large database of default content. They are also prone to false positives, so manually verify their findings.
“Nikto is from a Russian word meaning Nobody. As in, who is scanning our systems? Nobody :)”
If you use an IP address for the scan, Nikto may, while parsing links, think that those links belong to a different domain and ignore them. So use the tools wisely.
Pages vs Functions
Some applications have functional pages: the same page provides different outputs, and the output depends on the parameters of the request.
If this is carried out using a POST request, it may be missed by a spider. So we have to identify those specific URLs and modify the requests to see if there are any changes / deviations.
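Modifying such a request is straightforward to sketch. Here is a minimal Python example that builds (but deliberately does not send) a POST request with a tweaked parameter; the endpoint and parameter names are hypothetical:

```python
from urllib.request import Request
from urllib.parse import urlencode

# Hypothetical functional endpoint: the same page behaves differently per parameters.
url = "https://target.example/account/statement"
params = {"view": "summary", "format": "html"}

# Swap one parameter value to look for a deviation in the response.
params["format"] = "csv"
req = Request(url, data=urlencode(params).encode(), method="POST")
print(req.get_method(), req.full_url)
# In an engagement you would send this through your proxy and diff
# the response against the original request's response.
```

Burp's Repeater does exactly this interactively, which is usually more convenient than scripting it.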
Draw a mindmap of the functions of the application, so you can isolate a specific function and devote special attention to attacking it.
It's time for some more guesses!
Try submitting parameters like debug=True, test, hide, or source to some functional pages and monitor the response. If there is an anomaly, probe further by varying the value: 0, 1, False, etc.
The ideal place to try this is where you identify the application to be complex; that is where more debugging was needed, so such parameters might have been implemented there. Think more like a developer here!
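The probing itself can be scripted: record a baseline response, then resend the request once per candidate parameter/value pair and flag any response that deviates. A sketch with a stand-in for the real HTTP request (the "application" here is a made-up function that happens to honour debug=True):

```python
from itertools import product

names = ["debug", "test", "hide", "source"]
values = ["True", "False", "0", "1"]

def fetch(params: dict) -> str:
    """Stand-in for a real HTTP request; a hypothetical app that honours debug=True."""
    if params.get("debug") == "True":
        return "normal page\n-- DEBUG: session dump, SQL timings --"
    return "normal page"

baseline = fetch({})  # response with no extra parameters
anomalies = [
    (name, value)
    for name, value in product(names, values)
    if len(fetch({name: value})) != len(baseline)
]
print(anomalies)  # parameter/value pairs worth probing further
```

Burp Intruder or wfuzz can do the same thing at scale; comparing response lengths is the crudest useful diff, and word counts or full diffs catch subtler deviations.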
For those who are in a hurry or do not like reading long text:
- Manually browse the application to understand, at a basic level, what it does.
- Look for robots.txt and sitemap.xml for information.
- Spider the website using any of the automated tools. Careful with the configuration.
- With intercept turned off in your proxy, manually browse through the entire application: submit all forms, complete multi-stage logins, etc.
- Use directory bruteforcing tools to find directories and subdirectories. Try different lists and extensions.
- Use dorks to identify public information. Use ParamSpider or archive.org to find content that existed in the past.
- Run a Nikto / Wikto Scan to identify default content on the server or the vulnerabilities of the server software.
- Create a mindmap of the application’s functions to isolate a function to attack.
- Identify a complex function in the application and submit parameters like debug=True, test=0, etc. to identify hidden parameters.
- Repeat all these until no new content is found.
That's it for now; we have mapped the application. I recommend performing these steps recursively, until nothing new is found.
Let's meet in my next blog, on analyzing the application.
Hope you learnt something, and thanks for reading.