Lesson 2 of 8

What is OSINT? The primary sources of OSINT intelligence. The concept of restricted and open data.

5 minute read

Open source intelligence, or OSINT, came into its own in the early 2000s as more people engaged online and the term “digital identity” came into existence. With the rise of “putting yourself out there”, concerns around privacy and data retention came to the forefront. With the right access and database logins, a whole host of information about individuals is available online. Registered to vote? Your information is on the electoral roll. Had credit? Your information is available for credit referencing. The list goes on. With over 1.5bn websites online as of 2024, a large portion of information is freely available to the average computer user.

Although the term “open source” is used frequently, it should really be divided into three subcategories: information that is readily accessible without a secondary act; information that is accessible but requires a login or password; and information that is accessed using specialist software and is ‘hidden’ from the rest of the internet. These are referred to respectively as the open web, the deep web and the dark web.

Search Engines

Search engines routinely scour the internet (known as crawling), adding to their index so that they can return better results to the user. Just because a search returns no results doesn’t mean the term isn’t on the internet – the search engine you used may not have ‘crawled’ that page yet. Always try a different search engine. Knowing how to use Boolean search operators separates those trained in OSINT from normal internet users. This course will demonstrate some of those advanced operators, allowing you to dive deeper into the online world and retrieve more accurate and useful information.
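As a rough illustration of how operators combine, the sketch below builds a single query string from an exact phrase, a set of alternatives, excluded terms and a site restriction. The function name build_query and the example values are made up for illustration; the operators themselves (quotation marks, OR, a leading minus and site:) are widely supported, but exact behaviour varies between search engines.

```python
def build_query(phrase, any_of=(), exclude=(), site=None):
    """Combine search terms into one query string using common operators."""
    parts = [f'"{phrase}"']  # quotation marks request an exact-phrase match
    if any_of:
        # OR matches pages containing at least one of the alternatives
        parts.append("(" + " OR ".join(any_of) + ")")
    # a leading minus excludes pages that contain the term
    parts.extend(f"-{term}" for term in exclude)
    if site:
        parts.append(f"site:{site}")  # restrict results to a single domain
    return " ".join(parts)


# Hypothetical example values, not taken from a real investigation.
query = build_query(
    "Jane Doe",
    any_of=["London", "Manchester"],
    exclude=["football"],
    site="example.com",
)
print(query)
```

Pasting the printed string into a search box narrows the results to pages on one domain that mention the exact name alongside either city, while dropping pages that mention the excluded term.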

Social Media

If Facebook were a country, it would have a population larger than China and India combined. Social media presence is almost universal, and new sites pop up often, with relative newcomers such as TikTok. Using good OSINT skills, the collector can build up a detailed profile of an individual from their online accounts. Imagine your Facebook page – what information is stored on there about you? A Facebook page holds a great deal of information about an individual: their date of birth, the names of friends, family and relatives, locations frequented, and photos of the individual, their house and vehicles.

All of this can be gathered to build up a profile. Some social media sites require a login to access this information. If you are researching for work purposes, never use your personal account. Using a false persona protects the collector from attribution to their real-world identity. Some social media sites, such as LinkedIn, will tell a user that another user has visited their page. This leaves a footprint that you do not want traced back to you.

When choosing a false persona, avoid a celebrity name, which makes it obvious that the persona is false, and avoid anything connected to your real-world identity, such as your mother’s maiden name. Using a disposable email address service, such as Maildrop, will enable you to set up a social media account with low attribution.

Intelligence Capture

Intelligence, or more accurately information, should be recorded once collected so that it can be analysed further at a later date. A good way of doing this is to save the webpage as a tamper-proof PDF, or to use a piece of software to take an image of the site. FireShot is a great example of this type of tool.
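One possible way to keep track of saved captures is to log each one with its source URL, a timestamp and a hash of the file, so that any later change to the file can be detected. The sketch below is an assumption about how such a log entry might look, not a method prescribed by this course; the function name capture_record, the field names and the sample bytes are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def capture_record(url, content, note=""):
    """Build a log entry for a saved capture; content is the file's raw bytes."""
    return {
        "url": url,
        # record the capture time in UTC so entries are comparable
        "captured_at": datetime.now(timezone.utc).isoformat(),
        # the SHA-256 digest changes if the saved file is altered later
        "sha256": hashlib.sha256(content).hexdigest(),
        "note": note,
    }


# Placeholder bytes standing in for a saved PDF or screenshot (hypothetical).
saved_page = b"%PDF-1.7 example bytes standing in for a saved capture"
record = capture_record("https://example.com/profile", saved_page, "profile page")
print(json.dumps(record, indent=2))
```

In practice the bytes would be read from the saved file, and the hash could be recomputed at analysis time to confirm the capture is unchanged.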

Intelligence from Imagery

Photos (and audio) contain information from the device that captured them. This is stored in the exchangeable image file format, known as EXIF data. It can include the device that took the image, geolocation data, user details and any other information that is layered behind the file. If you upload a photo into an EXIF viewer, it will return the data that sits behind it. However, this technique relies on the data being saved by the device and not being “stripped” when the image is uploaded (for example, websites such as Facebook strip out this data).
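EXIF geolocation is typically stored as degrees, minutes and seconds plus a hemisphere letter (N, S, E or W), whereas most mapping tools expect decimal degrees. The sketch below shows the conversion; the function name dms_to_decimal and the sample coordinates are illustrative only and were not extracted from a real image.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    decimal = degrees + minutes / 60 + seconds / 3600
    # south latitudes and west longitudes are negative in decimal notation
    return -decimal if ref in ("S", "W") else decimal


# Hypothetical values of the kind an EXIF viewer might display.
lat = dms_to_decimal(51, 30, 26.0, "N")
lon = dms_to_decimal(0, 7, 39.0, "W")
print(round(lat, 6), round(lon, 6))
```

The resulting pair of numbers can be pasted into a mapping service to see where the photo was taken, provided the device saved the location in the first place.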

Reverse image searching allows the collector to upload an image to see where it appears online. Websites such as TinEye or Yandex will also match similar images by tone, colour and vectors within the image. This is really useful for companies carrying out due diligence to see if images have been used elsewhere.

Website and IP address Intelligence

When a website is created, certain details need to be captured by the registrar, including who it is registered to and some of their contact details. Recently, the information that is listed publicly online has been cut back. However, you can still see if a website has changed its name, when it was registered and where it is based. This can be really useful for private sector companies looking to establish, for example, whether another company is legitimate or has only just created an online presence. Websites such as CentralOps.net are a good example of a tool for this, and you can use the Wayback Machine to see how a website has changed over time. IP geolocation can help an intelligence analyst identify the likely location of an online identity. There are countless tools available on the internet if you search for IP geolocation.
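To illustrate the “only just created an online presence” check, the sketch below pulls a registration date out of a block of registration (WHOIS) text and works out how old the domain is. The sample record is entirely made up, and the “Creation Date:” label is an assumption: it is common in registration records, but field names and date formats differ between registries.

```python
import re
from datetime import datetime, timezone

# Made-up registration record for illustration; real output varies by registry.
SAMPLE_WHOIS = """\
Domain Name: EXAMPLE-SHOP.COM
Registrar: Example Registrar, Inc.
Creation Date: 2024-01-15T09:30:00Z
Registrant Country: GB
"""


def creation_date(whois_text):
    """Return the registration date found in the text, or None if absent."""
    match = re.search(
        r"^\s*Creation Date:\s*(\d{4}-\d{2}-\d{2})",
        whois_text,
        re.MULTILINE | re.IGNORECASE,
    )
    if not match:
        return None
    parsed = datetime.strptime(match.group(1), "%Y-%m-%d")
    return parsed.replace(tzinfo=timezone.utc)


def domain_age_days(whois_text, as_of):
    """Number of whole days between registration and the as_of date."""
    created = creation_date(whois_text)
    return None if created is None else (as_of - created).days


as_of = datetime(2024, 3, 1, tzinfo=timezone.utc)
age = domain_age_days(SAMPLE_WHOIS, as_of)
print(age)
```

A domain that is only a few weeks old is not proof of anything on its own, but it is a useful prompt to look more closely at the company behind it.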

Type “what’s my IP address” into your browser, copy the result, and then search for it on a geolocation tool. Websites capture visitors’ IP details, which is why good digital hygiene is important to limit the footprint you leave.
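Before pasting an address into a geolocation tool, it is worth checking that it is a public address, because private and loopback addresses (such as those starting 192.168.) are reused on many networks and carry no location information. The sketch below uses Python’s standard ipaddress module for that check; the helper name is_geolocatable is made up for this example.

```python
import ipaddress


def is_geolocatable(ip_text):
    """Return True only for public (globally routable) IP addresses."""
    try:
        return ipaddress.ip_address(ip_text).is_global
    except ValueError:
        # the text was not a valid IPv4 or IPv6 address
        return False


for candidate in ["8.8.8.8", "192.168.1.10", "127.0.0.1", "not-an-ip"]:
    print(candidate, is_geolocatable(candidate))
```

Only the first address in the list is public; the others would return nothing useful from a geolocation lookup.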

This course will give you a solid foundation in the basics of open source intelligence, Google searching, social media and intelligence that can be gained from imagery.
